Fun. Hello. I'm loud. Okay, that does help save my voice, though I'm happy the acoustics in the expo hall this year are much better than last year. I don't know how many of you were here last year, but after three days of talking all day, and being silly enough to hang out with people afterwards in loud restaurants, I went home, had to teach music lessons, and literally decided that day to just mime things to my students and demonstrate what to do without actually speaking much. I don't feel quite so bad this year, so I appreciate that.

This is not the talk yet; I'm just bantering, testing my mic, and commenting on the acoustics and the technology here. I hope the 60 Hz hum is not too distracting. Okay, I will demonstrate it several times for people's amusement. Listen carefully to the sound of the hum and listen to it change. Let's see. I don't think it's a loose connection; I think it's a hum that follows what's on the screen, a color change. I leave this slide, and now it changed again. So I can control the sound from the speakers. Okay, bonus commentary about technology and sound for the people who showed up early. There are obvious parallels between the issues of electricity and the issues of sound here. The way speakers work, of course, is that a coil of wire connected to the electrical input sits around a magnet and is attached to a paper cone; current through the coil creates a magnetic field, the coil vibrates against the magnet, and the cone pushes the air. So electricity going through the system creates a field around that magnet, and there we go: we have a buzz.

I've tried at various times to figure out how to deal with this buzz issue. You can get little hum-canceler types of things, but they don't always work, because the right solution depends on the cause. In my home teaching studio, what I figured out was that if I plug my external audio interface into my PA, but plug the PA into a different outlet, and plug the computer into an outlet on the opposite side of the room, then the buzz goes away. It doesn't matter whether the PA is even on, but the interface had to be plugged into the PA: if I ran the computer on its own, not connected to the switched-off PA, I'd get a buzz from the computer itself, in my headphones and in my recordings. With the PA plugged in but off, no buzz. Sometimes that's the kind of crazy thing you have to do when you're dealing with audio, if that's hilarious. The issue varies a lot based on the computer you're using. Actually, what was going on there? That's funny. I'm going to turn off my networking, because on this particular laptop there's a conflict of some sort between my audio setup and my networking. Looks like I've got about three minutes, so I'm going to wait until then. Yes, question? So what happened is that my original audio interface was a powered one: it needed to be plugged in, so I was still connected to the electrical system.
And when I switched to a different audio interface that was USB-powered, I was able to run without the buzz, with no connection at all to the power circuit. But if I had the powered interface plugged in, the buzz came back. I originally thought, maybe I just plug the interface and the computer both in. That didn't work. It also didn't pan out to have the computer plugged into one outlet and the interface into another; that didn't cancel the loop either. It had to be that the audio output from the interface went to the PA, or else everything had to be unplugged and battery powered. Yeah, that's an interesting point. Well, the PA actually did have a ground; that might be related to why it canceled. Interesting. If I understand right, I should just go ahead and get started then? Sooner? Okay.

So welcome, everybody. Thanks for showing up. My name is Aaron Wolf. I'm a music teacher, I live in Portland, Oregon, and I happen to also be involved in this crazy project called Snowdrift.coop, which has a booth here at the expo. It's a non-profit, community-run thing we're trying to get off the ground, and it's one of the big things that brought me here, but for some crazy reason I decided to also give this talk. It's actually connected to what I do for a living, because I teach music lessons. I went through a process in 2012, basically frustrated with where I saw Apple going; I had been a Mac user. And this relates as much to my students as to me: I was really happy to find programs like MuseScore and Audacity, which are under the GPL. I was using them, they were neat, I could show them to my students, and it's about enabling my students to use interesting technology to make music.

So in 2012, I decided to switch to a Linux system, frustrated at not being able to tell my students, who were by then switching to iPads, about this really neat software that they could use for free and that everybody could share. Apple would not let them use it. The only thing you can get on an iPad is the MuseScore reader, which happens to be proprietary, and it will just show you things; it won't let you actually do anything. It's not the program. I don't like that direction, and I'm freaked out about whether Apple will turn the Mac into something like the iPhone in the long run. You know, I tell people, you should use Audacity, it's really cool, and then people say, oh, I tried to install it, but it gave me a warning that something was wrong, that it was a virus. Because it says there's a security issue: this thing didn't come from the Mac App Store. I could go on about that stuff, but it got me to switch to a Linux system in 2012. And it's both very, very promising and also somewhat troubling. It's not as easy to use or to set up, and there's a lot more variability, but there's also a lot of interesting promise. After a decent amount of hassle figuring this out, I ended up moving to KXStudio, which is what I use today. I'll take a moment to explain to everybody what that is.
There's a guy who's originally from Portugal; I think he lives in Germany now. KXStudio, where the K comes from the KDE desktop (though it's not desktop-specific), is actually a set of repositories that you can add to any Debian-based operating system. It's a collection of audio settings, plugins, and tools, a ton of really cool things, super well maintained and packaged by this one guy who is basically volunteering his entire life to it. I was kind of shocked to figure out how this even exists: he lives on a very modest amount of donations, didn't even have a programming background before getting involved, picked up a Python book one day and started learning Linux audio, and he's now one of the world's Linux audio experts and packages all of the plugins and software I ever need. So you don't need to install KXStudio as a distro; you can just add the repositories to any Ubuntu- or Debian-based system. Even if you want a completely FSF-endorsed, fully free software system, you could run Trisquel or plain Debian and add the KXStudio repositories. But enough people asked that he released an ISO you can install, and that's what I'm actually using. His attitude was more or less: I'm not really making a distro, I'm just maintaining these repositories, but fine, here's an ISO. It's Ubuntu-based. AV Linux is another one I'll give a shout-out to; it's based on Debian. And Fedora Jam uses a lot of the same stuff, including some of the programs the KXStudio guy put together; that's another music distro. There are a few others, but those are the big ones.

So I will bring this slide back. Whoops, that slide didn't show the picture that's supposed to explain the layout. Let me see if it works if I do this. Ooh, what happened? That's really strange. Glitches with my slides, sorry. I will just explain: there are a whole bunch of different audio back ends involved here, and standard systems come with PulseAudio. One comment I'll make on the distro question while I'm at it: a lot of people use Ubuntu Studio, or hear about it, because Ubuntu is popular and Ubuntu Studio comes up when you search. It has a few settings, but it's not really well maintained and doesn't have all the stuff you'd want. I don't oppose anybody using it; I'd just say that if you use Ubuntu Studio, you should also add the KXStudio repositories, and then you'll have everything. At any rate, the KXStudio system lets me manage all of this stuff, and I'm going to do a bit of a demo today. But at this point let me stop and ask for a show of hands: how many of the people here today already use and are aware of the tools in the Linux world for making music? Okay. How many people are familiar with how JACK works? Okay, a couple of you. So I'm going to give this talk from the user perspective. This is a demo from somebody who's not a programmer and not focused on command-line things, although I have gotten comfortable with some of that by now. JACK is the main thing you'll be thinking about. But when you're talking about audio hardware, you're dealing with ALSA, and ALSA is part of the kernel, so you mostly don't have to worry about it.
It's just a question of which hardware is supported. So I have this Focusrite USB interface plugged into my computer; despite that, it's not getting rid of the buzz, but the quality is a lot nicer. You can do a lot just with the built-in audio, though, in terms of playing around. The easiest way to visualize all this is through the KXStudio tools, because they're excellent. The first one is a program called Cadence. We can see here that I'm running an Ubuntu-based system (really the KXStudio ISO), and that I have a low-latency kernel, which helps when I want rapid response playing live. Right here is a little control for CPU frequency scaling, and I can check that my user is in the audio group. All of this is set up for you if you just use KXStudio. The configuration for JACK relates to the hardware, and it looks like this. There are some engine settings, which I don't really touch much; you can learn about those later. Basically, I have some different interfaces: this analog interface is my built-in laptop audio, and then there's my USB audio, and in duplex mode I can actually use a different input and output if I want to play with that sort of thing. The sample rate is the rate of taking pressure samples, if I'm recording from an analog source. And the buffer size is effectively how much latency I have; it's calculated so that, back here, you can see a latency of about 21.3 milliseconds. I'll demonstrate for you what the effect of that much latency is.
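(For anyone who wants the arithmetic behind that readout: the latency figure is just the buffer size divided by the sample rate. A minimal sketch of the calculation, assuming the 48 kHz rate that matches the 21.3 ms reading; Cadence also factors in the period count for the full round trip:)

    # Per-period JACK latency = buffer size (frames) / sample rate (frames per second).
    def buffer_latency_ms(frames: int, rate_hz: int = 48000) -> float:
        return frames / rate_hz * 1000

    print(buffer_latency_ms(1024))  # ~21.3 ms, matching the Cadence readout
    print(buffer_latency_ms(128))   # ~2.7 ms, small enough for live playing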
So, this is the type of thing you can do with KXStudio, because everything is already set up. I can just open my tool and say I'd like to play with, say, an organ. And... wait a moment, something is going strangely with my computer here for some reason. This is Linux for you; things are not always as reliable as you'd hope. Why is that not loading? What is happening? JACK is going crazy on me. Ah, I see what is happening. This is an example of the types of glitches you may find using Linux: this Focusrite interface and my computer, for some reason, maybe related to power settings, lost their connection. So what I'm going to do, and this is a frustration I've had lately, is kill the connection, because JACK didn't know what to do and something went wrong with ALSA connecting to my audio interface. Now that I've done that, I go back here and notice that nothing is running, and that's not good. Is JACK still running? No, JACK is done. What is going on? Okay, I'm going to bring that window back. That's actually even worse than I usually experience. How strange. So I'm going to unplug that and use my built-in sound. This is the sort of craziness that sometimes happens with Linux audio, and it's not what I intended to demo. Oh boy, my entire computer is having trouble. Okay, restarting my computer. This is not something I've experienced before, and it doesn't typically happen with some other systems; I didn't use to have this problem with my Apple computers. That relates to the Snowdrift dilemma, which I'll tell you about briefly while this computer starts up: we're trying to get better funding for free software that serves everybody's interests, because I don't like this situation where I have more glitches to deal with and hardware is less well supported, but I also don't like having Apple control my life and everything else I do, being locked into systems that may undermine what I'm trying to do in the long run. That's why I'm involved with Snowdrift.coop.

I'm going to turn off my networking again; everybody sees that. And now I'm back, and JACK is running again. I'm not going to bother with the external interface. If you have a different computer, you might never hit these troubles, but these things are hard to troubleshoot. So the most important thing, which I'll bring up again later but want to emphasize now so we don't run out of time, is that I have found it absolutely invaluable to have these resources. Oh, hey, the slide showed up again. How about that? So I'll explain this and then move on. The hardware is obviously your biggest issue, and there are tons of different machines; my laptop happens to be one that does what I want in most cases, though there are unfortunate interactions between the network hardware and the sound and other things. And you can notice this uncomfortable amount of buzz happening now, which is exacerbated for some reason by LibreOffice. How insane is that?

On the back-end side: FFADO is for FireWire, which nobody really uses anymore; OSS is outdated; and ALSA is the main thing everybody uses for USB audio and built-in hardware. JACK can run directly on top of ALSA. In KXStudio, the way I have it set up, I do not use PulseAudio at all; it's gone. It's normally used for things that aren't very applicable to music making. Instead, there's a bridge: programs that only talk to ALSA directly get bridged through to JACK, and JACK talks to ALSA for the hardware. I'll explain how that works later. Phonon is also relevant; that's the KDE layer that relates to routing audio. So I leave that slide, and boy, this buzz keeps changing. Okay. Oh, crazy. There you go: as long as the screen doesn't actually put out light, we don't have a buzz. I should have made my slides darker; I'm sorry for the green background. They should have been black with white text or something.

These are the main resources I recommend to everybody; you can of course look these up. The Linux Audio wiki collects a lot of things and has good introductory resources. LinuxMusicians.com is a phpBB forum that is pretty active, with people who will help you with all sorts of things. And there's the Open Source Musicians channel, and a great website with all sorts of guides: Libre Music Production. If you don't find something there, you'll find people who will tell you where else to find an answer to anything. And let's make that buzz go away; I should use black slides as much as I can.

Okay, so I was talking about latency. I'm going to load up my organ, and then the question is, how can I play it? Well, I could tap on the on-screen organ keys like this. I'll try to make this very noticeable: if I tap here, now listen to the delay. You can't really play very effectively that way. And to make this more practical anyway, I want to plug in this external MIDI keyboard I have. So I go to the tools here and... why did I open Firefox? Because I clicked on the wrong place on my screen. Sorry. Close that; go away, Firefox. Okay. So Catia is a very simple patchbay, and Claudia is the one that does session management; I'll explain that later. The idea is that in JACK, each program has inputs and outputs of different sorts. In this case, here is my organ: it has an input for MIDI control, an output for notifying of MIDI events that could go to something else, and an output for audio, which is plugged into the speakers. This is a virtual patch bay. Now, on top of all this, I've plugged in my little keyboard here; that's this one, called the QuNexus. I can plug the QuNexus into the organ, and now I can play.
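(Everything this patch bay does by dragging cables can also be scripted, which is handy once a setup gets complicated. A minimal sketch using the third-party JACK-Client Python package; the port names below are made up for illustration, so you'd list your real ones first:)

    import jack

    # Attach to the already-running JACK server; don't spawn a new one.
    client = jack.Client("patchbay-sketch", no_start_server=True)
    client.activate()

    # See what's actually available to wire together.
    for port in client.get_ports():
        print(port.name)

    # Hypothetical port names: controller -> organ, organ -> speakers.
    client.connect("QuNexus:midi_out", "organ:midi_in")
    client.connect("organ:out_left", "system:playback_1")
    client.connect("organ:out_right", "system:playback_2")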
But there's still a bunch of latency. So I go back to my settings here (sorry, the mirrored image isn't coming through well) and change my buffer size down to something smaller. The catch is that, depending on your computer, you can overload it if the buffer is too low. We'll see what happens; I'm going to try 128, which is quite a lot less than 1024. To make the change instantly without stopping JACK, I can use the switch master button. You'll see it down here; I could have changed it right there too, but this is another way to set it. Everything seems to be working, and now here's my latency: much better. Basically usable. And if I wanted to change it on the fly, I can do it here and go back to low latency. Oh, look at that, I got 19 xruns, but they happened while I was shifting things around, so I'm not too worried. If I were recording, I'd want to track those, because each xrun is basically a glitch where something got lost in the overall flow. At some level, minimizing xruns comes down to getting powerful hardware and figuring out the right settings, but it's pretty easy to track here. So there's an organ, and there are a ton of other synths and things I could play with.

The other tools I'm going to highlight today start with the Hydrogen drum machine, which I think is one of the easiest beginner tools. I tend to highlight things that are cross-platform, so I can recommend them to my students who may not be on Linux, although Hydrogen is definitely a Linux-focused project, as you saw with the splash screen. This is a drum machine: you have drum patterns and different sounds, and you can easily lay them out across this grid. Here's the demo thing I had set up. Maybe not. What's going on with this? Oh, I have a pattern. One of the features I especially like about this program is the ability to adjust the feel: this is a humanize feature. I can go over here and change the velocity so it's not so computer-rigid; it modulates, sometimes louder, sometimes quieter, and it sounds a little more interesting. I can modulate the timing a little bit too, so it doesn't sound quite as rigid. I can turn the amount of swing up and down, and you can add plugins and effects. So it's actually a pretty powerful, easy-to-use thing. Now let's see how this works inside: in Hydrogen, each of these instruments has layers, and each layer is just a recording, a sample. You can edit each of these instruments, so I can put in my own samples, any audio that I want.
And you can change all sorts of settings, so you can basically create your own sample sets and trigger them in any arbitrary way. I can also create songs: in this case, there's the syncopated version; I can say I want this pattern three times, then that one for the fourth time, and play that out as a song. I can make a long arrangement, and of course I can change the tempo. So, very cool program, the Hydrogen drum machine; I recommend it to everybody. It comes with sets of drum kits that people have put together that are freely licensed, and you can contribute your own sample sets, so if more people contribute, we could build a whole library of wonderful samples for people to use in their music.

What's really cool about JACK is that maybe I want to do something very different from that, like music notation with MuseScore. At this point, MuseScore is a fully professional-level music notation program. So let's start a piece of music. What should we call it? Okay, "scale." Ah, I can type. And of course we need to credit the composer, because that's the most important thing on any piece of music, and we'll move on. Let's choose some instruments; somebody name an instrument you'd like me to use. Okay, trombone; it's under brass. There we go, add trombone. That'll be enough for now. We'll go with whatever key, I don't care. And let's have a pickup measure. It'll be in four-four. Actually, I'm going to stick with four-four, and I'll explain why later. This is actually an issue of free software and tools: I don't like how tools box you in. MuseScore can do anything rhythmically; it's fantastic. Hydrogen can also do some things, but I wish somebody would come along and better support odd rhythms in Hydrogen. If I want to connect the two, I have to do some funny tweaking, like pretending to Hydrogen that I only have five eighth notes in a measure, or setting a really slow tempo, or other workarounds. It's possible, but it's not the same as Hydrogen handling all the odd-time stuff I wish it could. So somebody please add that to Hydrogen.

Question? No, no, Hydrogen is more flexible than that; let me explain briefly. You have this pattern size, say eight eighth notes, and then a certain resolution for where I can put hits. I could set the resolution to these triplet options, or turn it off, which lets me put them anywhere. And I can use any of these sizes: eleven eighth notes, whatever, any number. So I could do something like that, but it's within a fixed time frame. It's basically saying there's this eighth-note unit, thought of as one level in the hierarchy, and you can have an arbitrary number of those. That doesn't mean I can do the kinds of things I can do in MuseScore, where I can nest quintuplets inside something else, or go from five-four to four-four and back. Instead, it's this one grid. But yes, the grid isn't stuck in four-four. So it's something, and it's free software; people should help improve it and make it more flexible.
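(To make that grid-versus-nesting distinction concrete, here's a toy illustration, not anything from Hydrogen's actual code: a flat grid only offers onsets at multiples of one slot, while nested tuplets land on fractions no single grid can express:)

    from fractions import Fraction

    # Hydrogen-style flat grid: an 11/8 pattern is fine, it's just 11 slots.
    grid_onsets = [Fraction(k, 11) for k in range(11)]

    # MuseScore-style nesting: a quintuplet squeezed inside the third
    # eighth note. These positions are 10/55, 11/55, ..., 14/55 of the bar,
    # and (except the first) they fall between the lines of any 1/11 grid.
    nested_onsets = [Fraction(2, 11) + Fraction(j, 55) for j in range(5)]

    print(grid_onsets)
    print(nested_onsets)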
At any rate, I'm going to do something in... actually, that pickup measure is not going to work for what I want, so I'm going to get rid of it. I liked the idea, but I'll make this simpler. Okay, so here's my trombone, and I'll show you a brief intro to how to use MuseScore. It's the easiest thing to use; it's a fantastic program. N for note entry mode, and then name a letter. What key should we be in? I don't know... where, in C? Okay, what was that? B-flat. Okay, so I'll do a C; there's a C. Then I say I want a B, I hit B, and I put it down a flat. I can just type out letters this way, and I can pick rhythms by clicking up here, which also maps to the number keys: one, two, three, four, five, different durations. So I can quickly type in whatever I want. That was nice, but now I want to do a little fun run or something, so I go to G, and... I really wanted the higher G. Ctrl+Up, and I get the other G. I don't know, let's go with an F and a G and an E-flat or something, so now it's C minor. And I'll go back to the longer note value and do a B-flat again. How about... there we go, B-flat. Okay, let's hear what that sounds like.

Hey, the drums played! How about that? That's because of a thing called JACK transport, which knows where we are in the song, and I can choose which program is the master. In this case, Hydrogen jumped to 120 because that's what MuseScore was set to. But I might want Hydrogen to be the master, so I'll set it to a slower tempo, say 80, and we'll go back to the beginning and play our song here. And that sort of worked, except the tempo in MuseScore didn't actually adjust. So let's see: Preferences, I/O... I'm in JACK, use JACK MIDI, timebase master. I think I need to tell MuseScore that it is not the master now; Hydrogen is the master. Let's see if that works. I have to go back to the beginning and... there we go.
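(JACK transport is scriptable too: any client can start, stop, and reposition the shared timeline that Hydrogen and MuseScore are following here. A sketch, again assuming the third-party JACK-Client Python package:)

    import time
    import jack

    client = jack.Client("transport-sketch", no_start_server=True)

    client.transport_locate(0)   # rewind every transport-aware app to the top
    client.transport_start()     # Hydrogen, MuseScore, etc. all start rolling
    time.sleep(4)
    client.transport_stop()

    state, position = client.transport_query()
    print(state, position.get("frame"))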
So you can build up a whole setup this way. I'll go back to the patchbay view, where you can see how everything is plugged in. MuseScore is plugged into the output so I hear its sound; the organ is plugged into the output, so I could be playing the organ live at the same time; and Hydrogen is plugged in here. But notice there's an interesting MuseScore MIDI output. Maybe I want the organ to play the notes I just wrote for the trombone, so I'll just have it send the MIDI out there. Let's see how that works; I can play it from any of these... actually, I think I can play it right here. Here's the transport; watch this. Yeah, I hear a funny organ in the background there. Maybe if I add some crazy high notes you'll hear it more clearly. So let's see, I'll put something in a higher register; I'm not sure exactly what that'll do, but let's go back to MuseScore and put something stupidly high up here. I'm going to do something relatively arbitrary: so, aleatoric, sort of avant-garde music here. Let's see... oh, that's too slow; let's go with eighth notes, sure. That's enough. Okay, and maybe I'll take this and move it up an octave or so, so the organ comes through better. I don't know what that'll do, but we'll see what happens. Back to someplace where it'll play... There we go, I've got an organ; MuseScore is sending its MIDI out to an organ. I could of course put this on a separate MuseScore track, so MuseScore sends one track to the organ and I don't have to use MuseScore's built-in sounds, and have it play the drums with Hydrogen at the same time. And I could even have MuseScore send MIDI output into... where am I? It's separated in this case, but here's a Hydrogen MIDI input. So if I wanted to make a drum track using Hydrogen's drum samples in a more interesting rhythmic sense, this is how: I could write any sort of drum line in MuseScore and output the MIDI into Hydrogen, and it will play just like everything else. Yes, question? Yeah, I could put any arbitrary rhythm into MuseScore and have Hydrogen play it, and of course I could do that while Hydrogen is also playing in other ways. It's pretty flexible. This is all a modular system; it's kind of the Unix philosophy, where you can mix and match any tools you want.

One of the other tools I want to highlight at this point, and I'm going to turn the volume down first just in case, because I don't remember whether the presets will do what I want, is a program called Guitarix, which is a virtual guitar amplifier. It is, in fact, picking up my voice and doing all sorts of crazy things with it right now, which I don't really want. There we go. It's a little hard to see on this small screen, so let's get rid of the plugins bar. There. Okay, so I've got a virtual guitar amp; it's picking up my voice and telling me whether I'm in tune, and we can set up the gain and all sorts of other effects. If I had a setup with headphones, so I wasn't worried about feedback, I could route things through and maybe even sing live into the guitar amp, or plug my guitar in through my interface. But to keep it simple for now, let's just plug the organ into Guitarix. So I go back to Catia, and we'll take our organ... where is it? Here we are, setBfree. I'm going to unplug it from the main output, since I don't want that direct signal, and also make sure to unplug the microphone from Guitarix, and then plug the organ into Guitarix and see what that sounds like. I can go ahead and play live, because my QuNexus controller is already going into the organ. Let's see... that should be working... why are these down? Oops. Okay, glitches from changing programs too quickly. Okay, so that's working, and I think if I turn my volume back up, you'll hear it. Now I can go back to Guitarix and pick some interesting preset. For the heck of it, let's go with something kind of crazy. I'll keep the volume down a little, but you get the idea.
In that case it's a very overdriven amplifier, so it can be used for hard rock sorts of sounds, but we can do a number of other things too: there are EQs and multiband compressors and all sorts of reverbs, and all this stuff is available as plugins in recording setups as well. So obviously it would now be possible, though this isn't a perfect setup, to go back to MuseScore, delete this strange little thing, and do a copy and paste to make a little loop (there are other ways to do this, and obviously we could edit it a bunch). Oh, we forgot I still had MuseScore plugged into the organ, so let's unplug that, let it play the trombone, and hit play. I would have loved for all this juggling of my attention to go a little more smoothly, but I think you get the idea. So I'm going to pause for a second: does anybody have any questions at this point?

Yes. Yes, it is a MIDI connection. As I showed, I can send output from MuseScore, or from any other drum program or MIDI sequencer, of which there are several. Let me give you a very quick demonstration: if I go back to my patchbay, my controller is currently routed into the organ, but instead let's plug it into Hydrogen, so I can play the Hydrogen sounds from my controller right now. Yeah, absolutely. And so the real question is how we want to output all this. You can see we're connecting a lot of stuff, and the question now is: say I had some combination of live playing and composed things, and I want to output it all as a finished recording. Well, let's start with an earlier step. Maybe I like something of what I did today. I'm not going to want to record this, because it's not very interesting, it's a bunch of garbage, but I like some pieces of it. So I want to save this setup for later. What do I do?
I don't want to have to set all of this up again, and that's where session management comes in. Support for this isn't perfect, because some programs support it better than others, and there are a couple of different session managers. One that comes with KXStudio is called Claudia; it covers a broader set of things than the plain patchbay we were just using. In Claudia, I have this set up with a default internal name for my overall studio, and I can load different studios. A studio is basically everything saved together, and you can also create little rooms where certain sets of things are plugged in together. I can tell Claudia specifically that I want to use a particular program like Hydrogen or MuseScore, or this recording program, Ardour, which is excellent and is generally the most popular audio recording program. So one option would be to take my results, play them, plug everything into Ardour, and have it record. That's one option; I wouldn't particularly suggest it. But basically, I can tell Claudia (I didn't do this before, but if I were starting this plan I would) that I want Hydrogen, and then Claudia knows not only that I have Hydrogen plugged into whatever, but that Hydrogen is one of the things it tracks. When I save my studio, it knows Hydrogen should be opened, with this particular drum pattern file I saved, plugged into this thing, with this other thing plugged in there. The entire session is saved, so that when I open it again, it starts Hydrogen, loads the right file, and plugs everything into all the right places, and I'm right back where I started. That's how you save and come back. Now, there's a thing called Non Session Manager, which is an alternative that in some ways works a little better, but there are pros and cons. What I actually do is take my basic setup, meaning which hardware settings I'm using, my Focusrite or whatever else, and save that in Claudia, and then use Non Session Manager to keep track of where all the JACK clients are and which programs I want, because it does a little better at managing the particulars; for instance, I can set command-line arguments for a given program. Any questions about that? Is that clear?
Okay. So if I wanted to actually output my piece, I would use the Render tool. The Render tool (oops, sorry) lets me simply say: start at this particular time in the overall song and end at, in this case, two minutes thirty seconds, whatever my length is. I can hit play and jump to the end of the song in Hydrogen or MuseScore and just say "now," and then I choose my format, WAV or whatever else, and bit depth and those things. "Real time" means I basically hit play, and what it actually does, I'll show you briefly: if you notice over here, this bit is my hardware; all of the stuff ends up plugged into there, and that's where the final output comes together. So when I hit render... this is fine, I can just do this, sorry. Okay, I'll explain what happened: it re-plugged Guitarix in; for some reason it thought Guitarix was supposed to be plugged into there. Non Session Manager would have handled that better, but I'm not going to worry about it right now. Let me turn that off. Okay, I have the volume down, so I'll try that again: Render... where is that, can I close it? Okay, try again: Render. I'll go back to here, and where's Guitarix? Okay, it's still plugged in there, and I don't want that feedback, so I'll unplug it. But you'll notice there is now something called jack_capture, and jack_capture automatically connected itself to a ton of things: basically, everything that was plugged into the speaker output got plugged into jack_capture automatically as the thing plays. That program captures the output the computer is playing and renders it to a file. I'll stop that, because I don't need it right now.

Okay, so all of this has mostly been about plugging things into other things: effects, synthesizers, MIDI types of stuff. It hasn't really focused on live audio, although I did a little of that. So I'm going to focus on that now, and the most basic tool, which everybody should use if you're doing any audio recording at all, is Audacity. How many people are familiar with Audacity? Seems like most everybody, so I'll give you a quick run-through of how it fits in. Audacity is an absolutely fascinating program. It does all sorts of interesting recording, and it supports JACK in that, here, JACK is chosen, and I can use whatever inputs I want. I can check that it's in fact hearing me clearly enough... whoops, or not, because it crashed on me for some reason. Why did it do that? Okay, so there's a funny little glitch here: Audacity is not a full-blown JACK-supporting program. It uses JACK, but you'll see this funny little thing called PortAudio. What is that? Maybe I need to clean up this canvas, because it looks a little funny; I'll refresh that away. It was there for a second, but you don't see Audacity; there's no Audacity in here. What's going on?
I'm going to record something, and here we are, I'm talking, that's working, obviously, and I go back... it's not there, huh? Well, let's try this again. Where... how did it get that sound? While I'm recording something, look: there's this funny thing called PortAudio, and it's plugged into my microphone. I could have plugged it into something else, if I wanted to make an audio recording of the drums or something. But when I hit stop, it goes away. So Audacity only has temporary ports, and I can't use it in a way where I leave it plugged into different things. VLC works the same way: it will interact with JACK, but I can't use it as one of these session tools along with everything else. So I use Audacity primarily when I want to go in and do fine audio editing, or play with crazy things, or do something like my favorite effect... which, let me move this window up so that... hmm, I guess it puts the dialog up there because it thinks I have another screen up there. What was that? Yes, and Audacity also includes a very simple version of this crazy thing called Paulstretch, where I can say I'd like to stretch my little recording of myself talking by a factor of, say, 200. Maybe I won't do that right now; I'll show you the real thing in a moment, because Paulstretch is amazing. Here's what you can do with it. I'll give you a very simple example; this is a quirky thing that I just love, so I'm going to show off. I'm going to record something quietly... that's enough, and I'm going to save it. There's a built-in version of this, but that recording was short enough... I'll show you the real one. So: there's a very simple version of Paulstretch built into Audacity, but instead I'll export my file as a WAV, that's good enough, to the desktop. Okay. Now I'm going to open this crazy, awesome program that is not installed by default but is in the KXStudio repositories, so if you have those you can just add it: it's called Paul's Extreme Sound Stretch. And whoa, what happened? That's interesting: it got rid of the buzz. That was funny. So I'm going to open the audio file I just made, saved on my desktop, called "test," and let's see if this works. It's set right now to a stretch factor of eight, so it's going to take that very short recording I made and spread it out over 32 seconds as an ambient texture. And you can imagine what you can do with real material, like a song you like, an orchestra piece, some chords from a guitar, anything else, because these interesting ambient textures follow the pitch of everything that's going on. Some of that feedback we were hearing is because I should probably close Audacity... whoops. No, I don't care about that. And what's going on? I need to go back here; something is feeding back. What's going on? Guitarix got plugged into the thing again. Why would that happen? Don't do that. I should probably just quit Guitarix at some point. Even though I would have loved for all of this to be perfectly smooth, what I want to give you is an accurate experience of using Linux audio: you need some patience, and some things will be glitchy. That's how it is, and I hope it keeps improving; I'm very, very happy with today's state of things compared to how it was a couple of years ago. So, here's Paulstretch again, without the feedback... no, I'm getting feedback again. What's happening? Why am I getting feedback? Guitarix got plugged into the microphone again. That's crazy.
Okay, I don't know why that's happening, so what I'm going to do is quit Guitarix entirely. Go away. And now we will check out Paul's Extreme Sound Stretch for real. Much more satisfying, interesting sounds there. So that's a factor of eight, and if I knew the sound was going to evolve in a certain way, I could actually play very interesting music against it, or on top of it. Okay, so here's my factor of eight. I could stretch that out further... let's keep going: a bigger factor, et cetera. Okay, we'll make it last three minutes. And I could go on and on: we can make it last, oh, let's go with an hour and 22 minutes for those two notes that I hummed. Now, the thing that's really entertaining about this program, because this is the sort of silly thing programmers get into, is a mode called HyperStretch, which takes this to another level. It starts at kind of modest levels, like stretching out to two hours or so, but let's keep going... I could be like, oh, you know, let's do that: it'll take 40 days to play through, and you can see where I am on this slider. Somebody was just being silly with their parameters here, because I could keep going. Anyway, I won't hit play on that; we won't worry about it. But even in the regular stretch mode, this software is a lot of fun, and you can do some very interesting things by adjusting the process: I can change the overall octave mixer here, and do interesting things with the tonal-versus-noise setting here, leaning more toward pure pitch or more toward the noisy, ambient sort of sound. So I think it's a very interesting ambient sound creator, and it's an obscure thing most people don't know about or highlight. It does not do anything with JACK; I had to save the audio and then open it up. But I can then take the result into a program that integrates with everything else.
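(For the curious, the trick behind Paulstretch is public: Nasca Octavian Paul's algorithm keeps each window's magnitude spectrum but throws away the phases, so the sound smears into texture instead of sounding like slowed-down playback. A rough mono sketch of the idea, much simplified from the real program, with arbitrary parameter choices:)

    import numpy as np

    def paulstretch(samples, stretch, window=4096):
        # Hann-windowed overlapping chunks; the output advances half a
        # window per step, but the input pointer crawls 'stretch' times slower.
        win = np.hanning(window)
        hop_out = window // 2
        hop_in = hop_out / stretch
        n = int((len(samples) - window) / hop_in)
        out = np.zeros(n * hop_out + window)
        pos = 0.0
        for i in range(n):
            chunk = samples[int(pos):int(pos) + window] * win
            spectrum = np.fft.rfft(chunk)
            # Keep magnitudes, randomize phases: that's the whole trick.
            phases = np.random.uniform(0, 2 * np.pi, len(spectrum))
            smeared = np.fft.irfft(np.abs(spectrum) * np.exp(1j * phases))
            out[i * hop_out : i * hop_out + window] += smeared * win
            pos += hop_in
        return out / max(np.max(np.abs(out)), 1e-9)  # crude normalization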
The program I would use for that is Ardour. Ardour is a truly professional digital audio workstation, and it includes some level of MIDI support. I'm going to make a new session, put it on my desktop, and just call it "test." I can set up all sorts of neat things here. On a very basic level, if you don't even have all this other stuff installed and you just use Ardour, the newest version works directly with ALSA, and you can skip all the JACK stuff. I like using JACK best, of course, because I can integrate it with all these things, but Ardour can be a sort of monolithic thing where you just do everything inside it. My understanding is that the creator of Ardour is the same guy who made JACK, and that he later decided it was kind of crazy to deal with JACK, with people doing all sorts of weird things with it, and wanted to build this more self-contained thing instead. But it certainly still supports JACK, so everything can be used however you like. So, open... let's see if this works. Where is it? There it is, but on my other screen, so I'll drag it down... can I do this? Bam. Okay. Ardour is a full-blown digital audio workstation: I can add an audio track, put whatever I want in there, and record. If I want to record, you enable recording on the track, hit play, and now I'm recording; when I'm done, I have this audio, and I can do simple fades and go ahead and edit the audio. You can also do some levels of time-stretching, move things around, all that stuff, if you're familiar with that. Basically, we can mix and match all sorts of tracks; it's a very high-quality program. We can do mixing, and it also ties into the JACK stuff: in here, its internal setting says it's a monolithic thing doing its own thing, but I can switch it to JACK, and then it will follow the shared timeline, respect whoever's the transport master, and follow the tempo settings. So I could go in here and say I want the tempo to change, or a five-four time signature over here, and a different speed there, and it will adjust those things across all the other programs. So it integrates very nicely, it's high quality, and I have a mixer here where I can add all sorts of plugins. One way to see this is to look at the tons of plugins I've got; all of this is because KXStudio set me up with everything, and I didn't have to think about it. I can add this synthesizer, or a limiter, or all these other cool filters, whatever I want. Sometimes that's the most appropriate way to work, because I want to edit the audio in Ardour, set up automation, and do all the other things people do. People use this for professional recordings of live bands and all the other uses like that. But I'm going to pause for a moment: does anybody have any questions, or anything in particular you'd like to know about Ardour?

Yes? No, not the full functionality. Audacity will let you go in and do sample-by-sample edits, all the noise reduction, really fine waveform work; I'd always go to Audacity for that. But Ardour includes everything you'd ever want for general slicing, arranging different things in sections, and some level of manipulating the audio. Let me be a little more specific: Audacity is a waveform editor. It's about going into the raw data, where there is this exact waveform with this exact sequence of samples, and manipulating that. Ardour is about mixing different tracks together, editing them creatively as a song, integrating a bunch of effects, and then automating it all: it should get louder here, and all these other things. And it depends on who you are. I'd say if you're a beginner who doesn't know anything, start playing with Audacity, because you just want to make some recordings and play with them. But even as a beginner, if you know that what you want is to put together interesting compositions using audio, not just MIDI compositions, not just writing in a score or a sequencer, you'd use Ardour, and you'd start with Ardour. You'd make your recordings there, and if you found some particular thing where you think, I want to play this bit of audio backwards and give it this particular quirky fade, you'd open that audio in Audacity and edit it for that use. So mostly you use Audacity for fine-tuning the audio details themselves, and you do almost all of the composition in Ardour. Ardour is really the competitor to something like Pro Tools or Logic, those big, massive digital audio workstations; DAW is the term. Any other questions about this?
So I'm going to move on and highlight another thing that the amazing guy who puts KXStudio together is doing: a program called Carla. Carla is itself basically a plugin host. What we have is all these plugins installed on the system: Ardour can put them into its mixer internally, other programs like Guitarix can host all sorts of plugins, and in fact you can use plugins inside Hydrogen. But sometimes you just want a plugin to be an independent thing that you can mix and match however you like. In Carla, you just add a plugin (sorry, screen issues again) from the list. For reference, here's the list of plugins we've got with KXStudio. It's quite a long list; most of them are usable or interesting, and some are just funny and quirky. So let's say I want an organ. I can see the organs I've got available... notice I do not see the one I was using before, which is an independent, JACK-supporting but standalone organ, not a plugin, though some of these are also available that way. There's the Calf Studio Gear stuff... where's the name? Sorry, it scrolled over. Okay, there we are: there's setBfree, the same organ I was using before, which is available as a plugin too, and there's a different organ, and there's the Calf organ. I'll just choose that one, and now we have this rack, in which I can open the GUI and set all of the organ's controls; it's quite nice if you want to go into those details. You can also edit it in the plain functional view, the internal, universal way to set all the parameters, since some plugins have nice GUIs and some don't. And this connects: here's the patch bay again. I now have my... no, that's the other organ... my Calf organ here, and it's not plugged into anything, but I can plug in the output and see what it sounds like. Let's see... there we are, I'll plug it into the system. And in this view, interestingly, if I click this, I actually have a keyboard down here, so this can now be plugged in and played just like everything else. It's an independent thing, even though it's not normally a standalone application. So this lets me set up any arbitrary racks of instruments, save their presets, save them as a unit, and have all of them show up in JACK. That way I could add, say, a reverb that I can plug any of my other JACK things into. I can set up basically anything from the plugin list and use it independently, whether or not I also want it available inside Ardour or somewhere else. So there's a lot of modularity, and Carla has crazy amounts of features I haven't even explored.

I happen to use Carla often for one of the very few non-free software things I ever use, which I wish were free software, but I can't convince my friend who makes it to free it; I don't quite get his reasoning yet, though I can understand it in certain ways. I often use Carla to show off this plugin, ReaJS, which actually comes from proprietary software: Reaper, a Windows- and Mac-based DAW, has this plugin system called Jesusonic, which is like a little programming language these effects are written in, and ReaJS is a wrapper for that. Because I don't want to subject myself to Windows and Microsoft, but I still want access to this particular tool.
So I can use this wrapper inside Carla, open it up, get my view here, and open Alt-Tuner, and now I can plug this in as a MIDI thing in my rack. What this particular program does... actually, I'm going to make this a little simpler so you can really experience it. I'm going to quit... okay, that's one thing to quit... just quit... discard, bloop, I didn't save it, oh well. I'd actually like to use Helm for this, this wonderful synthesizer. The guy who makes Helm is somebody I've had a chance to talk to recently; that's my favorite thing about the entire Linux audio world, that I actually get to talk to the people involved. And I know I'm running out of time here, yeah. Okay, so for the last little bit: if I plug in the... where was it, the Calf organ? No, not that one... the MIDI throughput. I'll do the MIDI through to Helm; I think that will work. Oh no, ReaJS... there it is. Okay: ReaJS will output to Helm, and then I plug my controller into ReaJS, and I don't want it plugged in here or here. Now it's a little bit quiet, but I hope you can hear this clearly; this will be the last little hint of something. Helm just needs to be a little louder... where's the volume, the general output... where is it... here. Yeah, okay. I don't know if you can tell, but this is actually now in just intonation, whereas if I did not use the tuner, that E, for example, the "wrong" E, would not fit as well. Helm is doing the playing: Helm is a synthesizer, and this way I can tune my pitches to be whatever I want them to be. And this plugin is not written specifically for Linux; I'm actually running it through Wine, because Carla supports Wine, so I can run plugins built for Windows VST hosts under Linux, since KXStudio has all of that support. So I can use this program to play in just intonation, and to play with alternate tunings that get me all sorts of things you can't get with the standard tempered system.
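(The "wrong E" comment is easy to quantify: a just major third is the frequency ratio 5/4, about 14 cents flatter than the equal-tempered third. A quick check of a few just intervals against their tempered versions:)

    import math

    def cents(ratio):
        # 1200 cents per octave; an octave is a frequency ratio of 2.
        return 1200 * math.log2(ratio)

    for name, ratio, tempered in [("major third", 5/4, 400),
                                  ("perfect fifth", 3/2, 700),
                                  ("harmonic seventh", 7/4, 1000)]:
        print(f"{name}: just {cents(ratio):.1f} cents vs {tempered} tempered "
              f"({cents(ratio) - tempered:+.1f})")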
There are a ton of other interesting programs I would like to show off, but the point at the end of the day is that we have a lot of options, if you're willing to be patient and play with it, for doing very interesting, creative things, and I've barely scratched the surface. I wish we had more time; I could go on all day. But thanks for hanging out and getting the introduction. What's the deal with timing? Okay, there's an hour before the next session, so I can hang out for some questions.

Do you want me to start with the point where I didn't know how Linux worked, or the part where I could start programs but didn't really...? Right. I'll put it this way: the tools have improved a lot. When I first started, even getting the hardware set up was harder than it is now. Before the tools this KXStudio guy is putting together, there were much clunkier-looking things that were harder to mess with, where it was harder to understand what was what, and you couldn't just click a thing and reset it. It's a much easier experience today than it was, and there's still room to improve, but I think somebody who's comfortable with Linux, who can basically find their way around, could do useful things today. And by useful things, I mean: if you add the KXStudio repositories to your system and download the software, you can start MuseScore and go ahead and write a song. It's really that simple now. Integrating all of the pieces is a different question. The things I highlighted are the ones I think are that easy to get going: Hydrogen just works, MuseScore is professional level, Ardour works and is just fantastic. Now, there are tons of other things that are still quirky, and you noticed I had to restart my computer at one point and decided to give up on using my Focusrite today. What I would have liked, though partly I'm traveling and didn't bring a guitar, was to play something else live, and the music I put out today didn't blow your mind. You don't really have any clue what sort of music I could make, because I spent this much time just fiddling with stuff. But it's really not that hard. There are places where you'll run into little glitches, so if you want to get comfortable with the session management part, where you set all this up and then save it, you'll probably have to read some things. It'll take you an hour to get comfortable, and then you'll get good at it after a little while. I definitely recommend Non Session Manager. I think a lot of this has to do with choosing the right tools. I have a friend who does all this on Arch Linux, so he doesn't have KXStudio and has to pick and choose all the different pieces. I don't actually know how he works; I'm not saying it doesn't work, I just don't know how he does it. I'm running KXStudio because I went and said, I don't want to think about all this stuff. The guy who put it together, including the KDE desktop, which I'm happy with, went ahead and put out an ISO built on Ubuntu, and I said okay, I'll use that, and that's the system I installed, even though you can add KXStudio to anything.

Yeah, sure. Good question; let me pull that up. Any MIDI controller works. This one happens to be a QuNexus. It's very portable, and a bit pricier because it's actually aftertouch-sensitive; it's by Keith McMillen. My favorite controller, the one I'd happily promote even though I don't have one myself yet, is the LinnStrument by Roger Linn, because it's a ribbon-style controller, but three-dimensional: pressure, side to side, and up and down for different parameters.
Yeah, sure, good question. Any MIDI controller works; this one happens to be a QuNexus. It's very portable and a little bit pricier because it's actually aftertouch-sensitive; it's by Keith McMillen. My favorite controller, the one I'd happily promote for anybody even though I don't have one myself yet, is the LinnStrument by Roger Linn: it's a ribbon-style controller with three-dimensional pressure, this way and up and down, for different settings, and it's a MIDI controller. The amazing thing about it is that he actually made the software open source, so you can go ahead and do amazing things with it, and I would love to see the community improve it and work on it. That's one of the very first times I've seen anything like that in the music hardware world. They go for about $1,300 or something; I want to get one because it's totally amazing. But you can get little controllers like this for cheap. Not this particular one, this one's a couple hundred, but you can get a $50 controller that's workable.

This Focusrite is the Scarlett 2i2; the basic Scarlett series all supports Linux quite well. The purpose of an audio interface is to have much better hardware than the built-in microphones and things on my computer. Even that thing I did with Paulstretch, I was just using my laptop's built-in mic, because I ended up with a glitch and didn't want to wait; if I had used this, I would use it with my higher-quality condenser microphone and get much better sound. So overall, this is an audio interface, a Focusrite Scarlett series. An audio interface takes professional-level microphones, gives you the right plugs for professional sound equipment, and goes through USB; interfaces like this work with my computer well enough that I get very nice low latency and decent quality sound.

So let's see, music: what sort of stuff can I share with you briefly that I've done in this? I could open up a full session, but I don't have time for that right now. Yes, good question: they're not responsive enough for nice live playing in some ways, but I actually got a touch display myself specifically to use a program called din, which I think is amazing and just didn't get a chance to highlight today. It's a fully microtonal, amazingly fluid synthesizer, modern computerized stuff built on Indian classical music ideas. It has particular pitch statements you can spread in whatever way and then glide between smoothly. Crazy interesting stuff. And it was started by a Linux guy; this is an interesting story, which I'll keep brief, about the free software world and the nature of free software, and why I'm working on Snowdrift.coop, which I can talk all day about. The guy who started it basically just wanted to make really interesting music software, and he made one of the most amazing pieces of software ever.

Okay, we have a moment, so I'm going to show you din. This is a separate side... but why did that not work? Let's see, I'll just do it this way... oh, damn it. For some reason, I must have reinstalled my system, apparently I don't have it installed right now. I'm going to enable networking and install it. It won't take long, because it's in the KX Studio repositories, and that makes everything really easy. I don't know why I don't have it installed; it must have been when I updated my system. I'll tell you what din does in a moment, after I finish setting that up. So let me clarify my little story.
This program I'm about to show you was built by a guy who basically had no income at all and just wanted to spend all day working on it. He believed in free software, so he decided it needed to be a free software program, and it was Linux-only; it's a really cool thing, it runs on any Linux system. After a while he was basically like: I'm poor and I'm starving, and I don't want to go get some shitty corporate job, I want to work on my software. So he made a Mac and Windows release of the thing, made it proprietary, and kept working on that, and the Linux version fell behind. Then more recently he got really angry about people complaining about how he wants to spend his time working on his program and wants to have food, complaining about him not making it free software. So I had to talk to him and try to clarify: no, I want him funded, I want him to have food, I want him to keep working on this, but I also want it to be free software. If enough people would support him, he would be happy to make it free software. His basic statement is that he will never, ever make it proprietary for Linux, because he doesn't think that's right for the community, but he wants it to be proprietary for Windows and Mac or whatever else, and he spends time on that; because if the Linux version were fully free software, somebody could port it to Mac or something, and he wouldn't have his proprietary controls on that. So the Linux version is basically behind; the new version added craziness that's bizarro, all these things that fly all over the place. But he's a good example of the type of thing we deal with in free software: we're talking literally about somebody who was sleeping on somebody's couch, trying to figure out how to keep working on the software.

And the software works like this. Here's din, and... what do I do? I forget what the particular control is, I haven't messed with this in a while. Whoops, that's not what I wanted to do... to dim this noise... okay, I have to click something. I'm a donor, I actually am a donor, so there are all these things that are like a drone here, and I can select these and delete them. Is it C? This is however I had it set when I was last using it... delete that. There, okay. The core of how this works is that I can go into an input mode, there's my flat three tuned to just intonation, and make my little drone things, and then go back here and, if I went to the different modes or something... I forget how it all works at the moment. It's all based on Bezier curves: you can create waveforms with Bezier curves, for any type of sound, control, and synthesis, and it all interacts in these different ways, creating amazingly fluid melodies that work against any sort of drone. I absolutely adore this software; it's completely amazing, mind-blowing stuff. Now how do I leave... I forget how to quit the program... let's see... oops, okay.

There was one last request to hear some music that I made. So, let's see, where am I... I'll go with this. I was hanging out with another friend and we did some stuff; here's a drum beat in seven that I did in Hydrogen, with some synthesizer stuff, and then some guitar. There you go, that's the sense of that. There's live bass and various other things.
That was recorded in Ardour, with Hydrogen drums and a synthesizer and live guitar and some other things. Whereas this is another track we did, mostly live recordings, mixed in Ardour. Everything I ever do is under CC BY-SA. I can go back to, I should go back to, wolftune.com; there are links to some of these things, and I'm working to get other stuff up, though a lot of it's older. This particular piece, I was really thrilled that somebody used it: as I said, I do everything Creative Commons Attribution-ShareAlike, I think all music in all cultures should be under that license, and somebody actually made a video about making chutney or something, some people from Brazil, and they used this as their backing track. So yeah, free culture; I'm always happy to talk about those things too. I've dabbled with electronic music and a number of other things, but this is, yeah: I made music with Linux.

Do they work? Yeah, so, yes, but nothing does it very smoothly unless you have a special guitar controller that translates the guitar into a MIDI thing. And there are a number of things that are super missing. I didn't emphasize this: I really, really, really liked, and did lots of interesting things with pitch and exploring the nature of music and music theory in, Melodyne, when I was on Apple, and there is nothing like Melodyne available at all. I'm sorry, just give up: there's no Melodyne or anything like it, or actual Auto-Tune; there are just goofy little things that can do some stupid little something in tune to tempered systems. Melodyne is like an Auto-Tune that's much, much more creative: it lets you go in and modulate the pitch and timing of individual events in an actual recording of audio, at a mind-blowing level. Proprietary software; it's amazing, and I really wish we had something like it. That's probably the biggest pain point I have right now. Anyway, there we are. Thanks, everybody, for hanging out; I'm going to go back to the Snowdrift.coop booth and promote free software.

Mark, looks like we can start? So, welcome to the HTTP/2 at scale session. My name is Nick Shadrin, and in this session we will talk about this new version of the HTTP protocol: its use cases, and how and when to use it, and when not to. First, a little bit about myself. I work for NGINX in San Francisco as a technical solutions architect, and I often deal with lots of our users and commercial customers, figuring out the technical questions and challenges that current internet users have and figuring out solutions for their environments. I have quite a long experience in web technology; basically since I connected my computer to the network I've been dealing with different forms of websites: creating them, securing them, sometimes figuring out how to make them work faster, and all the reasons around that. My contacts are right there: it's nick@nginx.com, or you can tweet me at @shadrin.

I used a lot of links to different resources in this slide deck, and I collected all those links on one page, so I will leave this slide with the QR code up after the presentation as well; don't bother taking pictures of every slide, because the deck is also available on the conference website. In the first part of the talk we will see the difference between the HTTP/1 and HTTP/2 protocols, and we'll talk about what features the new version of HTTP gives us,
and what kind of new enhancements and performance benefits you can get immediately from using it. We will review HTTP/1 and HTTP/2 optimizations: most of us who wanted to make our websites faster have already implemented some features for HTTP/1, so we'll see how an immediate move to HTTP/2 can either benefit or degrade performance, depending on which optimizations you use in your old-school HTTP/1 deployment. We'll look at some very interesting ways of troubleshooting the HTTP/2 protocol; it has some challenges there, and it's not as straightforward as HTTP/1 troubleshooting. Next, a very interesting part: the benchmarks. Everybody can make their own benchmarks, and there are a number of different benchmarks available comparing HTTP/1 and HTTP/2; obviously I made my own, and I will tell you why mine is better than everybody else's. And then we'll talk about how to configure all this with the NGINX web server: which features to enable, and what kind of configuration and log items to expect.

First, a little bit of HTTP history. It's been a while since HTTP received a major update. The first drafts appeared in the late '80s and beginning of the '90s, and with version 0.9 there was a very simple way of accessing an HTML page: a very simple GET request to the URL. There was no concept of stateful connections, no concept of different resource types, caches, or any of the other optimizations we currently have. In 1996, HTTP 1.0 was finalized with some of those enhancements, but what we are going to compare with HTTP 2.0 today is HTTP 1.1. The major features of HTTP 1.1, compared to 1.0, are keep-alive connections and extended abilities to control and manipulate the cache; there are a number of other features as well, but performance-wise, keep-alive and caching are the important ones for us. And in 2015, less than a year ago, HTTP 2.0 was finalized. HTTP 2.0 was based on the open protocol called SPDY, so currently we have a proper standard: not some protocol developed by a set of companies, but a properly defined standard, version 2.0 of the HTTP protocol.

So let's take a look at an example request and example response in HTTP/1. First of all: who is familiar with HTTP headers, HTTP requests, and all of that? All right, I guess we can just skip this slide; it looks like everybody knows HTTP 1.1 well enough.

Let's take a look at the predecessor of the HTTP 2.0 protocol, which is called SPDY. It was announced in 2009 by Google, and it became very popular with the implementers of web servers and web browsers, so SPDY is well supported. The major idea of SPDY was to reduce page load time: to make performance enhancements over the HTTP 1.1 protocol. The major features of SPDY included compressed headers, flow control with multiple streams of data, and server push. On compressed headers: with SPDY, the compression was done with the GZIP algorithm. We all know that HTTP 1.1 can also compress data, just as SPDY can, but HTTP/1 cannot compress the headers, and when you have small requests, a large number of requests, or a lot of headers flowing through the network, having that data uncompressed sometimes degrades your performance.
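To picture the headers in question, here is roughly the kind of exchange that skipped slide showed; the hostname and sizes are made up. The body can be gzipped, but these plain-text headers repeat, uncompressed, on every single request:

    GET /index.html HTTP/1.1
    Host: www.example.com
    User-Agent: ExampleBrowser/1.0
    Accept: text/html
    Accept-Encoding: gzip
    Connection: keep-alive

    HTTP/1.1 200 OK
    Content-Type: text/html
    Content-Encoding: gzip
    Content-Length: 3421
    Connection: keep-alive

Multiply that header block by dozens of small requests per page and the overhead SPDY was attacking becomes obvious.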
So HTTP/2 was introduced in 2015, and it's largely based on the SPDY protocol. The difference between SPDY and HTTP/2 is in the method of compressing the headers: HTTP/2 does not use GZIP compression, it uses HPACK compression, which basically builds a map of the headers and data on both the server side and the client side, so the compression becomes stateful. An implementation of HTTP/2 with a proper HPACK algorithm does use significantly more memory, because all the compression state needs to be stored on both sides, server and client.

The multiple streams of data allow you to put different requests inside the same TCP connection, with the request and response data for different requests flowing through that connection at the same time. We all know that with HTTP/1, browsers tend to open multiple connections to the same server in order to download images, CSS files, and other resources in parallel; with HTTP/2 that is performed within one connection, and browser implementers do not open several connections to the same host. Everything is done within one connection, and it is done with prioritization: HTTP/2 allows sending priority information with every stream of data, so the browser can say which data it needs the most, which data should come with higher priority, and which data is less required for the user to start interacting with the page.

HTTP/2 also includes the server push feature. Server push allows you, as a website builder, to send data to the client without the client's prior request for that data; basically you send the request and the response together, in the response, so you tell the browser what it should have asked for. That makes total sense on high-latency networks: for example, when someone asks for the index.html page, you already know they need javascript.js and style.css, so you can push that information to the client. You can also use it to make your pages more interactive: for example, when you need a notification pushed to your page, or some interaction data like a chat-style workflow where data flows slowly in both directions. So it can be used as a way to work around WebSockets, another method of sending data in both directions.

There is an interesting question about encryption and HTTP/2. If we look at the specification, the standard does not require that the connection be encrypted. However, no browser implementer has implemented non-encrypted HTTP/2 traffic; every browser will only support HTTP/2 when the connection is done with TLS. Due to this particular need for everything to be encrypted, we at NGINX also implemented only the encrypted version of HTTP/2.

There is also the interesting question of how the browser knows how to initiate the connection, and there are several ways of doing that. The first way of switching the protocol from HTTP 1.1 to HTTP/2 is to send an Upgrade header: using the Upgrade header functionality, the client basically tells the server to upgrade the connection to HTTP/2 and to start sending the data in the binary format.
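For reference, that upgrade mechanism looks roughly like this on the wire; this is the cleartext "h2c" form described in the HTTP/2 spec (RFC 7540), which, as just noted, the browsers never actually implemented:

    GET / HTTP/1.1
    Host: server.example.com
    Connection: Upgrade, HTTP2-Settings
    Upgrade: h2c
    HTTP2-Settings: <base64url encoding of the HTTP/2 SETTINGS payload>

    HTTP/1.1 101 Switching Protocols
    Connection: Upgrade
    Upgrade: h2c

    [ connection continues as binary HTTP/2 frames ]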
Another way of doing it is called NPN, Next Protocol Negotiation. It's an extension to OpenSSL which allows the client and the server to negotiate which protocol they are going to use for that particular connection, and that negotiation is done early, in the TLS handshake, so we can avoid the use of HTTP/1 completely on that connection. There is also another way of performing the negotiation, called ALPN, Application-Layer Protocol Negotiation. The difference between the two extensions is that ALPN saves you an additional round trip in negotiating the protocol: at the first handshake, the client sends the list of protocols it supports, the server picks HTTP/2 from the list, and starts sending data to the client. With NPN it's the server that announces the list of protocols, the client picks one, and then the connection starts.

There is a difference in the OpenSSL versions that support NPN and ALPN. NPN is supported in OpenSSL 1.0.1, which is currently used in most long-term-support, I would say enterprise-friendly, well-supported Linux distributions. ALPN, however, is supported from OpenSSL version 1.0.2 onwards, and that version is not included in some of the very popular Linux distributions at the moment. Obviously we expect that to change soon, so ALPN will be supported everywhere. If we talk about Ubuntu: version 15.10 ships OpenSSL 1.0.2, and version 14.04, the LTS version, does not, so that somewhat limits us in some implementations of the protocol negotiation. Another thing to mention is that NPN support is supposed to be removed from the browsers pretty soon, along with support for the SPDY protocol, so we need to align our servers and infrastructure with the current versions of the web browsers: if the browsers are dropping SPDY support, we will have to switch to HTTP/2 at some point, and when the browsers remove NPN support, we will have to switch to versions of our server software built against OpenSSL 1.0.2 and ALPN.

All right, next let's go through a number of different optimizations that we perform with HTTP/1, and see how those optimizations affect the website if we switch from HTTP/1 to HTTP/2. The first optimization is called domain sharding. We know that the browser opens a number of connections to the same host in order to download files simultaneously, but the browser opens only up to six connections per host. So when we want the browser to download more resources at the same time, we put our resources onto different sub-domains, or onto a set of completely different domains, like ww1, ww2, and so on. The more domains we implement, the more connections the browser will open, and potentially, sometimes, I would say, it becomes a little bit faster. ...Last name: Sharding. Thank you, that was on purpose; yes, I get told that all the time, especially about sharding. That is correct. So basically, with domain sharding we are distributing the resources across multiple domains, and we're making the browser download those resources at the same time. And you can already see that it doesn't help with HTTP/2. It does not, because opening multiple connections and messing up HTTP/2 priorities is never going to be as intelligent at the network level as it can be at the browser-prioritization level; making that optimization browser-aware, so the browser can choose which resources to download first and with which priorities, makes significantly more sense.
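A quick aside before we continue with the optimizations: you can check the negotiation piece from a couple of slides back yourself. With an OpenSSL 1.0.2 s_client it looks something like this, with example.com standing in for your host:

    $ openssl version
    OpenSSL 1.0.2f  28 Jan 2016
    $ openssl s_client -connect example.com:443 -alpn h2,http/1.1 </dev/null 2>/dev/null | grep ALPN
    ALPN protocol: h2

If the last line comes back empty, the server either lacks ALPN support in its TLS library or isn't offering h2.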
So if you're using domain sharding and you're switching to HTTP/2, it makes sense to simplify it and go with the same CDN domain, or even just the same domain, for everything. It also somewhat depends on the geographical distribution of the website and the CDN use.

Another optimization many people implement to make HTTP/1 websites faster is image sprites: combining multiple images into one large image file, and on the client side carving that into the different smaller pictures and pictograms, which is pretty easy to do with client-side technologies. Does it help with HTTP/2? Well, a little bit. If you start sending multiple requests instead of one larger request, it does not affect performance in HTTP/2 as much as it does in HTTP/1, because of the compressed headers, because we're using the same connection, and because we're sending multiple streams within that one connection.

Another optimization method is concatenation of the JavaScript and CSS files, which is very similar to the image sprites: what you're trying to avoid is sending the same set of headers all the time, and to reduce the number of round trips on your website. It's exactly the same story as with the image sprites: if you are using HTTP/2, there's not much effect in doing it. And there is a downside to all of those optimizations: they add to your DevOps time. You need to create those sub-domains, manage those files, and create deployment scripts and scenarios that are more complicated than if you were just using the HTTP/2 protocol without those complicated optimizations.

So let's look at the current statistics for HTTP/2, because we need to know whether the web, and your clients, the browsers, are ready to support this protocol. Since we have an internet connection here, we'll go to the live website instead of the screenshots. We're looking at the Can I Use website, and this is the current picture of HTTP/2 support in different browsers. What we can see is that the major browsers all support HTTP/2: Microsoft's latest, the Edge browser; Firefox; Chrome; also Opera. Some browsers don't. The old versions of Internet Explorer, which actually have very low usage, do not support HTTP/2; version 11 supports it, but only on Windows 10. The major market share, according to the Can I Use website, is with Chrome, and it supports HTTP/2 really well except for some older versions, and obviously the older versions of browsers are going away, so that percentage is going to increase. Safari supports HTTP/2 with the latest macOS, and old versions of Safari don't. If we go to iOS it's the same thing, and since we all know iPhones are updated more frequently, with users keeping up with the latest iOS versions better than on other mobile systems, that unsupported percentage is expected to drop. And if we look at the Opera Mini browser, that one does not support HTTP/2.
But Opera Mini basically includes its own set of optimizations: they act as a proxy, changing the content and making a bunch of their own changes themselves, so it's not really relevant to our set of optimizations; they include their own. By the way, Opera Mini operates in a similar way to the UC Browser, which is used on many devices in China and some Asian countries; the infrastructure behind that browser also acts as a proxy with its own set of optimizations, so the direct use of HTTP/2 is not relevant for that browser either. And if we look at the old Android browsers: those don't support HTTP/2 either, but that share is supposed to shrink significantly, since users are updating to newer phones all the time, and all the newer phones with the new Chrome for Android support HTTP/2 properly.

So if we look at the overall stats, about 70 percent of your clients should be supporting HTTP/2, and if we take out this eight percent and this five percent, we're basically in very good shape for including HTTP/2 in your general website infrastructure. And please remember: if you enable HTTP/2, in basically all the implementations I'm aware of, there is backwards compatibility. If the web browser is not able to connect through HTTP/2, it will most probably connect with HTTP/1. There is about one use case where it won't, but it's a very, very fringe use case.

Let's get back to our slides instead of the screenshots; we looked at everything online. The next page is the HTTP/2 usage statistics from the W3Techs website. We can see that it's already used on more than six percent of websites according to W3Techs, and also that the percentage grew significantly just in the last couple of months: when I was submitting this talk to this event we were at about two or three percent, somewhere in that range, and now it has basically tripled. If we look at the historic trend on W3Techs, sorry, on this page, we'll see that HTTP/2, right here, is the fastest-growing site element at this point; basically everything else is staying on a flat line. And if we compare HTTP/2 growth with SPDY, we'll see that SPDY currently sits at about 6.6 percent of websites and HTTP/2 at 6.2, starting from July 2015. That is very significant growth, and if we implement it today in our environments, we're riding that growing curve, which should be really good for us. There might be something good in this protocol, right?

Let's go back to the slides. Maybe not; maybe not everything is good about this protocol. First of all, there are a number of downsides to HTTP/2. One thing I want to mention is that not everybody needs to secure every particular request and every page: if your website mostly consists of cat pictures and funny videos, maybe encrypting every bit of that data is not really required. Also, if your website mostly does uploads and downloads of larger files, the HTTP/2 optimizations don't affect it that much, and maybe you just don't care about your website working a little bit faster. So sometimes it makes sense to implement this protocol, and sometimes it doesn't.
And there is one huge downside of the HTTP/2 protocol: it's a little bit harder to troubleshoot. Remember, we looked at an HTTP/1 request and response, and everybody was pretty much familiar with it; everybody knew what it meant, it's very readable, you can browse with telnet, everything is fun and easy. HTTP/2 traffic, however, encrypted and even decrypted, is way harder to understand, and you won't be able to do HTTP/2 browsing from your telnet line. But even if we look at the encrypted traffic, and this is just the beginning of the TLS handshake, we can already see a few things. We can see the client, my Chrome browser, announcing that it speaks HTTP/1, SPDY/3, and HTTP/2; "h2" is the protocol identifier here. And we can see the server respond with the information that it works with HTTP/2 and HTTP/1.1. So even though the handshake looks completely unreadable, there is a bit of information we can extract from it without decrypting anything.

But what if we want to decrypt our browser traffic, look inside the HTTP/2 protocol, and see all of those streams, frames, headers, and the other information? There is a way of doing that with Wireshark, and I really like Wireshark as a troubleshooting tool; it gives me a lot of information on all the traffic going through my system, and everything is really easy. There is a way to decrypt your own browser traffic without knowing the private keys of the servers you're interacting with. The original way of decrypting SSL traffic in Wireshark required you to load the private key into special Wireshark settings, and, well, I never liked that: it always meant I would have to take the private key onto the client system, and that doesn't sound like fun. This way is significantly more fun: you just need to set the environment variable SSLKEYLOGFILE, open the browser with that environment variable, and then point the SSL settings of Wireshark at that session-key file. Once you do that, your traffic becomes very readable, and the new versions of Wireshark, starting, I think, with version 2 (they're at 2.0.1 or somewhere along those lines currently), do support the HTTP/2 protocol, so you can see the HEADERS frames, SETTINGS frames, DATA frames, and all the other information there. This is definitely something worth researching a little further: you'll be able to see all the headers coming through, all your cookies, cache information, and so on. For the troubleshooting part of HTTP/2, I definitely recommend this approach.
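Spelled out, the trick is roughly this; the paths are just examples, and both Chrome and Firefox honor the variable:

    $ export SSLKEYLOGFILE=$HOME/tls-session-keys.log
    $ google-chrome https://example.com &
    # In Wireshark: Edit > Preferences > Protocols > SSL >
    #   "(Pre)-Master-Secret log filename" -> $HOME/tls-session-keys.log
    # Captured TLS from that browser session now decodes,
    # HTTP/2 frames included, with no server private key involved.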
All right, let's go to the interesting part: the benchmarks. Everybody knows how to make their own benchmarks, and everybody makes the best ones. At nginx.conf 2015, in September, we had a talk by Valentin Bartenev, one of our core developers, on HTTP/2, and that talk had a bit of a pessimistic view of the protocol. What we looked at was a set of tests against a pre-generated page, and those tests showed something quite interesting: where, really, is the benefit of this protocol? If we look at this graph, on the horizontal axis you see the latency to the web server, basically your network delay, and on the vertical axis the first paint time for the page, when the page starts appearing in your web browser. At a 300 millisecond delay, you'd expect the page to appear within two to two and a half seconds, and the difference is not that significant between unencrypted HTTP/1, HTTP/2, and encrypted HTTP/1: the blue lines are HTTP/1, the green ones are HTTP/2, the yellow ones are HTTPS. If we put that on a different scale, the graph looks like this: HTTP/2, the black line, has only a tiny bit of benefit over the unencrypted protocol, and the benefit compared to HTTPS is also not that huge. So we started to figure out: what's the deal here, why are people so excited about it, and when does it work and when does it not?

So I did my own benchmark. My setup was nginx 1.9.9 on Ubuntu 15.10 with OpenSSL 1.0.2, with ALPN negotiation, and I used the Chrome browser. What I did was put my page on constant reload, with no caches enabled, to see how fast the page would reload. Then I had to figure out what kind of page I should use for the benchmark. I'm pretty sure no two of us have exactly the same page on our websites; they're completely different, we use different client- and server-side technologies, and some pages are extremely simple, like my home page, while others are extremely complicated, like the NGINX corporate website, nginx.com. So your mileage may vary. I took a benchmark page from the current free CSS templates and chose one that looks reasonably modern: it uses some jQuery, JavaScript, lots of CSS, proper markup, and I added a few more images so the whole page has a total of 54 different objects. I figured that looks like a pretty reasonable setup; maybe your websites have more objects than that, maybe fewer. Let's do a quick show of hands: for your web projects, do you usually have more than 54 objects? Fewer than 54? Okay, about the same number of people, so we'll consider that something like a median page.

So I started measuring HTTP/1, encrypted HTTP/1, and HTTP/2 for the same page on constant reload, and what we're looking at here is a very interesting set of results. For this test I disabled keep-alive, so every HTTPS connection required the full TLS handshake, and as the latency grows it affects page performance very significantly: at around a 200 millisecond delay, our page loads in more than 12 seconds on average, while at a delay of around 20 milliseconds we're somewhere at a one or two second page load time. Next, I enabled keep-alive and ran the same test again, once again with the page on constant reload. With keep-alive connections we take the initial TLS negotiation out of the picture, for both HTTP/2 and our original setup. You can see the latency here goes up to 800 milliseconds, and right there at 800 milliseconds we saw a more significant difference between the protocols, so I went ahead and increased the latency up to one second. What was quite interesting at the one-second delay is that the benefits of HTTP/2 seem to be shrinking.
So if your network delay is extremely high, say you have satellite connections, or most of your clients are coming from a completely different part of the world, or you're on 2G connections or something like that, you might see that even though the benefits exist, and they are still quite substantial, they are not as noticeable as at lower delays. What I did next was divide one by the other, and I got this unusual graph: the percentage benefit of HTTP/2. It looks very interesting: once again, for the page I was using, because your page will be different, the best results for HTTP/2 came at about a 250-300 millisecond delay, which is a quite reasonable, very much real-world network delay. So that's a very interesting set of benchmarks, and this is the graph which, I think, shows that my benchmark is the best. By the way, if you want to argue about this setup, I'll be hanging out here for the rest of the day, and I'll be happy to engage in any technical conversation or show you how the benchmark works.

Let's go into a practical set of slides about implementing HTTP/2 for your websites with NGINX. What we need to do is take, I would take, the latest version of NGINX, which is 1.9.9 currently, and for the build I would use the configure parameters for the http_v2_module together with the http_ssl_module. It will technically compile without SSL, but how would you use it, since no browser speaks cleartext HTTP/2 anyway? If you're using pre-built packages of NGINX, or someone has built it for you, you can check with nginx -V: you will see the configure arguments, and if you see the http_v2_module there, it means you can use it. The configuration of NGINX for HTTP/2 is extremely simple: in your listen parameter, where you have your port number, you add http2 together with the ssl parameter, and that's pretty much it. For the SSL certificate and keys, you should definitely use more security than is shown in this example: the security aspects of proper keys, protocols, a perfect-forward-secrecy setup, and so on are a little outside the scope of this talk, but I definitely recommend looking through the slides and presentations from the security track. Many of the presenters used NGINX as an example and showed very nice configuration snippets, SSL parameters, and other methods of making your site secure; definitely a great track to revisit if you haven't already.

When your clients start coming to your website, you'll need to figure out how many of them are in fact using that HTTP/2 connection, or whether they're using it at all. In your logs, the $request variable will show GET and POST requests with "HTTP/2.0", so you will be able to find all that information in the logs, and you can parse those logs easily to figure out the percentages of your traffic and see how your own usage of HTTP/2 is growing in the wild.
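Putting those pieces together, a minimal sketch; the certificate paths and the log check are illustrative, and you should harden the SSL settings per the security-track advice before using anything like this:

    $ nginx -V 2>&1 | grep -o http_v2_module    # confirm your build has the module
    http_v2_module

    # nginx.conf -- the relevant part of the server block
    server {
        listen 443 ssl http2;                   # http2 added alongside ssl, as described
        server_name example.com;
        ssl_certificate     /etc/ssl/example.com.crt;
        ssl_certificate_key /etc/ssl/example.com.key;
        root /var/www/example;
    }

    # count HTTP/2 requests in the access log ($request contains "HTTP/2.0")
    $ awk '/HTTP\/2\.0/ {h2++} END {print h2, "requests over HTTP/2"}' /var/log/nginx/access.log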
There is also another tool for that, which we built recently: we're in the process of building a monitoring system for NGINX called NGINX Amplify. You can connect your running NGINX instances, with a small agent, to the Amplify SaaS-based performance monitoring tool; it gives you configuration recommendations, a number of different graphs, and also an HTTP version graph showing how many users are coming to your website over HTTP/1 versus HTTP/2. A very useful tool; you will find the link to it in the link set I'll show at the end of the presentation.

Some useful tools for implementing this protocol. You can monitor the usage of different web technologies with the Can I Use website; I found it very useful for figuring out what I should use and what I shouldn't. There is a great tool for encryption: well, not all of us have corporate certificate authorities and access to expensive SSL certificates, and Let's Encrypt gives you the ability to generate free, supported SSL certificates for your websites. They have three months' validity, but they are accepted by the major browsers, which is a very big deal; very useful for smaller developer projects and so on. And the WebPageTest tool gives you diagrams of how your page loads in different browsers from different parts of the world. Obviously you can test your website from your own location, but having the flexibility of checking your web page performance from other places is very important, so I definitely recommend that tool.

A little bit about NGINX: you can contribute to the NGINX project at hg.nginx.org, where you'll find all the current source code; you can write to the developer mailing list and the users' mailing list; you can also just grab the current NGINX sources from GitHub, which is a read-only mirror. And if you don't like or don't want to code but still want to contribute, there is a wiki, and all of the modules; like every project in the world, we all need better documentation. The last link shows an interesting promotion we have for the SCALE conference: we're giving out non-production developer licenses for the commercially available NGINX Plus with the commercial features, so if you want to test that software without the restrictions of a free trial, you can do so on a yearly license for non-production use. I'll leave this slide on the screen, and if you have any questions, please ask. I also have some NGINX stickers, so please go ahead.

It does enable HTTP/1 as well. Let me repeat the question: when we enable http2 on the listen directive, does it limit the socket to HTTP/2 only, or does it also enable HTTP/1? Yes: the HTTP/2 connectivity has to be negotiated by the client and the server, and if they don't negotiate HTTP/2, they will revert to HTTP/1. I will have to think about that one; send me a note, please, and I will respond soon, though it would probably involve a change in the code. Absolutely, Let's Encrypt is a great project, that's a fact. You're next. Not sure about tcpdump being able to... so, the question was about command-line tools being able to support HTTP/2 and decryption of it. What I would do in this case: I would probably take the tcpdump pcap file and open it locally with Wireshark. Yes, I understand; the command-line tools and developer tools for the HTTP/2 protocol are in active development, so this is something that's changing on the fly, and we should expect more of those tools to become available.
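That workflow is simple enough to sketch: capture with tcpdump wherever the traffic is, then carry the pcap, plus the SSLKEYLOGFILE from earlier if you want decryption, to a machine with Wireshark. The interface and filename here are just examples:

    $ tcpdump -i eth0 -s 0 -w h2-capture.pcap 'tcp port 443'
    # reproduce the traffic, Ctrl-C, then open h2-capture.pcap in Wireshark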
Yes, please: the depth of the graph? If you're using a lot of elements on the page... I found that when we have more elements on the page, you usually see more benefit with HTTP/2. I can give you an example right here: when we go into our list of links, there is a side-by-side comparison, here you go, and this page shows you 165 small elements, small images, all of them small parts of one page, so when it starts you will see again how fast or how slow those connections are. That page is publicly available, and you can see the whole NGINX configuration below it.

Yes, please: server push is a technology that is defined in the HTTP/2 standard; unfortunately it's not currently implemented in the NGINX configuration or NGINX code, but it is defined in the standard and we are working on it. We can have a separate discussion about how server push should work through a proxy; sending the request and the response back through a proxy is not as simple as it sounds.

Yes, please: for the rest of the APIs, and not just browser connections, your clients can use HTTP/2 for other APIs, not just for web pages. The benefits will probably be significant because of the header compression: with many API calls the payload is quite minimal, but the headers can be significantly bigger than the payload, depending on the traffic flow. I'm not sure that multiple streams and prioritization would be a huge benefit; we'd have to see how your API traffic flow works in that case. Also, if you are building your own clients, building them with HTTP/2 is something quite new, so you would have to do more research and more troubleshooting in that area as well.

Yes: if we look at the frequently asked questions, they'll tell you that GZIP compression of headers in SPDY was prone to some significant vulnerabilities, and the standard defined the new way of doing it, HPACK, which is not prone to those vulnerabilities. That's the major reason.

Yes, it is supposed to be used both ways, and, as you said, as a replacement for WebSockets at some point as well. WebSockets over HTTP/2: well, there is a draft on how to implement WebSockets over HTTP/2 or over the SPDY protocol, but it is not currently in the standard. So basically, when you want to use WebSockets on a site running HTTP/2, the browser will be smart enough not to go with HTTP/2 for the WebSocket upgrade; it will go HTTP/1 for that particular connection. You can still use WebSockets on your website, it just won't be HTTP/2-enabled currently.

All right, well, it looks like we're out of questions, but I'll be hanging around this room and outside until the end of the hour, and you're welcome to grab some stickers. So, thank you.

One, two... past... one, two. Well, I guess I'll go ahead and get started. You guys in the back want to come closer? You're more than welcome to. So, thanks for coming to my session today about project hosting 3.0. My name is Lance Albertson; I'm the director of the Open Source Lab at Oregon State University. We've been doing open source project hosting for over 10 years, and we host a lot of projects that you probably access every day: the Apache Software Foundation, the Linux Foundation, the Python Software Foundation, a whole bunch of others. So, needless to say, I have a little bit of experience in this realm. For today I'm going to go over a brief history of project hosting, at least from my perspective; correct me if I'm wrong.
Then I'll talk about some of the advances I've seen happen over the past 10 years, and then a vision for the Open Source Lab in general and where we want to go with project hosting, because things have drastically changed since we started over 10 years ago. And then I'd really like this to become a discussion of where things are going; I know there are several people here that we actually host for, so it'd be good to hear that feedback.

First off, let's talk about the types of hosting that are out there. There's file hosting: typical static file hosting, what we normally refer to as FTP mirrors and so forth, mirror hosting; that's the traditional type of hosting that's been around. There are hosted platforms: things like SourceForge through the years, or really GitHub now, some kind of platform provided as a software service. There's co-location hosting, which is what we've done a lot of over the years; it's kind of gone by the wayside for a lot of projects, but not completely, for a variety of reasons. And more recently, in the last five years, there's been a lot of continuous integration hosting, things like Circle CI or Travis CI for automating testing of your code; that's become a really important type of hosting.

As far as file hosting goes, it generally falls into one of these categories: there are the universities, like OSU, MIT, Indiana University; there are ISPs; and then there are other random organizations or companies that do some kind of FTP mirror hosting. Sometimes we look at the rsync logs on our FTP server and see interesting sync requests from companies like Facebook, so there are a lot of companies that do some hosting, but that's typically and historically been the way things have been. As far as hosted platforms go, there's of course SourceForge, which we won't really talk about; GNU Savannah, which has been around for a long time; Launchpad, which came out of Canonical; Google Code, which is now gone; GitHub; and more recently GitLab as another platform. It's gone through these iterations over the years, and it's been really useful. As far as co-location goes, ISC is actually one of the longest-standing co-location facilities for open source projects; a lot of core infrastructure is hosted at ISC. We've also been around since about 2003, which has been really important, and I know other universities, like Indiana University, would do some ad-hoc co-location hosting. There's a whole bunch of others that probably do a little bit of co-location hosting, but really there aren't that many places, because it's difficult to do unless you have dedicated manpower. And as far as continuous integration hosting goes: Circle CI came out, Travis CI has come out, and Drone is out there as well. This is a continually changing realm; there's probably a new one out already that I don't have listed up here, but there are a lot of those platforms out there that are really important, and they all have their limitations as well.

So that's a brief history of where project hosting has gone. Now let's talk about the major advances I've seen over the last 10 years. For one, as is well known: GitHub. I think that's the biggest thing that's ever happened to project hosting.
I don't think it has ever replaced what we've done; it's actually made our life easier, and a few years ago we finally started using it ourselves. We always wanted to eat our own dog food, but, you know, we also wanted to just not have to worry about hosting things. Public cloud computing has also obviously changed a lot of things, and complicated a lot of things. Depending on what a project's needs are, public cloud hosting might work well for it; other projects might have more specific requirements, or other reasons to want their own setup. But that has changed quite a bit about open source hosting over the years. There are also a lot more content delivery network choices, with really discounted services for projects; I know Fastly is one of them. That's really important for projects, maybe not initially, but when they get larger and larger they really need it, and it provides a newer, better way to deliver your software out to the world. And a lot of CI testing platforms have come out, hosted or not; things have gotten a lot better, and some projects are just now starting to figure out that, hey, we need to actually automate all of this. We're kind of in the middle of all that now, so that's been really important. Oh, by the way, if you have any questions, feel free to interject at any time, with comments or flames or whatever.

So let's look at the comparison between co-location and public cloud, at least how I see it from my point of view. From a co-location point of view, it's more expensive: unless you know somebody who's willing to cut the costs or do it for you, it's really cost-prohibitive to do your own co-location somewhere, so public cloud is hands-down much cheaper. With co-location you're really not that flexible in what you can do unless you buy the hardware, or have the money to buy the hardware; with public cloud you can expand as quickly and as much as you want. From a performance point of view, obviously, if you co-locate your own hardware, you aren't sharing that resource with anybody else; so when performance matters, and you don't want to pay a public cloud for really expensive instances, having something co-located is really important. In the public cloud, as we all know, things vary, things go down, things come up, and you don't have much control, which can be problematic for open source projects. And that goes to the next point: you have a lot of control with co-location hosting, if you want a specific layout of how you do things, though obviously that means you need the money to pay for the hardware, or a hosting company willing to give you that hardware; in the public cloud you're at the mercy of the features of that cloud, and maybe they have all the features you need and it doesn't matter. And then, from a co-location point of view, you have hardware ownership: the cost of hardware maintenance, what happens when that server goes down, who's going to replace the hard drive, is there anybody there to replace the hard drive; lots of those things are annoying. With public cloud you just pay for the service and you don't have to worry about that.

So we've tried to advance over the years as best we can.
The Open Source Lab has historically had very few full-time people working; we have a four-to-one ratio of students per full-timer, so it makes it really difficult to advance things sometimes. But one of the things we've done in the past 10 years is try to catch up with virtual computing and private cloud. Before OpenStack even existed, we converted a very manual Xen cluster into a Ganeti cluster; Ganeti is the software package that Google created, and we have KVM instances running on it. It doesn't take much effort to set up, things run, we can do things, and it really helps us out quite a bit. We're just now trying to get into more OpenStack kinds of things; those have their own issues, but we're trying to be more flexible and catch up with everything. Like I mentioned, we have Ganeti, and we have an OpenStack cluster internally; the maintainability of that is what's painful, and I'm really nervous about opening it up to our projects, because then we have to maintain it and support it when it breaks, and we don't quite have the manpower to deal with that.

We're also just starting to get into the storage side of things. We've been a user of Gluster, and partly it might just be how we have it set up, but split-brain issues come up and we have to fix them, and students sometimes have a difficult time understanding how that technology actually works. Overall I think GlusterFS is a really good technology, but you have to use it for the right job. I went to a session yesterday from Facebook about how they use GlusterFS at scale, and it was interesting seeing what they did; like, oh, maybe we should try that differently, or do this. For a specific type of storage it's great; if you're using a lot of small files that need a lot of operations, don't use it. So that's our love-hate relationship with it. I had an experience at a prior job with NFS and HA NFS, and it was horrible; you couldn't scale like you wanted to, and that's kind of where we're at.

We've also done a lot with configuration management over the years. For the longest time, and still today, we've been using CFEngine 2, but more recently we've been moving everything over to Chef. We actually tried to use Puppet; it was probably 2012 or 2011 when we were looking at it, I can't remember, and it's one of those projects you just can't put one person on. You've got to really dive into it, and we were almost ready to deploy, but it just wasn't quite working the way we wanted. It was actually SCALE: I think that year I came here and saw a bunch of talks on Chef, and I was like, it's been a long time since I looked at Chef. So we dove into it, and what I liked about it the most is that, while it can be overly complicated, it's actually the Ruby language, so we can get some of the developers to start using it, we can actually test the code using some standards, hopefully, and automate a lot of the things we want to do. It introduced new problems, but it solved some other ones, and it's still ongoing: anything we deploy new now is all Chef, but we have a lot of legacy stuff still running CFEngine 2, which is kind of annoying. We also do a lot of integration testing on infrastructure; anything on Chef has integration tests, so that our students, while they're working on it, can make sure: hey, did this thing actually do the thing, before we deploy? With CFEngine it was kind of like, yeah, it'll probably work; it doesn't run it and make sure it works.
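To give a flavor of that, here is a hypothetical minimal cookbook plus the kind of check a student would run before deploy. The cookbook name, paths, and the Serverspec harness are illustrative, not the OSL's actual code:

    # cookbooks/ftpmirror/recipes/default.rb -- hypothetical recipe
    package 'rsync'

    template '/etc/rsyncd.conf' do
      source 'rsyncd.conf.erb'
      owner 'root'
      mode '0644'
      notifies :restart, 'service[rsync]'
    end

    service 'rsync' do
      action [:enable, :start]
    end

    # test/integration/default/serverspec/rsync_spec.rb
    # "did the thing actually do the thing?"
    require 'serverspec'
    set :backend, :exec

    describe service('rsync') do
      it { should be_enabled }
      it { should be_running }
    end

    describe file('/etc/rsyncd.conf') do
      it { should be_owned_by 'root' }
    end

The point is that the test actually converges and inspects a machine, instead of the CFEngine-era "yeah, it'll probably work."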
We've spent quite a lot of time on that. Something else we'd had a problem with is standing up infrastructure quickly; with CFEngine it just wasn't working, and Chef makes it a lot easier, as long as we have everything set up for it. We've also worked on standardizing a lot of our deployment of services: in the case of our FTP mirrors, we finally got those converted over to new hardware, and now they're running on Chef. We took our antiquated bash-script system and moved it over to Chef, and now, if we want to redo it and make it better later, we have that ability pretty easily.

Something else we've tried to do a little of is delegate some of this infrastructure code to the projects themselves. One example is the phpBB project. We took the opportunity when some of their servers got hacked pretty badly a couple of years ago; I think, or was it last year? It was last year, last Christmas actually, I remember because it was over Christmas. We basically took the opportunity: well, we've got to rebuild everything anyway, so let's get our Chef stuff going on it. We created their own private cookbooks; we do most of the management, but it gives them the visibility of knowing what's going into their infrastructure. They didn't need to have all of the Chef infrastructure up and running behind it, but they can at least see what we're doing: we make pull requests and they can comment on them, like, yeah, that's what we want, and they can file issues. That was our experiment with that; it seemed to work pretty well, so we're expanding it to more projects moving forward. That was also the one piece we couldn't get working easily with Puppet, because we couldn't do wrapper cookbooks, wrapping around that code, as easily as we wanted. Honestly, if I were doing it from scratch now, I might actually look at Ansible.

So let's talk about what projects really need; this is my point of view, and I would love to know what projects actually need. The common theme I've heard over the years has been testing resources: being able to test the code, whether that means putting it in Jenkins and running things over and over again, maybe on a specific type of hardware for some architecture they need to test their builds on, or some really unique setup they need to deal with. A lot of projects need flexible testing compute resources. Travis CI and things like that fill a really great niche, but there are certain edge cases that just don't work well with them, and what I think we as the OSL could help with is those edge cases. We could work with the Jenkins project to come up with a Jenkins-as-a-service thing, where we make flexible services out of that, with dedicated hardware, and offer that service to projects without them having to pay us.
There are also customizable test and integration tools that just don't fit on some of the platforms out there, and we want to fill that niche as well: certain edge cases need a certain type of testing to happen. And there are unique testing challenges. A lot of these projects get to a certain scale, and maybe a large corporation is having a particular problem with the software, and the project can't scale out to reproduce that problem; maybe we can provide the resources to do it. Whether that means a machine with an ungodly amount of RAM, to test a specific bug that only shows up when you have that much RAM, or scaling out to tens or hundreds of thousands of instances to figure out a particular problem, I think that's one area where we might be able to help.

I also think a lot of projects just need an easy way to self-serve, or to have some managed hosting. A lot of the projects we host just want their crap to work, as I would say. They don't want to deal with setting up Apache and managing services; they just want it to work. They want to deal with their project, their code, and everybody else. Part of the problem is that a lot of what they need means hosting complex platforms: in a lot of cases it's 'we need to host GitLab,' or 'we want to host Gerrit,' or Jenkins. Jenkins is pretty easy to host, but all of these come with specific knowledge that needs to be known about how to set them up properly, and I think there needs to be an easier way to manage those things. Other examples include mailing list managers; mailing lists in general are probably one of the more common things we provide for projects. And projects want to use things like Jira; any Java application that uses a lot of memory has unique resource needs, and you have to figure out how you want to deal with that. So there's a whole bunch of things out there where projects need the service but really don't want to manage it. That comes down to us, and I would love for us to turn that into more of a platform, so those types of things are easier to spin up.

Something else I haven't heard too much about, but which I think is going to get more important, is a neutral CDN and mirroring. When I say neutral, I mean neutral in the sense of the vendor providing the CDN. The Open Source Lab has been viewed, in the past and currently, as the Switzerland of open source hosting: we don't have a particular vendor we deal with, we're at a university, and we try to be as agnostic as we can with projects. That's actually an important reason some projects come to us, because we're neutral. While they could maybe get some CDN hosting at Fastly, maybe as a project they don't want to rely on that; maybe they want something else, and we feel we can fill that niche. We have our traditional mirroring network, but it just doesn't scale, and it's not really a true CDN, and I think projects need that. A lot of projects get popular and need to scale fast; we have projects come in and we have to deal with that. And like I said, our current FTP mirroring infrastructure is not flexible enough. It's antiquated, it's how we did things ten years ago. It works, but it's not API-driven.
If you want to upload a file, all the syncing happens via cron jobs. There are a few edge cases, like Debian repos, that have push capability, but all in all you're on our schedule; you can't just tell us, 'hey, sync everything,' and that's really a problem in this day and age. And a lot of projects maybe just need a couple of files hosted. There's also the problem of being geographically diverse. Currently the Open Source Lab is in Oregon, and our FTP servers are in three locations, one in Oregon, one in Chicago, one in New York, but that's the only thing that's geographically diverse in our whole infrastructure, which we all know is not good. And 'easily hosted by a trusted entity,' whether that's us or whoever, I think matters to some projects.

As I mentioned, there's also 'I need access to special hardware.' Usually what that means is: we need a place to host this that is stable and up, and not in some developer's apartment. Or these machines have specialized power and cooling requirements, or they need NDA contracts, or whatever, who knows. A case we've done: we've had a long-time relationship with IBM, and in the last year we've been doing a lot with POWER8; we've had access to POWER architecture for a lot of the open source projects. Actually, a lot of the open source projects that have PowerPC builds were probably built in our data center, on the hardware we host. We just got some ARM64 machines more recently too. The biggest connection we have there is the GCC compile farm project. How many of you have heard of that? Oh yeah, figured you would have. It's a project with a lot of compute resources that people have access to, to test their code on.

We actually just got an agreement with them: Facebook wanted to send them some compute hardware, and they didn't have a place to host it, so they came to us, like, 'well, we don't need all three racks, but we think that as long as you give us some of these machines, you could do the rest.' Being an open platform, we were interested in that, so we're in the process of getting three racks' worth of Facebook Open Compute hardware. It'll be interesting to see how we want to use that and how we get projects access to it if they want, because it's certainly a platform that is open but hard to get access to. There's a lot of work in doing the ports and fixing bugs on some of this hardware, and a lot of this hardware needs real access: while you can maybe emulate it, that doesn't catch all the edge cases. We've actually caught a lot of bugs on the POWER8 side, initial bugs in the hardware and the firmware, where I've been on calls with IBM engineers going, 'huh, yeah, we should fix that.'

We're also diving into IoT. It's the new buzzword these days, and it's still an ongoing effort to work out what that actually means and how we're going to be a part of it, but currently we're helping the AllSeen Alliance host some of their test infrastructure, and I certainly see a lot of open source things happening around IoT. So, how do we get to all the things I just listed? That's the important question.
Well, the first thing we need is a technical upgrade. We need to build out and expand our infrastructure, whether that means building out our Ganeti and OpenStack clusters or whatever technology we decide on; the point is we need solid core infrastructure. We need some kind of automated build services, which are really important for a lot of projects, whether that means we write software for it or just offer the service. And also testing services and support: when projects run into a problem with testing, we can actually dive in and help them.

We have a lot of projects that just want data, even just simple metric data, on the stuff we host. The classic example is that I have over ten years' worth of FTP server logs in my archive that I've never done anything with, other than maybe run AWStats over them. There's an incredible amount of information in there, and it's information projects want to know. Maybe it helps a project realize, hey, we have a whole bunch of people coming from this certain part of the world, maybe we should devote some developers to see what's going on there; or, we don't see a lot of traffic from over here. So we're interested in diving into that. I hate the word 'dashboard,' but that's kind of what it is: being able to show that.

I also think it's important that we create some kind of CDN network. I'm not saying Akamai-level; basically an API-driven mirroring network, similar to what we do with our mirroring now but API-driven, similar to S3. I think that's important for a lot of projects. We also want to improve our infrastructure security, whether that's our security or the projects' security. The weakest link in anything we host is that one machine that maybe had one login, but it's been running so long somebody forgot about it, and it still had access to other things on the infrastructure; it gets owned, and then it owns the rest of the infrastructure. Having people dedicated to watching for that all the time and working through it is really, really important.

Another big idea I've had is building an OSL university network. University because, well, we're a university, we have really good connections, plus it provides that neutral layer. I've had a lot of people from various universities come up to us saying, 'we love what you do with the OSL, but how do we do the same thing where we are?' And we're like, well, the first thing you've got to do is build a data center, and they're like, no, we can't do that. So we've been working on this idea for a long time, and we're finally getting to a point where we think we can implement it. Basically, we want to start collaborating with not only North American universities but global universities. Essentially, we just want them to be able to host a half rack of gear. It'll be standardized, it'll give us some geographic diversity, and it'll probably just run some cloud services. We're not going to be doing remote colocation, at least not initially; it's just the ability to have that remote-hands support. In my prior life as a Gentoo Linux infrastructure lead I had to deal with this: we had servers all over the world, and dealing with all of those different providers was a pain in the rear. Maybe I just like doing that, but from our point of view we can take on that pain in the rear and work through all of it, and the projects just have to come to us to deal with any issues.
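To make the 'API-driven, S3-like mirroring' idea above a little more concrete, here is a hedged sketch of what publishing to such a network might look like for a project. The endpoint, bucket, and credentials are hypothetical; boto3 is used only because the model described is 'similar to S3':

```python
# Push a release artifact to a hypothetical S3-compatible mirror API;
# the service, not a cron schedule, would handle fan-out to mirror nodes.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://mirror-api.example.org",  # hypothetical endpoint
    aws_access_key_id="PROJECT_KEY",                # placeholder credentials
    aws_secret_access_key="PROJECT_SECRET",
)

s3.upload_file(
    "myproject-1.2.0.tar.gz",            # local release tarball
    "myproject-releases",                # hypothetical per-project bucket
    "releases/myproject-1.2.0.tar.gz",   # object key on the mirrors
)
```

The contrast with the cron-driven setup described earlier is the point: the project initiates the sync instead of waiting on the mirror master's schedule.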
I've already mentioned bringing in cloud services: if we had that CDN network, part of it would be hosted in that half rack. The other thing that's important for us at the Open Source Lab is mentoring students. There's been a common theme I've heard over and over again, at least in the sessions I've been to: how do students get interested in doing this stuff, and how do they even get access to do it? Back in the day, when I was at university, you would never get root on anything at the information-services level, and most of what I learned I learned on my own, by just trying things. At the lab we've basically scaled that up; it works at OSU, but that's great for us at OSU and doesn't help the other universities. So one thing we're working on is a way to find a university and a specific mentor, a person to lead it there, and we can help mentor those students remotely. It would probably also mean sending some of our students over to that university to help get things going, or maybe they come to Oregon State and work with us. The idea is that this provides a mechanism for a particular university to invest in the kind of thing the OSL does without having to build a data center. Yes, ideally they at least need a half rack, but we can grant the root access or whatever, so their students get the experience they need dealing with these problems day to day, and I think that's really important. Kick-starting the OSL concept at other universities: I think that's really important.

I also think we need to re-engineer our backend services to fit the new day and age. Continue working on standardizing our server management: no IT organization is really homogeneous, and we always strive to be, but we really need to push on that, to make things easier on us and on the communities at large. Now, this doesn't apply to the unmanaged hosting we do, where projects can spin up VMs and run whatever they want; I'm talking about our internal stuff, what we manage, which we want to standardize. We need to catch up with technology trends, whether that means some kind of platform as a service, containers; we need to think about that. We need fully testable infrastructure, which we're getting towards. We need to make it more robust to failure: hardware fails, software fails, often. How do you work around that? How do you deal with network outages and things like that? We need to handle that in a much more automated way.

And we need to make it a lot easier to deploy new services for projects. If a project comes to us and says, 'hey, we want you to host Jira for us, can you do it?', currently it's like, well, crap, we need to figure out how we want to host it: do we spin up a VM for it, then get it into Chef, and all this? It'd be great to just have a system that deploys it as a service, and it's there, and it's done, and hopefully it has some easier way of updating, and we can scale it if we need to, that sort of stuff. Which leads into creating an OSL platform as a service. I'm not saying a full-blown version of one; it's more about fitting what we need.
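A toy sketch of that 'deploy it as a service' idea: one call that turns a request like 'host Jira for us' into a running, standardized container. This is purely illustrative, assuming Docker on the host; the catalog, names, and ports are invented, and a real system would also have to handle storage, backups, TLS, and upgrades:

```python
# Minimal "service catalog" deployer: look up a standardized image
# and start it as a named container for the requesting project.
import subprocess

SERVICES = {
    "jira": ("atlassian/jira-software", 8080),  # image, container port
    "gitlab": ("gitlab/gitlab-ce", 80),
}

def deploy(project: str, service: str, host_port: int) -> None:
    image, container_port = SERVICES[service]
    subprocess.run(
        ["docker", "run", "-d",
         "--name", f"{project}-{service}",       # e.g. exampleproj-jira
         "-p", f"{host_port}:{container_port}",  # publish the service port
         image],
        check=True,  # raise if docker exits nonzero
    )

deploy("exampleproj", "jira", 8080)  # hypothetical project request
```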
The majority of our hosting is simple web applications, whether it's PHP, whether it's Ruby, now increasingly Node, you name it. It's pretty simple stuff: usually just a vhost with some kind of application that needs to spin up, maybe some shared file storage, whatever. But what we really need is something scalable. Some of these projects grow pretty quickly, and right now if we scale, we've got to spin up another VM and go through all of that, which is really not how this day and age works; we need to get with the times. It also needs to be API-driven for projects, so it can be self-service: all we have to do is grant them access and maybe set a quota, and they can do what they want. That would also speed us up and expand our capability, because we could automate a bunch of these things instead of doing them ourselves. I think having some kind of platform as a service at the lab is really important. Historically we would have a project VM, and it would host all of that project's stuff on the one VM. That worked great back in 2007, but it's not the best idea in 2016. Maybe that's still what some projects want, and I'm not going to tell them no; I'm not suggesting we take that away. But there are a lot of projects that just need something that just freaking works, and maybe they don't care how it's implemented: if it's connected to some Git repository and they can update the code that way, they're happy; they can do what they need to do.

Something else I wanted to talk about: Supercell. It's a project we started back in 2009. The name: I came from the Midwest, this was back when 'cloud' was becoming a thing, and supercells are big thunderstorms, thunderhead clouds. The idea was basically building a cloud infrastructure, a compute resource, for projects to do testing of any sort. We did something with Facebook back in the day; they wanted to do some testing with HipHop at the time. We had Ganeti, there was no OpenStack, so we used that as the compute resource, and we worked on creating a web interface for it, since it was primarily command-line driven, to make it workable. We have a couple dozen projects using it. It was useful, but it was pretty much an ad hoc compute resource that would spin things up; great, but one of those things we created and then let sit and rot. It's still important, though. I still think it's viable, and I think we need to continue it. So what I want to do is rebuild it with OpenStack and expand it. We're working with a couple of companies to help, either on the hardware or the software side, to make that happen. It'll ease onboarding when a project wants to use the resource, and hopefully we can offer pre-built managed CI solutions as well, which would be great. And since we're academia, we have PhD and master's students working on various testing technologies that are fairly new, and maybe projects want access to those to catch new bugs. We have a master's student right now working with Paul McKenney from IBM, who works on RCU in the Linux kernel, and he's doing some really interesting testing work specific to RCU. Imagine doing that with any project out there; it's pretty interesting.
The other thing we're diving into is education and diversity; I don't know why that's on the Supercell slide. We're trying to create an open source track at OSU in the Electrical Engineering and Computer Science department, and we're going to be creating some online classes later this year, in the fall. We'll work on the DevOps part a little; I hate using the word DevOps, but we're working on creating courses around what people in the DevOps world need to know, whether it's Linux or systems administration, application deployment, test infrastructure, database and server management, things like that. We're building curriculum around that. We also want to diversify our workforce in general; as we all know, that's not in the best shape right now.

So, to sum things up: I think testing resources are really important for projects. We need a place to host weird, unique hardware. Projects need some kind of managed hosting service, whether it's a platform, a fully managed VM, or whatever. We need more API; everything needs to be an API, and there needs to be some kind of common tool set that projects have access to, to do what they need to do. And we personally need to increase our academic mission around DevOps, open source software, and so forth.

So with that, we need your help; specifically, I need your help. Right now I want to start a discussion on the future of hosting, based on what I've talked about. More importantly, what do you need? If you're a project, what is missing? What did I not cover? What's important to you that you think we should look into? What should we be doing, or not be doing? That's what I have, so with that, any questions? Oh, got one, way in the back.

[Audience] I'm interested in your implementation of OpenStack. Everyone says it's got to be this massive infrastructure of machines; what happens to make it realistic? Do you guys have that kind of scale?

Well, right now we manage it all with Chef, and we have basically a small compute-node cluster, a really simple setup. We're on Icehouse right now, using nova-network, because it just kind of worked at the time. We primarily use it internally for our Chef configuration-management runs: when we do Test Kitchen, we spin things up, run tests, and tear it all down. We're also using it in our Linux system administration class. And we have a cluster running on POWER8 with IBM that we worked on, so any projects that want access to the POWER8 platform just get access to an OpenStack instance running VMs there. That's how we're doing it. Part of what I need to work on is how we go to the next version, with Neutron; I'm looking at either Kilo or Liberty, or whatever the M release is going to be, and we'll go from there. My point of view is that I want something simple, but it's a complicated thing, because there are so many pieces that work, and then they don't work, and when they don't work, how do you fix it? The nice thing about Ganeti is that it's simple and it works, but it's not as flexible as OpenStack. What's funny is that our OpenStack controller nodes are actually VMs on Ganeti, because those are our pet VMs, and we put our pet VMs on Ganeti. We're still figuring it out, is basically my answer, same as everybody else.
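The spin-up, test, tear-down pattern described in that answer looks roughly like this with the openstacksdk cloud layer. This is a hedged sketch, assuming a cloud named 'osl-test' is configured in clouds.yaml; the server, image, and flavor names are placeholders:

```python
# Boot a throwaway VM, run tests against it, always tear it down.
import openstack

conn = openstack.connect(cloud="osl-test")  # hypothetical clouds.yaml entry

server = conn.create_server(
    "kitchen-run-001",    # placeholder server name
    image="centos-7",     # placeholder image
    flavor="m1.small",    # placeholder flavor
    wait=True,            # block until the instance is ACTIVE
)
try:
    print("test VM up:", server.name, server.status)
    # ... converge the node and run the integration suite here ...
finally:
    conn.delete_server(server.id, wait=True)  # tear down, pass or fail
```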
[Audience] I'm kind of curious, with the OpenStack deployment: is the long-term vision to provide projects with basically API keys and quotas?

Okay, yeah. Basically my plan is exactly that: API keys and quotas is what I'd really like to do. I don't want a ticket coming in like, 'hey, spin up this VM,' and we go into Horizon or whatever and do it. God, no. That's why I keep saying APIs, APIs, APIs. Most of the time we spend dealing with tickets is simple sysadmin tasks that we could automate, so that's really what I want to do. I'm not planning on killing our Ganeti stuff, because it's still really useful, and we might find ways of making it more API-happy and easier to use, but that's really what we're thinking: the OpenStack environment could be used for testing or whatever; it's there. OpenStack being what it is, though, we can't give a good SLA on it, because it might just break. I'm nervous about that, but I think it's important, especially on the testing side.

[Audience] Sorry to bang on about OpenStack, but have you tried any of the installers that are out there, or are you running it on bare metal?

We're primarily a CentOS shop, so we're not really doing much on the Ubuntu side of that, unfortunately. Or fortunately, however you want to look at it. Any other comments or feedback?

[Audience] I'm curious what you're thinking, when you were talking about the GCC compile farm, about onboarding a lot of those non-x86 architectures. One thing we do in Gentoo, with all of our architectures and with some upstream projects, is that I just give them accounts: the GMP guys, strace, libtool, these projects that need to work on pretty much every architecture, Gentoo included, because we have the ability to get machines for it. You're hosting a lot of our machines, and I in turn am kind of doing hosting for these guys, just giving them accounts on these architectures. I feel like the GCC farm kind of intersects with that. Is there any talk of opening that up a bit more, without the GCC guys feeling like, 'these are our resources, why are other open source projects getting CPU time on them'?

I've been trying to get a better understanding of how we fit in with the GCC compile farm. That's a good question, and I don't know the real answer to it. We have the capability of hosting it; in the case of the POWER8 systems for GCC, we can manage the hardware and they can do whatever they want with it. But we also have this OpenStack infrastructure, and I think we've given you access to at least spin up VMs and do whatever you want, and we've been spinning that up for any projects on that particular architecture. But that works for that architecture; how do you do it with other architectures? Not everything is going to work with KVM. Maybe you want raw access; how do you deal with that? You've got to manage shell accounts and go through all of that. So, what would you like to see?

[Audience] Really, what I'm looking for is, I don't want to say Travis, but the equivalent of a commit-driven CI system, on different architectures.

Yeah. And that was another... but go ahead.

[Audience] Whereas Travis at the moment, I don't know if it has PPC, but I think it just has 32-bit and 64-bit x86. So if you want to do CI on any other architecture, you're pretty much, yeah, screwed.
So these other guys, I'm basically giving them accounts just to do CI for their projects; that's it.

So you're basically saying: if we have the capability of at least running a Jenkins or some kind of CI system on that architecture, that's what you would love for us to be able to do?

[Audience] Yeah, I think that would help the people I talk to the most.

And what would be the best way for them to have access to it? Is there a specific tool set, a specific API they'd like, or do they just want a Jenkins account to do whatever the hell they want with? I mean, it's probably still Jenkins either way, but...

[Audience] I think they need a way of submitting jobs. One of the advantages of Travis, with the GitHub integration, is that you get to include the script that does whatever random crap you want. So having a Travis-CI kind of system is probably the most ideal: you include the config, and it configures it, and it does it.

Yeah. That's good to know; that's a good view. Any other questions?

[Audience] I'm wondering about the impact of GitHub itself, because it's become, like, the mother of all project hosting.

Are you talking about the impact on us, or on projects in general?

[Audience] Well, on you guys. I know in the Drupal world, Drush moved from Drupal.org to GitHub, and it seems to be a trend: you don't exist unless you're on GitHub.

Honestly, I think it's been a net positive for us. Doing Git hosting manually sucks, and they fill that need naturally. They have their own issue tracker, if you like that; they have their pull-request system, if you like that. Obviously we could start offering GitLab instances, which might be useful for some projects, or Gerrit for some projects, and I think that would be beneficial for them, but a lot of projects just use GitHub. Honestly, I held back so long on us actually starting to use GitHub, and I finally went, crap, this is just not working. Plus, for our students, it's essentially the resume. So I think for us it's not a bad deal: they fill a need, we fill a need. Someday maybe there's no GitHub, maybe there's something else out there that replaces it, but yeah.

[Audience] The only thing I would add to that is reliability. GitHub is becoming kind of a single point of failure, so GitLab and others can be used to give us more control and risk aversion than relying on GitHub's engineers.

Yeah. Something else I've been thinking about on the PaaS side is that I don't want to reinvent the wheel, so maybe that means we look at things like Cloud Foundry; maybe that's what we need, and we just put it in there and give API access to projects, and that's how they manage things. I think that would be beneficial, because they probably don't want to spin up the infrastructure to do all of that themselves. We actually have an internal GitLab instance running now, because there were some things we just didn't want on GitHub, private stuff we wanted kept in house, so we started playing around with that, and it's pretty nice. But then it's, how do we maintain tens of thousands, or not thousands, hundreds, of those, if people want us to host them?

[Audience] Does anyone know if there's an online GitHub-like service, but for Subversion?

[Audience] Well, Google Code hosted a lot of Subversion back in the day, and WebSVN itself is usable as a web interface to SVN; it already has HTTP DAV access, so if you want to deploy it yourself, you can.
As far as hosted services: Launchpad, but that's probably only Bazaar.

Okay. So, for the few of you in here that we actually host stuff for: is there anything specific you think I'm horribly wrong about, or that I'm missing, or that you think I should focus more on? Who's going to ask?

[Audience, partially inaudible] ...something like OpenShift, or...

Well, it seems like a simple container would probably do some of that. It's something where people are aware of how to write those applications but have no idea how to deploy them. Yeah, they just assume sendmail is there. Anybody here excited about IPv6? Because I don't have anything to offer there. Actually, one of the things we've had a lot of requests for is serving bits over SSL, and so we've been going down the Fastly route, working with them, but we haven't gone full Fastly, if you will.

[Audience] The challenge with the traditional mirror-network model is the security, the certificate aspect of it, and I'm curious what you've thought about there: whether you're talking about bouncing, mirror-redirection-type stuff, or mirrors.osuosl.org being multi-homed across the university networks you'd be part of.

I really hadn't figured that out far enough ahead, so, whatever makes sense. I'm kind of open to anything that works for projects, because, while we have a minute, I want to get feedback from people like you on what's really important. We just recently got a request for an SSL certificate on our FTP mirror, which, with the hardware we have it on now, won't be a big deal; but then it's, well, what new problems will that create for people if they just happen to try SSL on something there? That's important to know. And the problem on our FTP mirroring network is that we have virtual hosts on osuosl.org, but we also have a lot of virtual hosts set up for other projects' domains. So if we do that, how do we deal with adding all the other domains and dealing with the certificate authority? It just gets complicated: yeah, we can turn it on, but it's probably going to break other things. Well, any other questions? Otherwise I'll call it good. Contact info: a lot of you already know how to get in contact with me. Thanks for coming out.

And then... it is like the voice of Zeus himself. Okay, cool, how about that. Cool. Well, folks, thank you so much for joining me at this advanced stage of the conference. I think we're really kicking it into gear by now, and from this point on it's just going to get better. I do appreciate that this is, of course, the last day, on a beautiful weekend; for everybody who came in from the East Coast, you're welcome, it's a beautiful day. It's the last session, the last talk for me, and since this is probably the last one you attend, unless you have a time machine or I've gotten the reading of the schedule completely wrong, I want to end this on a very high note, with something you really like and appreciate and have fun with.

My name is Bryan Reinero. I'm an engineer at MongoDB, a developer advocate, which is kind of an interesting space to be in. Essentially I'm a consulting engineer, an integrations engineer, kind of a jack-of-all-trades who has been let off the leash. What I do is, any time anybody is using MongoDB, from debugging it, debugging their implementation of it, preparing to scale out, getting their data model all set,
whatever they need, anything that touches MongoDB, I can help them with, and often do. But now I'm here to talk about the Internet of Things and using it with MongoDB, and thank you very much for joining me.

So, the first question I have for you. I tweeted this out a couple of days ago, I wonder if anybody saw it; there was a little bit of a challenge in it, to get people to come in, because I thought this was going to be the last slot on Sunday. Does anybody know what this thing is? Can anybody answer this question: what is this mysterious Egyptian device? Is it Egyptian? Is it otherworldly, an extraterrestrial piece of technology? Any guesses? What does it look like? 'Could be a water pump.' Good guess, I like that one; points for trying. What else does it look like? 'Grain mill.' 'Windmill.'

What you are looking at is what I would typify as the first institutionalized, codified, organized, permanent wireless network. This is a semaphore station, from 1796. 1796! It was invented by a gentleman named Claude Chappe, and there were a number of these stations across France, spaced out at intervals of about 20 miles, so pretty far apart. It works exactly like the semaphore you've seen on old naval vessels and the like, the guys with the flags; same principle: the arms would be moved into different positions to indicate different letters of the alphabet. Okay, 1796: there is a network of these stations around France (they still exist), and they could send messages from one side of France to the other in a matter of minutes. In fact, there was famously a battle fought in Alsace, as the French and the Alsatians are wont to do, and a message got from Lille to Paris within an hour using the semaphore stations. That's a distance of 140 miles, in 1796.

In fact, in San Francisco there's a very famous hill called Telegraph Hill, and it's not called that because of an electric telegraph: they actually had one of these on top of it, where Coit Tower is now. The idea was that a guy would be up there watching the ships come in through the Golden Gate and signaling that a ship's arrival was imminent, within a couple of hours, depending on the tides and so on. And if they could identify the ship, they could identify the cargo it was carrying. So if you were down on the other side of Telegraph Hill, in the old city of San Francisco, Yerba Buena, you would look up and the semaphore line would tell you what cargo was about to come into port: potatoes, eggs, what have you. Eggs were a rarity in San Francisco at that time, and the price would drop out of that commodity when you knew a resupply was coming in, so you didn't overpay.

Anyway, the gentleman who created this, Claude Chappe, invented the semaphore line in 1796 and revolutionized communications. Interestingly enough, the semaphore line stayed in place and operative, and wasn't replaced by the electric telegraph until 1846. Chappe's brother would say: why would you want an electric telegraph? It's totally unreliable; anybody could come along and snip the wire. As opposed to, say, a cloud bank rolling in, obscuring your network and causing connectivity problems. But this was a very successful system.
And it's an example of Internet-of-Things devices out there, deployed in the environment, exchanging data over wireless networks, having to deal with environmental factors, with loss of connectivity, and with all sorts of different Byzantine faults, if you will. Interestingly, and kind of sadly, Claude Chappe didn't live to see 1846: he threw himself down the well of a Parisian hotel in 1805, just a few years after he invented the semaphore line. Why, I don't know, but I would just say that engineers never make money on this kind of stuff.

Okay, so: how are we going to make you successful? We're talking, obviously, about the Internet of Things, and the idea here is: what is the Internet of Things? Earlier there was a discussion here about what the Internet of Things is and why it has suddenly gotten so super hot as a topic, because these are concepts that obviously go back to the 18th century, if not earlier. Why is there so much interest in the IoT space now? I think people are starting to realize the potential here, what's coming; it's dawning on us that all of these systems are starting to come together.

I'm going to go through a couple of examples of IoT, because it's not just personal devices we're dealing with; it's any kind of system. If we're going to define IoT, we're dealing with a system that has transducers of some sort, that interacts with the physical environment in some way to collect data, and that data is going to be transmitted back to a central system. And that's where the trouble comes in: getting the data back to the central system and dealing with it.

So here, of course, is one example of IoT that we've worked with: an embedded induction loop. When a car goes over this sensor, this induction loop, it induces a voltage in the wire and trips the sensor; the transducer goes off, and we know there's a car present at this stop light. Not the sexiest implementation of IoT, but you will notice it in its absence, because without it you're always waiting at a stop light for the timer to run out before you're allowed to get the green. With these sensors deployed across networks you can do interesting things. We did a bit of data collection for a presentation a couple of years ago that was really interesting: the state of New York publishes its traffic-metering data (actually, a lot of states do by now), and you can go through their API and pull out vehicle speeds on different segments of road, and they include weather conditions as well. So off this published, maintained data set you can do some pretty cool analysis of how weather affects traffic conditions on certain kinds of roads, characterizing those roads by their average times and how long it takes to get from interval to interval on the road.

Another one we've done a lot of work with is power companies. The Internet of Things also includes smart meters; there's a big field here. These, of course, are now fully wireless systems; they have their own NICs in them. There's no more meter reader who comes to physically read the meter: they just drive a truck through your neighborhood and pick up readings off the meters, if the readings aren't already transmitted through the electric transformer at the top of your pole.
So information is constantly being drawn off these meters, and it's kind of a fascinating space. What kind of data are we getting off the system? Voltage, temperature, and amperage. And the deal is, we're trying to figure out not just the basic metering of how much power a household is using, but maybe how they're using the power. For instance, I recently got an electric car, and I'm completely pretentious about it, so I called up PG&E (growing up down here it was SDG&E when I lived in San Diego; I don't know what they call it now, but PG&E is the Northern California power company) and I said: I plug in my car during off-peak hours, can you give me a discount? And they said: yeah, we can see that you do; you already got the car. They could tell from my meter readings that I was plugging it in at certain times.

So part of the work we've done is to analyze data that comes off smart meters. The goal with this particular company was sending real-time notifications to someone's account that say something like: you're in a peak usage window; if you turn off your washing machine, which seems to be running right now, you will save X amount of money. There's also all sorts of fraud detection with these things, because it turns out there are some malefactors out there, grow houses and people trying to evade paying for electricity, and people are trying to analyze this data to find them. They'll rip these meters out and turn them upside down to try to get them to run in reverse; they'll literally use foil to block the transmission of the data; they'll actually take a meter off a derelict house and put it on their own house so the billing goes to a totally different house. So there's actually a location component in the data, of where these things physically are. Wild, wacky times.

And then, I think there was a gentleman talking about home automation; that's another big field, and it's also very important, which is why I put it in here: HVAC costs, trying to keep those down, and there are green mandates now to lower the energy consumption of large managed buildings and office spaces. That's a big component of what we're dealing with. I think there's also another component: while Fitbits are not necessarily all that advanced, they're pushing this idea of connectivity between the devices themselves, which plays a part in the way we deal with these systems. That is to say, you have a network of devices that might communicate with each other before they come back and communicate with the centralized system. And there's also the idea that the data we're getting off these new instruments, these new devices, is data about ourselves, which plays an extremely important part in the kind of processing we need to do on it and the way we architect our systems. What we're dealing with today is almost assuredly going to be something different in the next six to eighteen months, including embedded medical devices, which are evolving as we speak.
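As a toy illustration of the peak-usage nudge described a moment ago (not the actual system; the thresholds, readings, and peak window are invented for the sketch):

```python
# Flag a meter whose draw during peak hours is far above its own baseline.
from statistics import mean

PEAK_HOURS = range(16, 21)  # hypothetical 4pm-9pm peak window

def should_nudge(hour: int, watts_now: float, history: list[float]) -> bool:
    """True if this reading looks like a heavy appliance running at peak."""
    baseline = mean(history)  # the customer's own typical draw
    return hour in PEAK_HOURS and watts_now > 2.0 * baseline

# A washing machine pushing the draw to 3.5 kW at 6pm trips the nudge.
print(should_nudge(18, 3500.0, [900.0, 1100.0, 1000.0]))  # True
```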
And what kind of data will we get from them? I think probably the most meaningful thing I've heard people say about IoT in general is this. A lot of people talk about data volume, and that's indeed a big part of what's going to be happening with these systems: a huge amount of data coming into a system, the volume, the need to retain it, the bandwidth required for the ingress of all of this data, the ingestion of all of it. But another aspect, which I think is really overlooked at this point but won't be for long, is that the data you're collecting from all these devices really doesn't mean anything unless you can process it in some way; you have to gain insight. In one case we're creating a social network, a social graph, off of data we can derive from the way people are interacting: their proximity to other people, what they're doing, at what time, in what context. In marketing there's a term I actually kind of like: 'system of engagement.' A system of engagement takes all of this data, accrued from all these different places (it could be devices, APIs, personal accounts), coordinates it, and tries to create an experience for the user that keeps you engaged and doesn't fight you. If you have a flight booked to a city, you want to know there's a traffic alert along your route; these systems are context-aware. So, to borrow from that idea of systems of engagement: I think an important part of IoT is remembering what kind of meaning you're trying to derive from these systems, what insights you're gaining. What you're doing with the data is what's actually important.

Okay. So, when we talk about high-level architecture, and this is just setting a bare-bones baseline: most people think of system architecture, services architecture, in this basic way. I've got clients, a data services layer, and a backend database. Of course there's a lot missing in this kind of diagram, but I wanted to start with it so we can build out from there and discuss what's absent. We don't have monitoring systems in there. There could be several layers of data services: there could be front ends that are microservices, talking to the actual application clients we see out there; before you get to the database, it wouldn't be just a single layer, you'd have a data services layer in front of the database and your application clients ahead of that, actually talking to the backend systems. There's also a need for analytics, archiving, and monitoring of these systems, which we'll go into in a little bit. But keep that visual, because we're going to build out from there.

Okay. To start off, we need a data strategy. If we're going to architect these systems, we need to figure out what we're going to do, how we're going to handle our data, and not just physically: the actual data model too. Now, some of the things I want to talk about today are buzz terms, and one of them is this: how many people have heard of event sourcing, or CQRS?
Yeah, there's some. We'll go into detail about what that is; it's a very interesting pattern, and some would call CQRS a sort of pseudo-architecture. That's why I find the term 'data strategy' a little more applicable: those ideas involve not just which database systems you're using and how they're arranged, and the data model itself, but also a bit of the processing, and how the nodes of the database system themselves are arranged.

So when we talk about data strategy, let's first talk about the way we're going to model our data and the tools we're going to use. This is a little bit of paying the rent, because I'm obviously going to be using MongoDB with these systems, but there are several advantages to this approach. First: how many people are familiar with MongoDB? Great. How many have used it before? Okay.

So, really fast, I want to go through some ideas about a flexible schema. I hope that doesn't appear too small on the screen; if it does, you'll have to move forward. Everybody stop what you're doing and move forward, because my slides are too small. Anyway, we talk often about flexible schemas, and that's an important aspect for this domain. The reason is that we're dealing with a lot of different types of things in the Internet of Things. We're going to be dealing with a lot of different devices with different characteristics; the attributes they meter could be very simple or very complex, and there's going to be a myriad of these devices. Some are older versions of a device that don't have the same features the newer versions do, but they're still out in the field, active and collecting data. So we need a system that allows us to easily ingest that data into our database. Maybe we'll become more strict as we move on, in our classification of the data and the schema, but for the time being we want to maintain the flexibility to ingest new kinds of data.

Now, the reason I note this flexibility and how it applies to MongoDB is that in MongoDB there is no schema enforcement; you can use this flexible schema. That is to say, records put into the database, as long as they're well-formed JSON, will be accepted, and there's no need to check the presence or type of a field. That gives you a lot of flexibility. For instance, here are two records in the same collection, or table if you will, and they're of two different types: two different types of vehicle. What I like about this particular example is that they're subclasses of a parent class: a sport bike is a subclass of vehicle, just as a helicopter is, but they're two fundamentally different vehicles that we may want to put into the same collection. Even though they're fundamentally different things, they share a common ancestor, a common understanding, and we want to keep them in the same collection because we want to access them by the same kind of query patterns. You can see in this example that I'm using a discriminator field to tell the two objects apart, but they share common attributes, one being that this company, Agusta, makes helicopters just as they make sport bikes. So maybe I want to search on all the vehicles made by Agusta for some reason, or maybe all the transducers or devices made by Siemens or General Electric or, what have you, Arduino, if I so choose.
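A minimal PyMongo sketch of that polymorphic-collection idea (the connection string, field names, and values here are illustrative, not the slide's exact records):

```python
# Two different vehicle types in one collection, told apart by a
# discriminator field, but queryable through their shared attributes.
from pymongo import MongoClient

coll = MongoClient("mongodb://localhost:27017")["demo"]["vehicles"]

coll.insert_many([
    {"_type": "SportBike", "manufacturer": "Agusta",
     "rake": 24.5, "trail": 102},   # motorcycle-only attributes
    {"_type": "Helicopter", "manufacturer": "Agusta",
     "rotor_blades": 4},            # helicopter-only attribute
])

# One query pattern over fundamentally different documents:
for doc in coll.find({"manufacturer": "Agusta"}):
    print(doc["_type"], doc)
```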
But the interesting thing, where the flexibility comes in, is that we have what I'm referring to as a polymorphic attribute. Rake and trail is an attribute of the motorcycle that has no relevance to a helicopter, and therefore does not need to be in that individual record. By the same token, a motorcycle with rotor blades, a four-bladed motorcycle, would be interesting, crowd-pleasing, and irresponsibly dangerous for the rider; motorcycles aren't built with blades, so the attribute isn't necessary in that document. So we have flexibility in the way we can store our data.

Okay, so that's a little level-setting about the document model. Now let's talk about this data strategy, how we're actually going to deal with our data. These sensors, these internet-of-things devices, are in most cases sending in a state: what their state is at a certain point, some condition, some reading, some measurement, what is happening to them at that moment, sent back into the system. Those instantaneous snapshots in time really don't have a whole lot of meaning for us until we start thinking about them in aggregate.

Now, one way we can deal with this is to say we have an object in the database that represents the current state of a device or a system, and every incoming message updates that state. Okay, seems reasonable. I have a smart meter, I want to know its current voltage, so I have one record corresponding to that one smart meter; a new reading comes in from the meter and I update the meter's state: the last time a message came in, it was at 220 volts, a temperature of 70 degrees Celsius, an amperage of 210 amps, or something like that. It comes into the system and that's the current state. The problem is that this is an extremely limiting way to deal with our data. We're losing insights, we're losing capability and robustness with our data, because if we persist state only, we've lost the record of how we got into that state.

So the fundamental idea behind event sourcing, a buzzy term, is that we don't want to think in terms of state on the system. That is to say, we do have an idea of state, and we are going to derive state, but we don't want to immediately think of a record persisting the current state of our objects. What we do instead is log events: an event log of domain events. If our domain is smart meters, it's going to be something involving current, power, temperature, all that kind of stuff; if we're dealing with a mobile system, a component of that data is going to be where the object physically is on the planet at that time. What we're logging is not an updated state: we're recording, persisting to the database, deltas, changes, log entries. This is akin to how I handle my expenses on a trip: when I go home and submit an expense report, I collect my receipts and then derive how much I'm owed. This is like recording the receipts, as opposed to constantly updating my account balance every time I buy a coffee or something like that.
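A toy sketch of that idea: append immutable readings to a log, then derive 'current state' by replaying it. Field names and values are invented; in a real deployment the list below would be an append-only MongoDB collection:

```python
# Event sourcing in miniature: the log is the source of truth,
# and state is something you derive from it.
events = []  # stand-in for an append-only event collection

def record(meter_id: str, tick: int, **reading) -> None:
    """Persist a delta; never update anything in place."""
    events.append({"meter": meter_id, "tick": tick, **reading})

def current_state(meter_id: str) -> dict:
    """Replay the log from the beginning to derive the latest state."""
    state: dict = {}
    for e in sorted(events, key=lambda e: e["tick"]):
        if e["meter"] == meter_id:
            state.update(e)  # each event folds into the derived state
    return state

record("meter-42", 1, volts=220, temp_c=70)
record("meter-42", 2, amps=210)
print(current_state("meter-42"))  # merged view of everything logged so far
```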
So we replay this log to derive a state: at tick 12, the state of the system, the application state, is nine. Now, there are a lot of advantages to doing this. If I lose the application state, if the system crashes, what do I do? I replay the log and I regain my state. No big deal; it's very robust that way. But there's also another advantage: I can see what the state used to be in the past. I can see how I got there, or what the conditions were at tick five or ten, whether it was up or down, all that kind of stuff. I can go to any time in the past, provided I haven't deleted the logs, and regenerate the state as of that time. Maybe there's something I've learned about processing this data: say I've found users who have associations to one another, and I want to know if they were ever physically in proximity to one another; I can go back and search for that in the past. I can change my data-processing model, my stochastic models, and apply them to the way the world looked in the past. If I update bidding models or anything like that, I can go back and see how well they would have played out historically.

So event sourcing is pretty powerful, and in this domain it's especially powerful, because we want to keep track of how we got into a state. We might want to learn things from our sensors: if a sensor is about to fail, were there variations in its readings beforehand that indicate some kind of failure in the system? In the case of the smart meters we were dealing with, they wanted to determine whether a group of smart meters together in a location was starting to act wobbly: maybe it's not an individual meter that's the problem, it's the transformer that's about to fail, and they need to get out there really fast, otherwise the whole neighborhood is about to lose its power.

Another way to think about this is that the application state is the first derivative of the log. I think I just like saying 'first derivative of the log.' I say it loosely, because a derivative implies a continuous function, or that you could take any interval off the log; whereas typically with event sourcing, to find your true state you replay from the beginning of time and move forward into your application state. You can't just start in the middle, because that would corrupt the state. That's a bit of a difference, and we'll handle it in a second. First, though, the benefit: if I lose my state, or it gets corrupted, somebody attacked me and changed my application state by inserting erroneous records, or maybe I just got it wrong, maybe I released a version of my software that made logical errors in interpreting the log, that's okay; I can go back and regenerate my state. But if every time I need my application state I have to go back and replay my entire event log, that could be prohibitively expensive in computation, in disk, in network; I could have terabytes of this data. So the obvious solution is to create snapshots at regular intervals. When I need to go back to a certain known state, I just go back to the last snapshot. Say I want to get to time interval five: I go back to the snapshot I stored at interval four, load that, and then play the event log up to the actual point in time I want to recover to.
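Continuing the toy sketch from above, with snapshots added so recovery replays from the last snapshot rather than from the beginning of time (all ticks and values are invented):

```python
# Snapshot-plus-replay: load the newest snapshot at or before the
# target tick, then fold in only the events after it.
events = [
    {"tick": 1, "volts": 220}, {"tick": 2, "amps": 210},
    {"tick": 3, "volts": 218}, {"tick": 5, "temp_c": 71},
]
snapshots = {4: {"volts": 218, "amps": 210}}  # view persisted at tick 4

def state_at(target: int) -> dict:
    base = max((t for t in snapshots if t <= target), default=0)
    state = dict(snapshots.get(base, {}))  # start from the snapshot
    for e in sorted(events, key=lambda e: e["tick"]):
        if base < e["tick"] <= target:     # replay only the tail
            state.update({k: v for k, v in e.items() if k != "tick"})
    return state

print(state_at(5))  # loads the tick-4 snapshot, replays tick 5 on top
```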
Okay. So there are lots of different ways this is important. The snapshots in particular are important because they are a view of the data, very much a view of the data at those times. They are themselves generated by replaying the event log, and as I said, I can replay the event log with new information, or with changes to the way I process that log. So not only do I have an opportunity, with event sourcing, to create snapshots of one kind of state; I can make different types of these views. I can have multiple views, each interpreting the event log in a different way, for whatever reason. Very important.

Okay, so let's talk through an example. I just came down from San Francisco, and there's a published data set, on CRAWDAD, that I'll have linked on the last slide; you can check it out and play with it if you like. It's the last 30 days of taxi-cab movements around the city, which I thought was really cool. They're tracking taxi cabs and whether somebody is inside them, and it's a lot of data for one month: over 11 million records, 11 million position locations of these cabs. Roughly every minute, maybe a little less frequently than that, each cab updates where it currently is. So there's a lot of data to play with; possibly, before Uber and Lyft, there might have been a lot more data to deal with, but that's a different issue.

Okay, so here's what one of the records looks like. The data they provide is actually just a space-delimited flat file that I had to ingest; here's how I formatted it and inserted it into MongoDB. Same document model, pretty simple. The idea here is, number one, each of these records has an individual key, a primary key, that identifies the cab, the taxi this reading, this GPS position, is for, and I've concatenated it with a pipe to include the timestamp. So each record is a snapshot in time of where this cab actually is. In addition to that, I have additional fields about the cab itself that I might want to play with, without having to parse the key apart. But the real sauce on this one is the GeoJSON format, which is basically the latitude and longitude coordinates of where this car actually is at this point; and note that it's longitude first, latitude second. In MongoDB we have a geospatial index that allows us to index these elements and search on them quite fast, which is a nice feature that's going to make everything we're doing in this example much easier. The second piece is the occupancy of the car: zero means nobody's in it, one means there's a fare going on. (Playing around with the equivalent data set from New York, they actually show how many people are in the cab, along with the fare price and the tip amount, which I thought was really cool; I was going to play with that a little too, to see if people tip more for longer distances, and that kind of processing becomes much, much simpler.) And the last thing: there's an actual smart date value in MongoDB, so I take the timestamp and store it as an ISODate, an intelligent date type that I can index, search on, and order by in the system as well. So I know where the cab is, when it was there, and whether a fare was going on or not.
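A hedged PyMongo sketch of that document model and its geospatial index. The connection string, cab id, and coordinates are made up; the pipe-concatenated key, the GeoJSON point with longitude first, the fare flag, and the ISODate mirror what's described above:

```python
# One cab-position record, plus a 2dsphere index for fast geo queries.
from datetime import datetime, timezone
from pymongo import MongoClient, GEOSPHERE

coll = MongoClient("mongodb://localhost:27017")["sf"]["cab_positions"]
coll.create_index([("loc", GEOSPHERE)])  # geospatial index on the point

coll.insert_one({
    "_id": "cab42|1212013855",           # cab id | epoch timestamp
    "cab": "cab42",
    "loc": {"type": "Point",
            "coordinates": [-122.4194, 37.7749]},  # [longitude, latitude]
    "fare": 1,                           # 1 = occupied, 0 = empty
    "ts": datetime.fromtimestamp(1212013855, tz=timezone.utc),  # ISODate
})

# The index makes queries like "positions within 500 m of here" fast:
nearby = coll.find({"loc": {"$near": {
    "$geometry": {"type": "Point", "coordinates": [-122.42, 37.77]},
    "$maxDistance": 500,
}}})
print(next(nearby, None))
```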
Now, what I did for the processing is, as the stream of these records comes in, I keep track of state the way electronics systems use an edge trigger: you change state on the rising edge of the voltage. If I'm in a state of fare equals zero, not currently on a ride, then the first record that comes in with the fare flag up at one means a trip is starting. That's one way I can interpret these records. For every subsequent record that comes in while the fare flag is up at one, I know we're on a trip, going from somewhere to someplace with a rider, and when it goes back down to zero on the next record that comes in, the trip is over. So I ingested these records into MongoDB and ran that code over them (I've put the code out as well, so you can play with it along with the data set), and it started spitting out not where the car was driving around empty, but where it went when it had a fare.

So here's one trip overlaid on a map. Really kind of cool: we're going from somewhere around the opera house over here, down to my old neighborhood. That's one cab ride. Can anybody see anything interesting about this particular route? Yeah, little known fact: there's a tunnel between Oak and Fell now. Actually, this is one of the issues we'd be dealing with. Why do you think this happened, why did they teleport this car? Yeah, there's a loss of connectivity in the system, and we can expect that, of course. A couple of things: maybe the record's just gone, it didn't make it. Depending on the system we're dealing with, if it doesn't make it through the network we'll never recover it. Other systems, like the smart meters, will say, hey, I'm not able to transmit my data, so I'm going to hang on to it for up to 40 days until somebody takes it off of me, because I don't want to lose this data. This system doesn't have that. So how we handle that kind of failure mode is going to differ on a device-by-device basis.

Another company I worked with didn't want to deal with the logic of whether the transmission got through or not, so they had these sensors transmit everything they had ever recorded, every single time. The baud rate required of the network would just increase over time, because these payloads kept growing, and they were relying on the unique key constraint in MongoDB, which returns an error if you insert a record with an identifier you've inserted before, as their data management platform. A little unfortunate; easy for them to code, though. Anyway, that tells you something about what we have to deal with in this kind of system.

So here's what that derived record looks like. The raw data stream was those records of where the cab is at a point in time; from that, using event sourcing, I go through the log and derive this view, which is the trip you see on the map. Those are the geolocation coordinates of where it went, and you see I have it defined as a route.
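Here's a sketch of that edge-triggered trip extraction. It assumes the document shape from the ingestion sketch above, and the function and field names are mine, not the talk's actual code:

```python
# Edge-triggered trip extraction: a trip starts when the fare flag
# rises 0 -> 1 and ends when it falls 1 -> 0.

def extract_trips(records):
    """records: position documents for one cab, sorted by timestamp."""
    trips, current = [], None
    prev_fare = 0
    for rec in records:
        fare = rec["fare"]
        if prev_fare == 0 and fare == 1:       # rising edge: trip starts
            current = {"cab": rec["cab"], "start": rec["ts"], "route": []}
        if fare == 1 and current is not None:  # accumulate points while on a trip
            current["route"].append(rec["loc"]["coordinates"])
        if prev_fare == 1 and fare == 0 and current is not None:  # falling edge
            current["end"] = rec["ts"]         # trip is over; emit the derived view
            trips.append(current)
            current = None
        prev_fare = fare
    return trips
```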
In that view I can see the start and end times, where the cab was, and how long the trip took. If I want to get really sophisticated, I can use a haversine function to compute the distance between these geolocation points, divide by the time, and get how fast the driver was going too. I'm not above throwing that in here, because it's kind of cool, but I can always regenerate it: I can add that functionality later, because I've stored the records in my database and I can go back against these domain events and reproduce and enrich this data. That's the advantage of these views.

So I actually started really playing with this data; it was a lot of fun. I said, well, let's see everywhere this cab has taken a fare in the last month. Boom, that's what that looks like. Now, I'm not the only one who's played with this data set. You'll find people more sophisticated than me who've put it into an animation, and it's really cool: you see the cabs going back and forth, and they actually remove the city map, and the city becomes apparent to you as the cabs move through it. When I zoomed in on this, playing with it a little, you see that this particular cab is on every street in downtown; he, or she, nails them all. The thing I liked about this one is that he takes somebody up across the Golden Gate, of course, and someone over to Oakland, right to the airport, and he makes a lot of trips down to SFO, which is off the map here. I thought that was pretty cool. Now, there are 11 million records; I wanted to collate, or condense, all of these trips onto the map and really see how they flow, but that'll be for a subsequent follow-up.

Yeah, there's a question: how do they do it, what software is it? I don't know, I'm not exactly sure. If I knew that, I would have done it myself, because it's a crowd-pleaser. That's for presentation 2.0; I'll come up with some interesting stuff for that. What I wanted to do was a heat map: like I said, use the haversine function and find out which streets are the fastest and which are the slowest, and then really get propeller-head about it and go down the rabbit hole. At what times are the streets fast? Is there a cab that's a speed demon with a lead foot, and another that drives really slowly, and which one do you want to hail? Depends on where you need to get.

Yeah? Oh, I'm not using map-reduce in this particular case. How many people took basic electronics in school? Maybe we're programmers, so not so much. You know what an edge trigger is? Yeah, that's what I'm doing: edge triggering. I've got a flag, like a latch, and I say, okay, the fare flag went up, start; when it goes back down, I realize that last record was the final one. Cool. And there was one more question? Oh, thank you so much. Yeah, the question was how do I know when a ride has started and when it hasn't: by using an edge-triggering function. And there was another question: oh, what animation do they use? I don't know. I wish I knew, because you would have been totally impressed, more than you are currently.
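Going back to the haversine idea for a second, here's a sketch of the distance and average-speed calculation over one of the derived trips, assuming the trip shape produced by the extraction sketch above:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two (lon, lat) points, in kilometers."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    dlon, dlat = lon2 - lon1, lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def avg_speed_kmh(trip):
    """Rough average speed over a trip: path length divided by elapsed time."""
    pts = trip["route"]   # list of [lon, lat] pairs from the trip view
    dist = sum(haversine_km(*pts[i], *pts[i + 1]) for i in range(len(pts) - 1))
    hours = (trip["end"] - trip["start"]).total_seconds() / 3600
    return dist / hours if hours else 0.0
```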
Okay, so let's move on: geolocation. With this data there are lots of things we can do. One of them is defining an area, downtown Manhattan for instance, which I've done here. This is type Polygon, which is supported by the GeoJSON industry format and also supported by MongoDB, so I can actually define polygons in the database and index on them. Why would I want to do such a thing? Maybe this is a service area. Maybe it's a toll zone: as a government entity, when I see your cab go into that zone, I can bill you automatically just for your presence there. Or maybe I can derive from that zone that my rates go up, on the consumer side rather than the government side. And the neat thing, of course, is this could be a floating zone that I move around. If it's a service area and someone requests service, are they in the zone I serve? There are different variants on this that are really kind of cool: is this device being activated or turned on in a place I wouldn't expect it to be, and can I attribute that to fraud? This is geofencing, as they say. All sorts of interesting things I can do with this kind of data.
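As a sketch of what that zone query could look like, reusing the hypothetical collection and 2dsphere index from earlier (the polygon coordinates here are made up, not the slide's actual zone):

```python
from pymongo import MongoClient

coll = MongoClient().taxi.positions   # same hypothetical collection as above

# An illustrative downtown zone as a GeoJSON Polygon: one closed outer
# ring, longitude first, with the last point repeating the first.
downtown = {
    "type": "Polygon",
    "coordinates": [[
        [-74.02, 40.70], [-73.97, 40.70], [-73.97, 40.72],
        [-74.02, 40.72], [-74.02, 40.70],
    ]],
}

# $geoWithin uses the 2dsphere index on "loc" to find every position
# reading that falls inside the zone.
for rec in coll.find({"loc": {"$geoWithin": {"$geometry": downtown}}}):
    print(rec["cab"], rec["ts"])
```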
But the point, of course, is that this zone, or the fact that a person appeared there at that time, these are different views off of the data stream we're getting from the system. So it's a pretty important part that we're recording these domain events, these event logs, so that we can replay them, either retroactively or on the fly, to create an idea of what we're looking at in the world. Now, that's event sourcing.

Event sourcing is closely associated with CQRS. Greg Young, the guy some say coined event sourcing, says that event sourcing relies on something called CQRS: Command Query Responsibility Segregation. I'd pretty much go along with that myself. The idea follows from the fact that we're generating these different views of the data based on the event log. When we access the database, we have this basic pattern of ingesting the data as a log into the system, but then we have these different kinds of views stacked up against it, different ways we look at the data. It's hard to see, but there are a couple of shades of gray right here; each shade is supposed to mean a different type of view on the data. This has architectural impact, as you'll see in just a moment.

The idea is that we have variant read models, views. These views are fundamentally different from the data coming into the system: they're built off of it, but they're aggregates, or interpretations, or averages; something we've done with the data to interpret it. And that has an implication: because the data going into the system is different from the data we're extracting, maybe we shouldn't use CRUD. How many people know CRUD? Create, read, update, delete. From the application side, when we think of interfacing with the database, we have one object that represents the domain object, and we perform these CRUD operations back and forth through that one object: getters to retrieve data from the database as the object, setters to set the object and push it back into the database.

But because the data stream is different, the domain events coming in on one channel versus the variant models we interpret them into, these objects, these pieces of our code, should be separate, because they perform separate functions; they have separate responsibilities. Hence this idea of command query responsibility segregation: your reads are handled by different components of the code than your writes are. In this case "command" means writing to the database. We're going to mutate the state of the database, even if it's only an insertion of an immutable log record; we're changing the state of the database, and that's interpreted as a command. We're changing something, but we don't expect anything back from the insert, except metadata saying yes, the insertion was successful. The query, on the other hand, doesn't change the state of the data store, but we do expect something back from it. So: fundamentally different patterns, fundamentally different objects.

The reason this is a cool idea is that on these systems, where I may be doing lots of heavy analytics off the incoming event stream, I may have a disproportionately, asymmetrically high read load in comparison to my write load. And if I have an asymmetric read load, I can asymmetrically scale these objects: lots of readers, as opposed to a one-to-one reader-writer object.

Now you can start to see that if these components of the code are different, and they're responsible for different views of the data, it lends itself quite aptly to another buzz term you've probably been hearing a lot about. Can anybody guess where this is going? If, instead of one big reader-writer, I use lots of small readers and writers, each its own small service... microservice! Circle gets the square. This is a pattern that lends itself to microservices.

An interesting side note about microservices: a lot is said about them, and when you ask somebody how to define a microservice, where the edges of a microservice are, what I see right now is a lot of talk about separation of concerns, separation of code for maintainability. But when you ask how big a microservice should be, people scratch their heads and say "it depends." Here's a way to define a microservice: what is the view of the data that you want to see? Each of these little read objects, query objects, constitutes a microservice, as does the writer, and each has a single responsibility: ingestion of the domain event log stream on one side, and the interpretation of that log stream, whatever view you want, on the other. So now we're into microservices. I think we're at four buzz terms in this talk so far, which is pretty good: IoT is itself buzzy, but I've also talked about event sourcing, CQRS (which you have to say ten times before you can invoke it in your own code), and now microservices. But that's fitting, because a lot of this stuff is boiled into the complexity of an IoT system, which is as complex as you want it to be.
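Here's a tiny sketch of that command/query split. The class and method names are illustrative; the point is only that the write side and the read side are separate components with separate responsibilities, so they can be scaled independently:

```python
class CommandSide:
    """Write side: ingests domain events; only appends to the immutable log."""
    def __init__(self, coll):
        self.coll = coll

    def record_position(self, doc):
        # Mutates database state; returns nothing but insert metadata.
        self.coll.insert_one(doc)


class QuerySide:
    """Read side: serves views derived from the log; never mutates state."""
    def __init__(self, coll):
        self.coll = coll

    def positions_for_cab(self, cab_id):
        # Expects data back; changes nothing in the data store.
        return list(self.coll.find({"cab": cab_id}).sort("ts", 1))
```

Under an asymmetric load you might deploy one CommandSide instance and many QuerySide instances, each behind its own small service.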
Right, so that's where these terms start to gel and have meaning and relevancy and context to one another; that's the fascinating part to me. The impact CQRS has on your system architecturally (because ostensibly that's what this talk is about, the architecture of a system) is that if we have an asymmetric load of reads over writes, or one pattern over another in our application tier, maybe that asymmetric load can be serviced asymmetrically by our data store. In MongoDB we have secondaries: all the writes go to the primary, and then a replication system copies the data out to the secondary nodes, which you can send reads to. That's called eventual consistency. Now, eventual consistency is a tool, a pattern; it's not perfect for everything, and there are caveats you need to be careful about when using this kind of system. But in this pattern, where we're regenerating these views on the data, eventual consistency might be okay for me to use, because I may not need to know exactly what the current state of the event stream is; I'm regenerating a view to do analytics, for instance, something that doesn't require the most up-to-date data. One needs to be careful about that, though.
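A sketch of that asymmetric split with pymongo, assuming a replica set named "rs0" on localhost (adjust the URI for a real deployment):

```python
from pymongo import MongoClient, ReadPreference

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")

# Writes (commands) go to the primary as usual.
primary_coll = client.taxi.positions

# Reads (queries) can be spread across secondaries, which may lag a
# little behind the primary: eventual consistency, fine for
# analytics-style views that don't need the very latest event.
analytics_coll = client.taxi.get_collection(
    "positions", read_preference=ReadPreference.SECONDARY_PREFERRED
)
```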
Okay, so let's talk about how these systems get tricky. Let me just do a quick time check... okay, cool, great. These talks about buzz terms are always about how it's going to be transformative, how it's going to change everything in the industry. That's great, and for anybody who actually tries to implement these things, these are obviously good solutions; there's real hay behind these ideas. But I have to think pessimistically: I maintain servers, I've been on pager duty, and the one thing I've learned is to be cautiously optimistic and strenuously pessimistic.

So what's the tricky part of IoT in these systems? Well, mishaps. Like, this guy almost got clobbered by the camera drone. These systems are out in the environment, and as we saw with the cab example, they don't always work. They could themselves be reliable, but we can expect service interruptions of some type, so we need mechanisms to identify them and take action. And as we go into the future, there are all sorts of problems we could encounter with these kinds of systems, because they're dealing with environments we've not put sensors in before; if you think about embedded medical devices, there could be any number of reasons you lose connectivity.

So how do we deal with this? Here's another emerging idea: a service management system. This is actually kind of interesting to me, and I think it ties into a lot of the discussion we see about containers, about infrastructure as code, this idea that I can code my infrastructure, infrastructure as a service. As these systems become more complex, we're going to need systems that handle that complexity automatically for us, especially since these systems are expected to be up and reliable. A service management system is a component of the architecture that's responsible for detecting failures and taking action to resolve them. We've seen examples of this in components of our overall architecture: in MongoDB, the replica set is there as a redundant system; if one server goes down, the remaining members elect a new primary and carry on. We've seen this in subcomponents. The service management idea says your architecture isn't only the back end, it's also the front end, these deployed devices. What does it mean if the transformer, or the central node I'm collecting all this data from, or the smart meters themselves, stop collecting data? Can I deal with that problem and maintain service?

Part of the idea is that the service management system has agents, and the agents get their configuration, their understanding of what the world should be, from a federated configuration management database. The CMDB stores information about what types of sensors are out there, where they are, what the recovery protocol is, and who to notify if they go down. The agents use that data to automatically update, say, a ticketing system: hey, we've detected a failure (we're using monitoring systems like Nagios and whatnot), we enter a trouble ticket for it, and once we have a resolution of the problem we update the trouble ticket automatically and close it out, all that kind of fun stuff. It could get quite sophisticated, but if you consider the number of these devices distributed out there, this could be an interesting part of architectures in the future. Maybe it's possible to completely automate this based on what's inside a configuration management database; with all the maturity coming with containerized systems and orchestration systems, I think it's possible.

So, like I said, the idea is you have a record of where these devices are. Here's a data set I imported: all the free wi-fi hotspots in New York. If you need connectivity, just tweet at me and I'll send you this map. But if one of these fails, where is it? Where do I send my service technician? The other part of this service management system is having the right metrics and monitoring, so you can keep your system in flight. Super important. We have all these deployed systems, and if you don't know their state, things start munging up: if we're performing analytics off the data from these sensors, the standard deviation of our results, our averages, is going to go wild, because maybe a lot of devices are failing and that's adversely affecting the accuracy of our metrics. We need to know that so we can adjust accordingly, so we don't get into a negative feedback loop where losing sensors skews our analytics, which in turn skews the way we deploy and use our sensors. So just like you monitor your back-end infrastructure, monitoring the state of these deployed devices is going to be important. And of course, if you're monitoring these systems, it's very important to have an accurate system for alerting, letting people know there's a problem going on in the system: filing the Jira ticket.
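A sketch of what one device record in that federated CMDB might look like; every field name here is made up for illustration, including a threshold we'll use in a moment:

```python
from pymongo import MongoClient

devices = MongoClient().cmdb.devices   # hypothetical CMDB database/collection

# One sensor's entry: identity, location, alert thresholds, a recovery
# protocol, and who to notify if it goes down.
devices.insert_one({
    "_id": "meter-00451",
    "type": "smart_meter",
    "location": {"type": "Point", "coordinates": [-73.98, 40.75]},
    "thresholds": {"temperature_c": 70},        # leading failure indicator
    "recovery_protocol": "dispatch_technician",
    "notify": ["ops-oncall@example.com"],
})
```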
But I love using human, non-technological analogies, and what's less technological than Batman? The most famous alerting system of all. In the case of smart meters, again, where we've seen a lot of this work, one of the records we'd see would be something along these lines: this component has a threshold, alerts we want to configure for this specific type of meter. What are the threshold readings? Does the temperature go out of whack? That's what they really saw as the leading indicator of an imminent failure: the temperature on these sensors goes really high right before one dies. I thought it would be something different, like the voltage or the amperage going up or down, but no, it's the temperature. So using this configuration data I can set my monitoring thresholds: as I'm replaying the sensor log that comes off this system, I compare it to the threshold, and when the threshold is exceeded, I go into an alert state and trigger off a trouble ticket, in Bugzilla through its API or whatever; it doesn't matter, it's an example.

Now, how do we actually do this? This is where it gets involved; this is another bit of trickiness with event sourcing. When we replay the log and exceed the threshold, it seems pretty simple: we send a notification to the subscriber service, and that service takes an action, sends the SMS, starts the trouble ticket, raises the alarms, wakes everybody up, all that kind of good stuff. The thing you need to be careful about with event sourcing and these external notifications is this: if you've lost application state and you replay the log, does your system know to differentiate between the first time you passed that threshold and a replay? Or is replaying the log going to re-trigger the subscriber service, because the subscriber service doesn't understand the data model the same way your event sourcing system does? It's possible to re-trigger these actions just by replaying the log to regain your state.
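One way to guard against that re-triggering, sketched in Python: remember the id of the last event that already produced a notification, so a replay won't re-fire the subscriber. The state store, field names, and notify callback are all illustrative:

```python
def process_reading(reading, config, alert_state, notify):
    """Compare one sensor reading against its configured threshold.

    reading:      {"device_id", "event_id", "temperature_c"}, event_id
                  monotonically increasing within a device's log
    config:       the device's CMDB record (see the sketch above)
    alert_state:  dict mapping device_id -> last event_id already alerted on
    notify:       callback that files the ticket / sends the SMS
    """
    threshold = config["thresholds"]["temperature_c"]
    if reading["temperature_c"] > threshold:
        last_alerted = alert_state.get(reading["device_id"])
        # Only notify if this event is newer than anything we've alerted
        # on before; a replay of old events then stays silent.
        if last_alerted is None or reading["event_id"] > last_alerted:
            notify(reading)
            alert_state[reading["device_id"]] = reading["event_id"]
```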
In fact, to use another human analogy, there's a very famous example of this. How many people are over a certain age and know who these people are, what show this is, and what he's saying? Has anybody seen this? Yeah, I'll let you do it: cheeseburger, cheeseburger. This is from 1977: Dan Aykroyd, John Belushi, and, in the back, a very young Bill Murray. The sketch is that there's this cafe where all they serve is cheeseburgers, and this guy, Robert Klein, is saying, I don't want a cheeseburger, it's too early in the morning for cheeseburgers, I want eggs. And John Belushi says, look, he's having a cheeseburger, he's having a cheeseburger: cheeseburger, cheeseburger, cheeseburger. Every time he says "cheeseburger," his brother interprets it as an order for another cheeseburger and puts another one on the grill. He's replaying the event log, and his subscriber service can't make the differentiation. That's the classic case of this kind of re-notification: cheeseburger, cheeseburger, cheeseburger. Finally Robert Klein says, okay, fine, I'll have a cheeseburger, so John Belushi turns around and says "cheeseburger," and his brother says, no more cheeseburger. It was funny in '77, anyway.

So what do they need to solve this? What's the problem, what would you do? Any ideas, from anyone who didn't put themselves through school working in a restaurant? Well, the answer, quite frankly, is they need a circular buffer. You've seen this, right? The order wheel is quite literally a circular buffer. You put the ticket on the buffer; it's a queuing system. And who's responsible for taking the ticket off? The chef, the cook. It's a consumer-driven queuing system: not a push system but a pull system, meaning the subscriber on the system maintains its own state about whether it has seen a message or not. You could implement such a thing on a push system, but it's much harder.

Coming in on the home stretch, I think we're almost out of time. So in this case, the way we do these external updates is: when the notification comes through, we push it onto a queuing system like Kafka or something like that, and then the subscriber consumes it off of the system. That's how you handle those external effects.
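A sketch of that pull-driven pattern with Kafka via the kafka-python client, where the consumer group owns its own position in the stream, just like the cook owns the order wheel. Topic, server address, and the handler are placeholders:

```python
import json
from kafka import KafkaProducer, KafkaConsumer   # kafka-python client

def handle_alert(alert):
    print("filing ticket for", alert)   # stands in for the real ticketing call

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
# The event-sourcing side pushes notifications onto the queue...
producer.send("alerts", {"device_id": "meter-00451", "temperature_c": 82})
producer.flush()

# ...and the subscriber pulls them off at its own pace; the consumer
# group tracks its own offset, its own state about what it has seen.
consumer = KafkaConsumer(
    "alerts",
    bootstrap_servers="localhost:9092",
    group_id="ticketing-service",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for msg in consumer:        # blocks, consuming messages as they arrive
    handle_alert(msg.value)
```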
Okay, so one last thing, about gaining insight on these systems, and I'll try to wrap up. Ooh, I'm right on the money. Okay, this is the good part, so give me a few seconds; I'm going to run over a little bit. This is the Gulf Stream, first charted in 1770. How did they do that? How did they know the Gulf Stream was there? Very ingeniously: a guy came up with the idea of putting a card inside a bottle and throwing it into the ocean. The card said: if you find this, please write down where you found it, your latitude and longitude, and the date you found it, and if you'd like to tell us what the ocean temperature and weather conditions were, even better; then mail this card back to me so I can aggregate the data. This is a deployed sensor system, and they figured out where the Gulf Stream was. Who was the guy who thought of this? Well, yuk yuk yuk, it's Ben Franklin. He actually cut his scientific teeth doing this, and everybody loved him for it; famously, the chart clipped two weeks off of an Atlantic voyage. The merchant vessels kind of knew there was something out there, and they were outrunning the British mail packets to get to England and back, so he had to figure out where this Gulf Stream was.

So this is the point of this particular foray into the past: you're collecting the data; now you have to derive insights from it. In these systems we need to integrate our data systems with processing. How many people have heard of Hadoop? Okay. How many have heard of Spark? Okay, yeah. Spark is pretty buzzy; Hadoop is very buzzy, or had been, and still kind of is. These are related technologies, and the idea is that they do processing of this data beyond what you'd do in MongoDB alone. MongoDB is really good at grouping, averaging, and aggregate functions with the aggregation framework, but when it comes to machine learning libraries, you don't really want to roll your own. That's where these systems come in: they're there to handle huge data sets and heavy processing, and they come with a lot of machine learning libraries that help you do some sophisticated stuff.

I'm going to zip through this, but the way these technologies work is as concentric rings of abstraction. At the base is the distributed file system; the processing runs against a distributed set of nodes, and the Spark or Hadoop layer in the middle actually performs the processing of the data in a distributed way across all those nodes. This gets particularly sophisticated and complicated, because it's a whole architecture in itself, and a lot of people get frustrated with these systems because they're so complex. In fact, the way people typically integrate their systems with this back-end analysis is that they have their data services up here, where the data comes into the application database, and then they have to ETL it over into the HDFS nodes to be processed by Spark or Hadoop. That looks crazy, right? A lot of people deploy that way, but there's a lot of overhead in maintaining those servers, as well as the seemingly trivial but deceptively difficult ETL from one system to the other. If that ETL system breaks, your analytics break, and if you're trying to run a fast, closed-loop analytics system, real-time or near-real-time, then when your ETL breaks you start losing money, because you're not able to process this data.

Where MongoDB comes into such a system is that you can replace the HDFS layer with MongoDB, so that you're performing these analytics against the data in the database, and you significantly reduce the complexity of your architecture down to something more manageable. The idea here is that our architecture has now kind of morphed: we have data services, themselves composed of microservices, and what the views on that data are, what those microservices do with the inbound log stream, is influenced by the analytics occurring on these back-end offline systems you're integrated with. This is how you get value off of the data.

So, I'm over time; just one last thing I wanted to say. The connector is available on GitHub. I'll put a link up later; well, here's the link, actually, it's in my slides. There you go, and you can play with that as well if you like.
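For flavor, here's roughly what reading the taxi collection into Spark through the MongoDB Spark Connector might look like in PySpark. This assumes the connector's v2/v3-era DataFrame API; the exact format string, config keys, and package coordinates vary by connector version, so treat it as a sketch rather than the talk's actual code:

```python
from pyspark.sql import SparkSession

# Launched with something like:
#   spark-submit --packages org.mongodb.spark:mongo-spark-connector_2.12:3.0.1 app.py
spark = (
    SparkSession.builder
    .appName("taxi-analytics")
    .config("spark.mongodb.input.uri", "mongodb://localhost/taxi.positions")
    .getOrCreate()
)

# Load the collection as a DataFrame and run a simple distributed
# aggregation: readings per cab, busiest first.
df = spark.read.format("mongo").load()
df.groupBy("cab").count().orderBy("count", ascending=False).show(10)
```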
I have some examples up there, and I want to cook up some more that you can run off of the taxi cab data. One last thing: who can tell me who this person is? This is Emmy Hennings. She's an inscrutable Dada artist from World War One. She bears no relevance to this talk; I just thought I'd throw in another stumper. Thank you very much for your time, and enjoy the rest of your weekend. If you have questions, I'm perfectly happy to answer them. I'm five minutes over; I don't know if that means I've run into higher rates for this room. If anybody asks who the speaker was, I'll point to somebody else and get out the back door. Thank you for your time, it's much appreciated.