So I'm presenting "Above the Clouds," an investigation of systems from 32,000 feet. I decided to undertake this investigation while I was on a plane ride, so the title seems appropriate. I was deploying a cloud-based system over that crappy airplane Wi-Fi. So I'm going to be talking about distributed systems and visualization.

This is me. This is going to be me through the whole presentation. I work for Basho. I am a software engineer. I work mostly in Ruby, Erlang, and JavaScript.

So, distributed systems. At Basho we've got this database called Riak as our primary product, and it's essentially a distributed key-value store. Distributed systems have a different set of concerns than centralized applications, and here I'm talking about things that are cloudy, where you don't know exactly where a piece of data lives. Maybe you don't know exactly where a computation is taking place. It's just somewhere on a distributed network.

My motivations for this investigation: I've got three of them. The fascination, the opportunity, and the problem.

So, the fascination. For me, the fascination with visually studying distributed systems comes from a moment that I had when I was considering the solar system. And I am going to have to pace for this; I'll just hold the microphone. Some of you might remember this: this is a screen grab from the intro to Star Trek: The Next Generation, where they swoop by the planets. It's really cool.

So, demonstration. This is the Sun. Okay. [unclear] Okay, some of you are basement coders, so I should tell you this isn't actually the real Sun, but let's pretend. Close, close. So let's pretend that this is a scale model of the Sun, at six inches in diameter. Where's Earth? It's up in Kobe's row. There's an empty seat with a glass. Okay, so if this is the Sun, Earth is back there. How big is Earth?
Next to the glass is an empty plastic bag, an almost empty plastic bag. In there is a very, very small piece of blue plastic. Yeah. So right now all of you have particles of dust on you bigger than the Earth, to scale with the Sun here. So I encourage you, during the question period, to go over there and look at the Earth, and I'll leave this up here. Look at the Sun and think about the scale and the vast distance between the two, and just how complicated that system is: that tenuous grasp of gravity, that thread of gravity keeping that dust particle of Earth spinning in the right place. I mean, really, the planning that must have gone into making the solar system is amazing. Okay, it was an agile job. They did it in a one-week iteration.

But the point is that visually seeing this kind of scale gives you, or gave me anyway, a fundamentally different concept of the solar system and our place in it than reading the numbers. So compare the mental activity that you undertake imagining that this is the Sun and that particle of dust over there is the Earth, compared to the actual numbers. Right? It's the same data. It's the same information. There's the scale. But visually seeing this system distributed around the Sun (it's called a solar system), visually seeing it, viscerally experiencing it, is much different. There's some other kind of knowledge that you get from seeing the system visually. Okay, so that's my fascination with this investigation.

The second one is the opportunity. A couple of years ago I was managing a Rails app. It was a typical Rails setup, so we had Apache, four Mongrels behind that, and a Postgres database. It's not a Ruby conference unless somebody talks about Zed, so normally this is where I would thank him for writing Mongrel, but he was mentioned yesterday, so I'll skip that slide. So we would hit traffic spikes that would slow the application down. Common problem.
This was a typical Rails app from a couple of years ago. On the hosting service that we were on we had six servers: one for Apache, four for Mongrels, and one for Postgres. They looked at our usage and our log files and they said, well, you need more Mongrels. It's a really good hosting company. I don't think there's anything wrong with this assessment; I think it was the right conclusion to come to. However, they make money off of hosting instances, basically on RAM, the amount of RAM that you use per instance. Mongrels take a lot of RAM, so it was going to be really expensive to double the number of Mongrels that we had. So before we put that money down, I decided it would be useful to take on a different kind of expense, I guess my time, and investigate the system a little bit more.

To do that, I drew on some operations research, and I wanted to simulate this system. In operations research, queueing research, or anything like that, you can look at any of the parts here as basically having two things: a queue coming in, which fills at some sort of rate, and then some service that the node does, which takes a certain amount of time. So Apache has requests come in at some particular rate, and it performs some action, then sends the requests on to a Mongrel. Same with an individual Mongrel: it has requests come in from Apache at some given rate, takes a certain amount of time to service the request, and then does something else with it.

Of course there are a lot of commercial tools out there to simulate a system like this. I found most of them cumbersome, and I'm a Ruby developer, so I wrote my own. It's called DSQ, which stands for Darn Simple Queuing. I'm not really going to talk about DSQ too much, so I'm just going to show you that it is indeed simple. So you require DSQ. It's a DSL within Ruby. In this case you create a new simulation. It has only one node, so it has an arrival rate.
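The slide with the DSQ code is not part of this transcript. As a rough stand-in, here is a minimal plain-Ruby sketch of the same idea: one node, exponentially distributed times between arrivals, exponentially distributed service times, run for 60 units of time. This is not DSQ's API; the method names, the rates of 15 and 18 per time unit, and the fixed seed are all assumptions made for illustration.

```ruby
# Hypothetical sketch of a single-node queue simulation. NOT the DSQ DSL.

# Draw a sample from an exponential distribution with the given rate.
def exponential(rng, rate)
  -Math.log(1.0 - rng.rand) / rate
end

# Simulate one node with one server until `duration` time units have passed.
def simulate_queue(arrival_rate:, service_rate:, duration: 60.0, seed: 42)
  rng = Random.new(seed)
  next_arrival = exponential(rng, arrival_rate)
  next_departure = Float::INFINITY # nothing is being served yet
  in_system = 0                    # requests queued or in service
  arrived = 0
  served = 0

  loop do
    clock = [next_arrival, next_departure].min
    break if clock > duration

    if next_arrival <= next_departure
      # A request arrives; start serving it if the node was idle.
      arrived += 1
      in_system += 1
      next_departure = clock + exponential(rng, service_rate) if in_system == 1
      next_arrival = clock + exponential(rng, arrival_rate)
    else
      # The request in service finishes; pick up the next one, if any.
      served += 1
      in_system -= 1
      next_departure =
        in_system.positive? ? clock + exponential(rng, service_rate) : Float::INFINITY
    end
  end

  { arrived: arrived, served: served, in_system_at_end: in_system }
end

# Assumed rates: 15 arrivals and 18 completed services per time unit.
report = simulate_queue(arrival_rate: 15.0, service_rate: 18.0)
puts report
```

The report here only counts arrivals, completed requests, and what is still in the system when the clock stops; a real tool would report much more. The input to watch in what follows is the arrival rate.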
The arrival rate is distributed along some exponential curve. There's a service rate, so the time it takes to serve a request is distributed along some exponential curve. And then it will run that for a default of 60 units of time and print a report. A report might look something like this. Again, numbers. You can see that initially there was nothing in the queue; when I stopped it, there were four things in the queue; about a thousand things went through it; and there are other numbers. Well, I used it to solve this problem, but you can use this to simulate events or things going through a system, possibly a distributed system. But it's hard to visualize. There are some products on the market that will 3D render this kind of thing. I don't find them particularly useful, because I think the 3D rendering tends to be cartoony.

In the end, what it turned out was that the traffic pattern, the variability in timing between incoming requests, was actually causing Postgres to cough. So adding more Mongrels would not have helped. And looking at the problem, looking at the data peripherally, there's really no way to know this. You had to either do it and find out that you were wrong and hadn't solved the problem, or you had to simulate it. In this case we simulated it, and we found that just increasing the CPU and RAM for Postgres, a much less expensive endeavor, enabled the application to handle the same kinds of traffic spikes.

Okay, so the opportunity. Well, money is a big one. We've got things like OpenFlow coming up, and cloud technologies in general, but particularly when you look at things like the network, where you can ad hoc increase the capacity for some part of your application or service and, soon, ad hoc throttle the bandwidth for a different service: being able to visualize this and fundamentally understand this, if you're able to do that, if you have tools to do that, there's
going to be a lot of opportunities: saving your time, your frustration, all of which ultimately translates into this.

My third motivation for this investigation is that the problem is a challenging one. Simulating systems is hard. I wrote DSQ because I thought I could make it easier, but it's still difficult to collect the data, set up a correct model, run it, and understand what you've run. So if simulating systems is hard, then simulating distributed systems is probably distributedly hard. You know: show me what the system is doing, show me everything. If you've ever had an application where you want it to log stuff, and so you start inserting... I won't say you would do something so crass as just putting raise statements throughout your code, but say you start putting log statements throughout your code. I think that's like cache expiration, right? It's a fundamentally difficult problem in computer science to figure out how to go into any given point, cleanly observe, and then manage your observation points. Erlang actually has tracing, which does this very well, but I don't think that would translate to Ruby.

So, the distributed system. Just to take one that I'm familiar with, I'm going to talk about Riak. Again, Riak has kind of that cloudy feel to it. For those of you who aren't familiar, you could think of nodes in a Riak cluster as peers. We'll say they're people. So if these people are in a cluster, I can go to him and say, "What's the color of his shirt?" and he'll ask and give me the answer. If I say, "What's the color of your shirt?" he'll tell me the answer quicker, right? But they all talk to each other. None of them are special. You're all special, but none of them is more important than the others. There's no master-slave configuration. So the system scales horizontally really well and has a lot of other nice properties that fall out of that. But how do you study it? Right, that's a problem. How do we visualize this?
So, as a very simple-ish distributed system, let's study it. On this laptop I have seven nodes running in a Riak cluster. Just to prove it, here's bash. There's the directory I'm in: I'm in Cascadia Ruby, dev, dev1 through dev7. All seven of those are individual nodes running their own Erlang VMs. Then for all seven of those I just run a ping, and I get seven pongs back.

I'm going to show you... well, with Riak we like to break stuff, so we do live demos all the time. But the last time I did a live demo, they said it was a little too exciting, so I'm just going to tone it down a little bit and use screenshots and video. But all of this is running on my laptop, so if anybody wants to see me break stuff later, by all means bother me. Here's a view of the cluster. It just says that all seven of those players are in the cluster.

So how do we study this? The first way to do it, probably the easiest way, is to benchmark it. We have this tool called basho_bench, which is really cool. Basically what it does is spawn a number of workers and have them go hit stuff. I don't want to give you any ideas of what nefarious things could be done with this, but Erlang is very good at opening up lots of processes simultaneously. Not operating-system-level processes: Erlang processes that live in the Erlang VM. I could open a hundred thousand processes on this laptop, not a big deal, and each one of those has isolated memory, isolated state. So with basho_bench we can open up a bunch of workers, each in its own process, that just go hit something. In this case, hit Riak: insert a piece of data, get a piece of data. We have drivers for Cassandra, Mongo, Redis, all sorts of other stuff. So it's a great way to threshold test a system: just pound the crap out of it with traffic and see where it stalls, or if you can break it.

Here's an example configuration file. Not too much going on here. The first line, I'm putting it in mode max.
So I'm just saying, you know, with seven concurrent workers, open connections as fast as you can; as soon as you get a response, go ahead and send another request. Okay.

So basho_bench running on this laptop produced this graph. I won't go through the whole thing. The top line is throughput, so every few seconds it put a dot on the graph. This ran for two hours. Right there, about half an hour in, I killed one of the nodes in the cluster. Just kill -9, just took it out. So the throughput came down, and then it found a new steady state. That's pretty interesting. So this is an interesting way of visualizing the performance of this distributed system, kind of, sort of.

Here's another example, taken from I/O stats that we put in nice graph form. This is the exact same test run. You can see something interesting happens there; that's when I killed the node. And this is hard disk I/O, hard disk I/O spikes.

So that gives us some kind of visualization of how this distributed system performs, but it's not particularly great. First of all, this runs after the fact, right? In a benchmark situation you get the visual, the report, after the fact, so it isn't real-time. But it's also benchmarking. Don't run benchmarks on a production system.

Which brings us to the other option: monitoring. There are people from New Relic here, right? Yeah, New Relic is awesome. They will pry that out of my cold hands. No. So New Relic is great, and it gives you a view into the system as it's running. Very, very useful. I'm going to mention another program that does something similar: Graphite gives you this kind of real-time view of the system's performance. I'm not going to compare the two products at all; just know they're out there. So you can get this introspection into a running system.
The problem for me is that these aren't really geared particularly towards distributed systems. They give you good insight, in the case of New Relic excellent insight, into a running instance of Rails. If it had been around when I had the previous problem, it would have been great for looking at the performance of the database and the different components of the application in real time. But I want a better feel, a visceral feel, for the distributed system.

So there's this thing called glTail. What this essentially does is it tails a log file and looks for specific things, and it represents those things, say requests to a website, as spheres. I apologize for the darkness up here, but I have no control over the background. Here you can see this example is requests coming into some other website. It wasn't mine; this is not hitting Riak. I think the bigger circles are bigger file downloads. So a bunch of stuff is happening, and it's very pretty. I'm not really sure what's going on there, but maybe a queue is backing up and then it gets flushed. I like that it gives you a sense of things changing through time. And if you get really good at understanding this and comparing it to what's actually happening on your service, it might give you an idea of something.

Which brings me to Ubigraph. Quick poll, hands: anybody heard of Ubigraph? Okay, I didn't think so. Two weeks ago a gentleman named Kresten, who's a big name in the Erlang world, posted a video using Ubigraph, and I thought, oh my god, how have I not heard of this before? Essentially what Ubigraph does is something similar, but it 3D renders spheres in a relationship. What Kresten did was hook that up to look at the processes running in the Erlang VM.
So again, before, I said it's trivial for Erlang to start a lot of processes. It does this in supervisor trees, hierarchies more or less. If you want to perform some work, you just spawn a process. It does the work in its own little memory state, and when the work's done, you just kill it. It becomes trivial to manage memory that way. If you want to be overly academic about it, you could think of the processes as almost being little objects (stateless objects, but little objects) that talk to each other through message passing. And when you don't need an object anymore, you just remove it from memory.

So what I'm going to show you is a video, again, that I did here, where I start Ubigraph, which is just a black screen, and hopefully it'll show up reasonably well. And then I start this process running in Erlang, where it looks at all of the other processes and their relationships to each other and constructs what Riak looks like. So this is Ubigraph, and there is Riak. I'm going to turn labels off here so that it's easier to see. Those labels are actually all the different process names. Right, this is the process tree. It's really hard to see, but you'll be able to follow along enough. The green spheres towards the front here are processes that connect to external ports. In Riak we can use JavaScript to run MapReduce functions.
So these connect to JavaScript VMs. And again, it's just another way of visualizing something happening. I'll zoom in here so that it's a little bit easier to see, hopefully. So what happens is I can actually submit a request to Riak, and you can see Webmachine, which is an Erlang process that handles the request coming in, create new processes to handle that request as it comes in, save the data somewhere, and then just kill the request. Here, in a second, it's going to start heartbeat-creating processes. So even on a running system, processes are coming into existence and being removed, and it's hard to see because it's happening too fast. Somewhere in the center you can see it kind of twitching. Those are requests coming in, saving to disk, and then dying; the process is dying. So that's mildly interesting, but that's just one request coming in.

So what does Riak look like under pressure? Here's the same thing. Riak comes into existence. Let there be light. Is there a way to dim the slide back here? This software, Ubigraph, unfortunately is alpha, and I have no way of changing the background color. Yeah, that's definitely worse. It's not too important. But you can see there's stuff twitching again in the lower right-hand corner. So now basho_bench is running, just beating the crap out of Riak, and processes are being created and dying really quickly down there. Again, it's kind of cool, but it's not too interesting. It doesn't tell you anything. But I like the tool.

So I thought, well, Ruby's a great glue language. How do I use Ruby to do something cool with Ubigraph? So there are seven nodes, seven circles. Those nodes represent the nodes of my cluster. It turns out it's super easy to create your own 3D model in Ubigraph, so that's what I did. I did it in a rake task. Here's the first part.
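The slide itself is not captured in this transcript. A minimal sketch of what creating seven vertices over Ubigraph's XML-RPC interface might look like follows. It assumes an Ubigraph server listening on its default local endpoint; the helper names `ubigraph_client` and `draw_cluster` and the gray starting color are made up for illustration and are not taken from the speaker's rake task.

```ruby
# Hypothetical sketch, not the speaker's actual code.
# Assumption: Ubigraph's XML-RPC server is on its default local endpoint.
UBIGRAPH_URL = "http://127.0.0.1:20738/RPC2"

# Build an XML-RPC client. The require is deferred because xmlrpc is a
# bundled gem on newer Rubies and may need to be installed separately.
def ubigraph_client(url = UBIGRAPH_URL)
  require "xmlrpc/client"
  XMLRPC::Client.new2(url)
end

# Clear the scene, then create one gray sphere per cluster node.
# `client` is anything that responds to `call(method_name, *args)`.
# Returns the vertex ids handed back by the server.
def draw_cluster(client, node_count = 7)
  client.call("ubigraph.clear")
  Array.new(node_count) do
    id = client.call("ubigraph.new_vertex")
    client.call("ubigraph.set_vertex_attribute", id, "shape", "sphere")
    client.call("ubigraph.set_vertex_attribute", id, "color", "#808080")
    id
  end
end

# With an Ubigraph server running locally:
#   vertex_ids = draw_cluster(ubigraph_client)
```

Keeping the returned vertex ids around is what would let a later loop recolor each sphere as its node serves requests.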
You don't have to... I can't see the laser pointer either. You don't have to read through this or grok this, but just know: the first line of code is XML-RPC. Not our favorite, but that's what it uses; who cares. It creates a client, and then seven times it calls to create a new vertex. A really short, really small piece of ugly code, but it does what it needs to. But that just creates the cluster. So now, how do we see that under load?

Same thing: I create the cluster. Right, I'll zoom in a little bit. Too close. Okay, so they're all gray. I apologize to anybody who's colorblind. Then what I do is I start throwing traffic at it, and the brighter aqua they are, the faster they're serving requests. Right, so you can kind of see them blinking on and off, in and out here. It's under high traffic, so everything is kind of evenly balanced. If I modulated the traffic, it might be easier to see which nodes are performing better than others, which would be a sign of sickness, right? Node sickness. Our computers get sick.

And just looking at it, I can tell other things. If I had this up on a monitor over my desk and I saw something like that happen, I would know that one of the nodes had just died. What I did was, again, I just went back and did a kill -9 on one of the nodes. So visually this gives me a better experience of fundamentally understanding, in this case, a very small distributed system, and I think that's pretty useful. I restarted the Riak node. And just to prove that I actually did it.
There's some Ruby code showing stuff. It basically goes into a loop, and for a certain amount of time it just hits the server and changes the colors of those nodes.

So, next steps: bigger systems, right? Bigger distributed systems. Hooks into specific services: different data services, different web application services. ZeroMQ. Fun, because that is fun.

There was a point here. Yeah, so the point is... so, computology. This is the practice of exploring and constructing inductive theories from observations that you make in the field. And again, the point for me is that visualization is a fundamentally different way of understanding what you're working with. So if we can move towards better ways of visualizing the systems that we're working with, I think we're going to understand our distributed systems and see problems in a much more coherent way. So I will continue exploring visceral models of system simulation.

Acknowledgements: I thank everybody who made all the software that I relied on, and Shane and Ben for arranging this. And this is still me. Do I have any time for questions? No time for questions. Thanks, guys.