Okay. Our last talk for today: Lindsay, talking about envisaging your system.
Hi there. I'm Lindsay Holmwood. I work as a senior engineer at Bulletproof Networks in Sydney, which basically means that I write code and administer systems. Today I'm going to be talking about Visage, a little project that I've been hacking on for the last 18 months or so. I'm going to present Visage in the context of a real-world scenario where you might find a use for it.
So imagine that you've got a Linux cluster. You've got your standard application servers, caching servers, database servers, maybe some upstream reverse proxies like just before. And all of a sudden you're finding a whole bunch of slow HTTP requests. You're noticing that mostly through a perceived increase in load time, but you don't really have any hard data to back that up. You're also noticing occasional full timeouts that aren't reproducible: you'll go to a page and it'll work, and the next time you go it'll time out. It's really difficult to pin down. And the problem here is that you've got identical configuration across all the machines, because you're using configuration management, maybe something like Chef or Puppet. So you know that everything should be okay.
So how do you actually go about solving this? Well, one easy way is to use the scientific method: you form a hypothesis, design an experiment, collect some data, run it all through, and at the end you have an answer as to whether your hypothesis was correct. So this talk is really about data, how we collect it and how we visualize it. It's about metric collection right across lots of different systems. We need graphs and charts to back up whatever our assertions are, whatever our hypotheses are. So what sort of tools can you use for doing that?
Well, the typical ones that pretty much everyone is familiar with are things like sysstat, dstat and sar, and then you've got ps and top and all that sort of thing. But the real problem we find with tools like that is that it's really difficult to do correlation, because each of those tools outputs its data in a slightly different format. Maybe dstat is a bit of an exception; that doesn't quite apply to it. But on the whole you've got all these disparate tools with different data formats. You want a way to take all that data, mush it all together, and then visualize what's actually happening.
So for that I use a tool called collectd. collectd is a lightweight statistics collection daemon with an emphasis on collection, but you can really think of it as a platform for collecting and storing time series data. It's plugin-based: the standard distribution has close to 100 plugins now, and they cover everything from really low-level memory usage and CPU usage all the way up to what your memcached server is doing, how many Apache requests you're serving, and what your MySQL queries are like. There's even a TeamSpeak plugin for the gamers in the crowd.
One of those plugins is the network plugin, which makes collectd network-aware. You have lightweight collectors on your front-end machines that barely impact performance, and instead of writing the data out to disk and having all those expensive I/O operations going on behind the scenes, they simply forward it over the network via UDP.
The last point here is well-defined APIs. The network protocol itself is very well defined, and there are fantastic APIs for writing plugins and doing your own thing. The documentation is absolutely fantastic as well. But, the inevitable question.
Okay, so we've got all of the statistics collection sorted, but how do we actually graph any of that? How do we make use of that information? collectd uses RRDtool as its primary data storage behind the scenes. Most people here, I'm assuming, are vaguely familiar with RRDtool, or maybe its predecessor, MRTG. And most of you will probably have this exact reaction when you see it. It's pretty horrible, but it's what we have. If you want to talk about replacing RRD, you can come and talk to me after this talk.
But the real problem that I have with RRDtool is the visualization of that data. There's all this interesting data that's sort of locked away in here. What's happening at all these different points? What is the exact time that a particular thing happened? How do I merge different data sets onto the same graph? There's no real easy way to do that. So I had this idea that maybe we could use some modern technologies to do it.
So I wrote this thing called Visage. What Visage basically does is take the RRD data and expose it as JSON that you can just hit in any browser. It then takes that JSON and uses JavaScript and SVG behind the scenes to render it into graphs that you can see on the screen. Each graph is just like any other DOM element, so you can interact with it and hook all sorts of interesting events around it.
The real difference between Visage and all the other collectd graphing front ends out there is that it's profile-based. It's built around the idea that there is a set of graphs, or a page of graphs, that I might want to go to occasionally to get a quick visual health check of what my systems are actually doing. It's also got a builder behind that for constructing complex pages of graphs, and it makes that really easy. Getting up and running is very simple.
You just use apt-get to install a bunch of dependencies, or yum, or whatever your system uses. It's distributed as a Ruby gem, so you just go gem install visage-app, and then you run Visage with visage-app start. So you can all see that. Oh, no. I'm going to make it smaller. So I've just started it up. It's running on this particular address over here. If I go to that. Okay, fantastic. This is the default view out of the box.
So I'm going to go and create a profile here. You can see a list of hosts that I'm collecting data for. In this particular case I'm just going to go "ubuntu*"; it uses shell globbing behind the scenes. And say I want CPU statistics. You can see there we've got the Ubuntu host and these CPU metrics down here. I'm going to call this one "CPU on Ubuntu". Create. Done. Fantastic.
It's all the same data that you'd normally expect in one of those RRD graphs, but it's all rendered dynamically in the browser. You can put your mouse cursor over any of these data points and see exactly what the value was at that point, and the exact time as well. There will be time zone support added relatively soon. But the awesome thing is that this isn't a static image, so we can interact with it, turn lines on and off, and that sort of thing.
The one problem that I have with a lot of these graphs is that idle is quite high all the time, because ideally the system isn't really doing all that much. So I might just want to get rid of that. And I want to get rid of this wait as well, as it's a bit of a spike. You'll notice down the side here that it's redrawn the axis based on the data that's in view, so it's a really easy way of drilling down. You can do a lot more complex drill-downs as well. Say I want to just drag over that particular area. So I've just investigated that. I want to know what's happening at this particular point here.
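As an aside on the host filter typed into the builder: the talk says it uses shell globbing behind the scenes. A rough way to see what a pattern like "ubuntu*" selects is Ruby's File.fnmatch. This is only an illustration, not Visage's actual code; the host names and the match_hosts helper are made up for the example.

```ruby
# Hypothetical host names, for illustration only.
HOSTS = ["ubuntu.localdomain", "ubuntu-db1", "debian-web1"].freeze

# Hypothetical helper: keep the hosts whose names match a shell-style glob.
def match_hosts(hosts, pattern)
  hosts.select { |host| File.fnmatch(pattern, host) }
end

p match_hosts(HOSTS, "ubuntu*")
# prints ["ubuntu.localdomain", "ubuntu-db1"]
```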
And I might just want to zoom out, because that data isn't interesting anymore, and I can do the same again. One of the other recent features that I've added is a live feed. If I reset the zoom, what it's actually doing is polling Visage in the background every 10 seconds and retrieving the latest data. You probably saw just then that it updated, and it'll do it again in about 10 seconds. There you go. So it's a nice easy way of building up a dashboard of graphs that you can put up on a TV in your office or something like that. There are more complex date selections as well, so say I just want to see the last day, or two weeks, or something like that. You'll notice here that there are a bunch of gaps in the data. That's because this is running in a virtual machine; when the virtual machine is turned off, no data is collected. Sorry about that. Yeah, on with the show.
So how do we actually instrument our own statistics? collectd has a bunch of plugins for collecting all these different things, but sometimes we want to get data from our own applications into collectd so we can view it in line with the other things that are happening on the system. The particular plugin that I'm going to focus on for this part of the talk is curl_json. What curl_json does is use curl behind the scenes to get data from a URL, treat it as JSON, and extract different bits out of it.
In this particular context I'm going to look at a Ruby on Rails application, and I'm just going to build a little Sinatra app, run it as Rack middleware, and slot that in. It basically exposes a URL that you can poke at to get these stats out. If you look at it just here, it's very simple, less than 20 lines of code. What we're saying on this line is that if anyone tries to hit /metrics/gc, it'll build up this hash here of the low-level garbage collector statistics.
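The code on the slide is not in this transcript, so here is a minimal sketch of the same idea under some assumptions. It is written against the bare Rack calling convention rather than Sinatra, so it has no gem dependencies; the class name GCMetrics, the choice of counters, and the /metrics/gc path (my reading of the path mentioned in the talk) are illustrative, not the speaker's code.

```ruby
require "json"

# Hypothetical Rack-style middleware: answers requests for METRICS_PATH
# with garbage collector counters as JSON and passes everything else through.
class GCMetrics
  METRICS_PATH = "/metrics/gc" # assumed path, based on the talk

  def initialize(app)
    @app = app
  end

  def call(env)
    # Any other request goes straight to the wrapped application.
    return @app.call(env) unless env["PATH_INFO"] == METRICS_PATH

    stats = GC.stat # hash of low-level garbage collector statistics
    body = JSON.generate(
      "count" => stats[:count],
      "total_allocated_objects" => stats[:total_allocated_objects],
      "heap_live_slots" => stats[:heap_live_slots]
    )
    [200, { "content-type" => "application/json" }, [body]]
  end
end
```

In a Rails application something like this would typically be added to the middleware stack with config.middleware.use GCMetrics, after which a JSON poller can read the counters from that URL.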
And this is a really low-level programming language thing, but you can easily instrument other things. If you're running an e-commerce store, you might want to look at registrations and sign-ups, or whatever you want really. So it's building up this collection of statistics and then calling to_json on it. If I look at that, there we go. I've just got it running in the background there. Very simple, nothing particularly amazing.
The cool part, though, is when we plug that curl_json configuration into collectd. So what I'm going to do here is I'm going to go, it was 176. Okay. And if I restart collectd, and then go back to the builder here, I'm going to go "ubuntu*" and then curl_json gc. Okay. You can see all the statistics that we put in that configuration file before. I'm going to call it "Ruby GC on ubuntu.localdomain". Done. Okay. I've had this running in the background for a bit, and there you can see it's been hitting that URL, getting the statistics out of it, and it's all stored in RRD, so we can graph it like any of the other statistics. We can do other things as well, like combining different statistics onto the same graph.
Okay. That ends my talk. Quick time for questions.