And I'm from Sydney, Australia. I flew all the way here for FOSDEM. And today I'm going to be talking about making monitoring delicious again. So obviously this talk is going to be about monitoring, right? But first things first, we need to get some terminology out of the way so we're all on the same page. So we have the concept of a check, and a check's purpose is to perform some sort of verification or validation that something is working the way that you expect it to. Developers also know these things as unit tests. And this is an example check. It's very simple: we're just pinging four times. And generally what happens at the end of that is it will return good, bad or ugly: whether what you were testing was within the parameters that you were expecting. And a monitoring system is constantly monitoring for failing checks. So basically it's running through this gigantic list of things that you want to check, and it's going to notify if something is amiss, if something is not the way that you expected it to be. So monitoring systems are essentially asking three questions: what is the next check that I need to perform? Was the check OK after I executed it? And who do we need to notify, or do we need to notify anybody at all? So we take these three questions, and they actually map onto three distinct phases: the fetch, the test, and the notify phase. So if we represent that in a diagram, it's basically this gigantic circle that's going around and around: the fetching, the testing, and the notifying. And within those phases there are actually some sub-phases. In the fetching phase, we're doing some sort of lookup, maybe from a database or from a flat file or wherever. Then in the testing phase, you've got the execution of the check and then verifying the result. And then in the notification phase, you're deciding whether you need to notify anybody.
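The whole fetch/test/notify lifecycle can be sketched as one small loop. This is an illustrative sketch, not Flapjack code; `true` and `false` stand in for real check commands like the ping example, and all the names are assumptions:

```ruby
# The fetch/test/notify loop in miniature.
CHECKS = [
  { :id => 1, :command => 'true'  },   # a check that passes
  { :id => 2, :command => 'false' },   # a check that fails
]

def fetch_next(queue)        # fetch phase: look up the next check to run
  queue.shift
end

def run_check(check)         # test phase: execute and capture the exit status
  system(check[:command])
  { :id => check[:id], :retval => $?.exitstatus }  # 0 good, 1 bad, 2 ugly
end

def notify_about(result)     # notify phase: only speak up when something's amiss
  puts "check #{result[:id]} is failing!" unless result[:retval] == 0
end

queue = CHECKS.dup
while (check = fetch_next(queue))
  notify_about(run_check(check))
end
```

Real monitoring systems run this loop forever; the point of the talk is that the three phases don't have to live in the same process.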
And if you do need to notify, then we need to call out to some other system to do that, whether that be via SNPP or XMPP or whatever the protocol is. And traditionally, monitoring systems have done this within a single process, and it's been treated quite monolithically. You might be using threads or whatnot within that single process, but generally this is all happening on the same machine. And if you look at other things like clustered Nagios and whatnot, generally they're just replicating this across a bunch of different machines, but all these different processes are just happening in one place. And the thing that you realize about monitoring when you look at it in these terms is that it's actually what's called an embarrassingly parallel problem: one for which little or no effort is required to separate the problem into a number of parallel tasks. And this is the case when there are no dependencies between the things that are actually happening within the system. So if we recognize that it's an embarrassingly parallel task, you can start thinking about the common data that needs to be sent between all these different components. In this particular case, between the fetch, the test and the notify phases, we're sending around the ID of a particular check and the command that we need to execute. That's being sent between the fetch and the test phase. And then into the notify phase, we're sending the same ID and the result that we got after executing that test. So we can actually collapse these into single phases themselves. Like, you can't perform a test without having a fetch, right? And in the same way, you can't actually perform a notify without fetching some data or some description. So the cycle itself can actually be broken out into two distinct cycles: the testing cycle and the notifying cycle.
And then you have some sort of transport mechanism in between to send the data backwards and forwards. And once we've done that, we can actually start making some other assumptions, like pre-compiling the checks that the testing phase needs to do. So we can make that a very computationally inexpensive operation: it doesn't cost a lot to actually look up the checks that we need to perform. We can do other fancy things like making the transport the scheduler, so the test phase doesn't actually care about when things need to be executed. The workers just know that they need to execute something now, and the transport is doing all that scheduling stuff for us. The other thing that we can do is remove the data collection from the monitoring setup entirely. We can use other tools like Ganglia or collectd to do that for us, and we can just focus on doing the monitoring itself, the actual notification. So we've got these distinct cycles here, and the data going backwards and forwards. And this is where Flapjack comes in. Flapjack is a tool that I've been writing for the last year or so, and it follows exactly the same principle. You have the workers, which are doing the testing phase, and the notifier, which is doing the notifying phase. And you have beanstalkd in the middle, doing the communication between the different bits. And then for the pre-compilation that I was talking about a second ago, we have a populator, which is just getting some data out of a database, or however you want to represent your checks, and injecting it onto the beanstalk. So a worker just needs to go, OK, give me the next check, and the beanstalk makes it available to it. The nice thing about that is that we can then start parallelizing the number of workers that are actually executing those checks. It doesn't have to be just a single worker; you can spin off as many workers as you want to deal with whatever workload you have.
So if we look at Flapjack: Flapjack is written in Ruby. It aims to be distributed and scalable, and it talks the Nagios plugin format, because there isn't a lot of point in reinventing the wheel. It aims to be easy to install, easy to configure, easy to maintain, and easy to scale. It should be just as easy to scale your Flapjack instance from one machine to many machines, distributing the execution of the checks across as many machines as you want, as it is to keep it running on a single machine. So now that we've split up the monitoring lifecycle, we want to look at the individual components that Flapjack uses to achieve this goal. Before that, we actually need to look at beanstalkd, which is the messaging transport that makes all this possible. So beanstalkd is a simple, fast work queue service that lets you run time-consuming tasks asynchronously. It's written in C, and its protocol is based on the memcache protocol, so it's very, very lightweight. You install it using your distribution's package manager and start up the daemon; generally your distribution will provide an init script for doing that for you. Within beanstalk, it's just like a lot of other messaging systems, where you have this whole idea of producers and consumers. So a producer, if we look at the first three lines here, is just connecting into the beanstalk, and it's putting some information on the beanstalk. And then the consumer here is connecting into the same beanstalk, and it's just looping forever. What it's doing down here with this reserve method is blocking until a job is made available to it. Then once it's got the job, it will print out the job body, and then it deletes the job off the queue once it's done. And this is essentially the way that Flapjack itself works: the workers and the notifiers are consumers, and the populator and the admin interface are producers.
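The producer/consumer pattern being described can be sketched with Ruby's stdlib Queue standing in for the Beanstalk connection, so it runs without a daemon. With the real beanstalk-client gem, put, reserve and delete play the roles that push and pop play here:

```ruby
require 'thread'

beanstalk = Queue.new

# Producer: put a few jobs onto the queue.
producer = Thread.new do
  3.times { |i| beanstalk.push("job #{i}") }
end

# Consumer: block until a job is available, process it, then loop forever
# (here, just three times so the sketch terminates).
results = []
consumer = Thread.new do
  3.times do
    job = beanstalk.pop          # like reserve: blocks for the next job
    results << job               # process the job body...
  end                            # ...popping it is our stand-in for delete
end

producer.join
consumer.join
puts results.inspect
```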
And beanstalkd has a couple of useful features that make this whole thing really easy to do. By default, when you connect into a beanstalkd, you're connected to a named queue called 'default'. But beanstalk has the concept of tubes, which are basically named queues, right? So in this particular case, we have a 'checks' tube and a 'results' tube. And that means that we can put the workloads on the individual tubes, and they don't ever have to touch one another. The workers are just connecting into the checks tube, and the notifiers are connecting into the results tube. The other nice thing the Ruby bindings for beanstalkd provide is YAML serialization: an easy way to serialize and deserialize actual Ruby objects when you put them onto the tube. So that means that you can deal with Ruby objects on either side of the message queue, and everything is nice. So if we look at these components again, we've got the Flapjack worker, and I like to describe the worker using this little story of the eternally forgetful shopper. So this is the shopper, right? He goes into the shop, and he wants to buy something, and he's looking around, and he finds the thing that he wants, and he goes to the checkout and pays for it. And going back to his car, he's thinking, oh crap, I forgot something, I have to go back into the store. So he goes back into the store, and searches for the next thing, and finds it, and checks out, and so on, again, and again, and again. So this is the way that the Flapjack workers themselves work. The worker is basically in this gigantic loop that says: give me the next check that I need to do something with. Then it will execute that check and capture the output, take the return code, and store it. And then it takes the output of all of this and puts it onto the results queue as a result: it sets the check ID, puts the output on there, and also puts the return value.
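The tubes-plus-YAML idea can be sketched like this, with a Hash of stdlib Queues standing in for beanstalkd's named tubes (field names in the payloads are illustrative):

```ruby
require 'thread'
require 'yaml'

# Tubes are just named queues; each workload gets its own.
tubes = Hash.new { |h, k| h[k] = Queue.new }

# The populator serializes a check onto the 'checks' tube for the workers:
check = { 'id' => 42, 'command' => 'ping -c 4 www.example.org' }
tubes['checks'].push(check.to_yaml)

# A worker pushes its result onto the 'results' tube for the notifier:
result = { 'id' => 42, 'retval' => 0 }
tubes['results'].push(result.to_yaml)

# Either consumer gets a plain Ruby object back out:
puts YAML.load(tubes['checks'].pop).inspect
```

The two workloads never touch one another: workers only ever see 'checks', notifiers only ever see 'results'.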
But the fancy thing that it does next is it takes the same check and recreates it on the beanstalk, at the very, very end of the queue, with a delay set on it, and beanstalkd won't make that check available to other workers until that delay has elapsed. So for instance, the frequency here might be set to 30 seconds, so the beanstalk won't make that job available for 30 seconds. And then it deletes the original check off the queue and just goes and does the next thing. So the worker is very, very simple. It just starts up, attaches to the console by default, and you can pass it a bunch of options. Generally, you're using it with the worker manager, though. So by default, when you run the worker manager here on the first line, it will start up five workers. Then you run it with the workers option passed to it, and that will start up another 10, so that means you have 15 running. And then you run stop, and that will stop all the workers that are currently running on the system. The nice thing about this approach is that you can do near-linear scaling: the more checks that you have in your system, the more workers you spin off, and Flapjack copes with that extra load quite well. It also lends itself quite well to failover scenarios, where part of your worker cluster goes down and you just want to be able to get back up and running. So say you have some sort of maintenance window, where you need to take down half of your cluster, but you want your monitoring system to keep on running. You spin up a whole bunch of new workers, you take down the part of the cluster that you want to do your maintenance on, do whatever work you need to do, then bring them back up, and everything is fine, and the monitoring system keeps ticking over like everything is completely normal.
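The re-queue-with-delay trick can be sketched with an array standing in for the beanstalk and an explicit clock, so it runs without a daemon. The field names are illustrative; with real beanstalkd, the delay on put does the hiding for you:

```ruby
def work_once(queue, now)
  job = queue.shift
  if job[:run_at] > now                      # not due yet: beanstalkd's delay would hide it
    queue.unshift(job)
    return nil
  end
  ok = system(job[:command])                 # execute the check and capture the outcome
  queue.push(job.merge(:run_at => now + job[:frequency]))  # re-create it at the back, delayed
  { :id => job[:id], :retval => ok ? 0 : 1 } # the result that goes onto the results queue
end

queue = [{ :id => 1, :command => 'true', :frequency => 30, :run_at => 0 }]
puts work_once(queue, 0).inspect    # runs the check and re-queues it
puts queue.first[:run_at]           # the job stays hidden until t=30
```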
So the next part of the system, and probably the coolest part, is the notifier itself. The notifier works just like the workers in that it starts up and attaches to the console. There are a few more options that you can pass to it for configuration and whatnot. And you also have a manager as well, and that's generally the way that you're starting it, but for debugging, starting it interactively works quite well. So we have this recipients configuration file here, which will probably eventually be moved out into a database, but it's very, very simple; it's just a YAML file. You specify a bunch of stuff here, and all of this information is made available to the notifiers when they decide that they need to notify. Then we have the notifier configuration, which sets up all sorts of deep, dark, mystical stuff inside Flapjack, but I'll talk about all these different sections in a second. So probably the coolest thing about Flapjack is the APIs. I truly believe that all parts of the monitoring lifecycle should have as many hooks in them as possible, so that you can customize Flapjack to make it fit your environment as easily as possible. So there are three APIs that Flapjack exposes that make it really easy to customize: the Notifiers API, the Filters API, and the Persistence API. The Notifiers API is very, very simple. You just create a Ruby object, and in the constructor you get passed a list of options that you can do with as you please. Then you implement a notify method, and when the notify method is called, it will be passed a who, the person that we need to notify, and the result that we need to notify about.
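The shape of a notifier under that API can be sketched like this. The exact option keys (:who, :result, :log) are assumptions for illustration, not Flapjack's documented interface:

```ruby
class ShoutingNotifier
  def initialize(options = {})
    @log = options[:log] || []    # constructor gets an options hash to do with as you please
  end

  attr_reader :log

  def notify(options = {})
    who    = options[:who]        # the person we need to notify
    result = options[:result]     # the check result we're notifying about
    @log << "Hey #{who}, check #{result[:id]} returned #{result[:retval]}!"
  end
end

notifier = ShoutingNotifier.new
notifier.notify(:who => 'lindsay', :result => { :id => 42, :retval => 2 })
puts notifier.log.last
```

A real notifier would send mail or an XMPP message in notify rather than appending to a log, but the hook is the same.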
So this lends itself to some really interesting things, like say a mock NRPE instance, where you could use Flapjack to do all the execution of your checks for your existing Nagios monitoring system, but not actually do any of the notification: it just feeds the information back to Nagios, so you can use Nagios at the same time as Flapjack, and they run in parallel. The next thing is an elastic notifier. R.I. Pienaar, down here, wrote a fantastic tool called MCollective, and what that allows you to do is large-scale system orchestration. So in simple terms, what you could do with an elastic notifier: say Flapjack is telling itself that it's not able to keep up with the number of checks in the system, because you've loaded in a whole bunch of extra checks. An elastic notifier would then send out messages to machines that are ready to run the Flapjack worker and say, OK, you should spin up and create a whole bunch of workers. They will deal with the extra load, and the system basically self-heals, looks after itself, and copes with the load. And it works in the other direction as well, where you have too many machines running the workers. Say you're running this on EC2 or something like that, and you don't want to be paying for all these extra machines; the elastic notifier could do the opposite and go, OK, shut down all these machines until we've reached the optimal load for the system. The next API is the Persistence API, and there's a whole bunch of methods here. If you look through the documentation, there's a lot of information about how to build different persistence backends. Everything is very well tested as well, so the tests are a fantastic source for working out how to write your own. Right now, there are two persistence backends that are provided with Flapjack: there's a SQLite one and a CouchDB one. There's also a MySQL one in the works.
The Persistence API gives you a whole bunch of advantages, such as subclassing. So let's just say, hypothetically, you have a MySQL backend, and you're using that on the Flapjack instance in your business, and you find that there are particular workloads that you need to optimize to make it run faster. So we take this MySQL backend, and we subclass it, and we call it a MySQL-with-memcache backend. We take the get-check method, and what we do is make a call out to memcache first, to see whether we can get a copy of the check from memcache, which is obviously going to be faster than hitting the database, right? If we don't get something back from memcache, then we just call the original get-check method on the MySQL class, and that will do the lookup in the database. Then we store the result in memcache, so the next time somebody needs that particular check, they can just get it out of memcache. The other nice thing about the Persistence API is that it represents all the information in the system using standard Ruby objects, just hashes and arrays and that sort of thing, which lets you do a lot of nifty things like migration. So say you're using the SQLite persistence backend, and you run the standard set of persistence tests, and then you migrate to the CouchDB backend and run the same tests again: the results should be the same. This is a great way to verify that if you migrate your monitoring system from one configuration backend to another, everything works the same way it was working previously. You can also do other things like benchmarking.
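The subclassing trick can be sketched like this. The class and method names are illustrative, and a plain Hash stands in for both the database and the memcached client so the sketch runs anywhere:

```ruby
class MysqlBackend
  attr_reader :queries

  def initialize
    @db = { 1 => { 'id' => 1, 'command' => 'ping -c 4 localhost' } }  # pretend database
    @queries = 0
  end

  def get_check(id)
    @queries += 1      # the expensive database hit we'd like to avoid
    @db[id]
  end
end

class MysqlWithMemcacheBackend < MysqlBackend
  def initialize
    super
    @cache = {}        # stand-in for a memcached client
  end

  def get_check(id)
    @cache[id] ||= super  # cache miss falls through to the real lookup, then stores it
  end
end

backend = MysqlWithMemcacheBackend.new
backend.get_check(1)     # first call hits the "database"
backend.get_check(1)     # second call is served from the cache
puts backend.queries     # prints 1
```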
You can build different loads in the system: OK, let's say I have 30% of my checks that are failing all the time, and 20% that are warning, and the other 50% are working all the time. Then you run all these different benchmarks across the different backends and configuration options, and you can see which backends are going to work best for your environment. And finally, a web interface as well. The Persistence API makes it very easy to build a single web interface that doesn't care about how you're storing data in the backend. It's just talking over this API, so you write the web interface once, and you never have to customize it for each backend that you're dealing with. And the final set of APIs in the notifier are the filter APIs, and these are probably the coolest feature of Flapjack. Flapjack takes the approach that we should always be notifying unless there's something that's blocking us from notifying. So we have this filter chain here, and what this particular method does is go through all the filters, passing in the result, and if any of those filters block, then we don't notify. So let's just take an example filter here. We have an OK filter, and what the OK filter says is: if the result is not a warning and is not critical, then we don't need to notify, so it blocks. And you can couple that very easily with other things, like an any-parents-failed filter. In a monitoring system, right, you're going to have hierarchies of checks, where some checks depend on other checks, which depend on other checks, and whatnot. So if a child check is failing and its parent is failing, you obviously don't want to notify about the child, because the parent check is more important. And this is really easy to do: you go to the Persistence API and ask whether any parents are failing for the particular check that we're dealing with right now.
And if any parents are failing, then the filter blocks, and that means that we don't notify. So it handles that problem quite elegantly. And you can also do other things like filters for downtime, or for acknowledged alerts, or anything like that. The sky's the limit, basically, when it comes to writing filters. The final component of Flapjack is the admin interface, and I won't really talk about that all that much, because basically I've thrown out all the code that I wrote, because it was crap, and I'm working on new stuff that's fantastic. So the next important thing about Flapjack is that it talks the Nagios plugin format. And this is really important for a couple of reasons, mainly because there's not a lot of point in reinventing the wheel, because you're just going to do it wrong. The fantastic thing about Nagios and the Nagios plugin format is that it provides a formal interface for writing plugins and consumers. The interface being: exit 0, exit 1, or exit 2 translates to good, bad, or ugly. And you can provide extra information in there as well, with the extra reporting stuff. The great thing about this is that it's so easy to implement; that's why there are tens of thousands of Nagios plugins out there. Why ignore all of them and switch to something new, when they all do a fantastic job of what they do already? And the other great thing is that it's the industry standard in the monitoring world, right? Everybody understands and talks the Nagios plugin format, so there's not a lot of point in switching away and trying to convince people to use something 'better', because it works quite well. So the other thing about Flapjack is that it really strives to not do any sort of data collection at all. It is essentially a notification system for when things are bad, whatever those things may be. And it leaves the data collection problems, and the actual writing of the checks themselves, up to other projects that do that much better.
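Going back to the filter chain for a moment, the always-notify-unless-something-blocks approach can be sketched like this. The filter names and the block protocol are paraphrased from the talk, not Flapjack's exact API:

```ruby
class OkFilter
  def block?(result)
    result[:status] == 0                 # everything is fine, so block the notification
  end
end

class AnyParentsFailedFilter
  def initialize(persistence)
    @persistence = persistence
  end

  def block?(result)
    @persistence.any_parents_failed?(result[:id])  # the parent outage is the real story
  end
end

class NotifierEngine
  def initialize(filters)
    @filters = filters
  end

  def notify?(result)
    @filters.none? { |filter| filter.block?(result) }  # notify unless something blocks
  end
end

# A stub persistence backend where only check 7 has a failing parent:
persistence = Object.new
def persistence.any_parents_failed?(id); id == 7; end

engine = NotifierEngine.new([OkFilter.new, AnyParentsFailedFilter.new(persistence)])
puts engine.notify?(:id => 1, :status => 2)   # failing, no failing parents: notify
puts engine.notify?(:id => 7, :status => 2)   # failing, but so is its parent: stay quiet
```

Downtime and acknowledgement filters would just be more classes in the same chain.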
And it really subscribes to the Unix philosophy of doing one thing and doing it well. So I posit that there are three different types of checks. I think there are gauges, which are for getting lower-level statistics, things that Ganglia or collectd would provide information on. So lower-level stats about CPU usage and network usage and all that sort of thing. Then you have behavioral checks, which ask: when I interact with the system in this way, am I getting the result that I expect from it? Things like cucumber-nagios do that quite well, and I'm going to talk about that in a minute. And then finally, trending. There's nothing that really does that all that well at the moment, and trending is more of a function of the monitoring system itself; eventually the filters will probably implement some sort of trending in some way. There's Reconnoiter as well, which is another monitoring system that is doing some interesting stuff with trending, so if you're interested in trending in monitoring systems, that's definitely worth checking out. So we're going to segue for a tiny bit onto cucumber-nagios, which is another tool that I wrote. cucumber-nagios is all about web testing and behavior-driven infrastructure. I'll talk about behavior-driven infrastructure in a minute, because it's sort of an out-there term. So very simply, Cucumber is basically an executable specification: you write, in plain, human-understandable language, how you expect a system to behave. In this particular example here, we're saying that when I visit this particular URL, Google New Zealand, and I fill in the query with Wikipedia, and I press the Google Search button, then I should see this particular string on the page. And internally, what Cucumber does is map each of those steps over here to these little Ruby DSL fragments, and it'll call out to some other system to do the interaction with the website.
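The step-to-Ruby-fragment mapping can be shown in miniature. This is a toy matcher, nothing like Cucumber's real internals, and the page visit is faked with a string:

```ruby
STEPS = {}

def Given(pattern, &action)
  STEPS[pattern] = action      # register a Ruby fragment against a plain-language pattern
end

# Two fake step definitions; a real one would drive a browser library.
Given(/^I visit "(.*)"$/)      { |url|  @page = "pretend contents of #{url}" }
Given(/^I should see "(.*)"$/) { |text| raise "not found" unless @page.include?(text) }

def run_step(line)
  pattern, action = STEPS.find { |pat, _| line =~ pat }
  raise "undefined step: #{line}" unless pattern
  instance_exec(*line.match(pattern).captures, &action)  # run the fragment with captures
end

run_step('I visit "http://fosdem.org"')
run_step('I should see "fosdem.org"')   # the fake page mentions the URL, so this passes
puts "2 steps passed"
```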
And cucumber-nagios makes all this stuff really, really easy to do. Normally, when you run Cucumber by itself, which is traditionally a web testing tool, though it works quite well in all these other cases as well, all these features exist in a single file. So let's just say this is the search feature here. When you run that, you'll get a bunch of pretty output that says: I ran through all these steps, and they all worked, and it was fantastic. Cool. So what cucumber-nagios does is exactly the same thing. It runs through all those steps, and then it outputs, in the Nagios plugin format, whether it worked or not. It means that you can write these high-level tests in plain human language and plug them into your monitoring system. So let's have a very quick look at how it works. The idea is that you install the cucumber-nagios gem; it's distributed as a Ruby gem. You run the cucumber-nagios generator to create a project, in this particular case fosdem10. Then we cd into fosdem10, and we run this gem bundle command, which takes all the different dependencies that cucumber-nagios requires to run and freezes them into the single application. So that means you can just tar up that directory and distribute it onto your production monitoring environment, and that's it. So here's one I prepared earlier. Within that, we generate a feature for, say, fosdem.org, and we're going to look at the navigation. Right, so this goes and generates a bunch of stuff for us. You guys can see that OK up the back? Great. So if we look here, it's generated just a bit of scaffolding for us, and if we run that right now, then hopefully that should work, assuming that fosdem.org hasn't just gone down. So cucumber-nagios provides a bunch of built-in steps for doing things like interacting with websites.
So there's a built-in library saying when I go to here, or when I press this button, or when I fill in, all these different things, right? It also has other things like SSH steps, which I'll talk about in a minute, for interacting with machines over SSH and whatnot. Anyway, if we go back here, and we add: when I follow "Tracks", then I should see "Lightning talks". OK, so if we run that, you can see here that there were four steps that passed, and that was all great. And say we modify that to add: and I should see "spoons of doom". Hopefully that isn't on the page. Great, so we've got a critical here of one; obviously that string wasn't there. The cool thing is that you can actually pass a bunch of other options. So if we pass the pretty option, it'll run through and show that this particular step failed, and if we go up, we see: and I should see "spoons of doom"; expected "spoons of doom", didn't see it. Great. OK, so that's cucumber-nagios. And you can do a bunch of other interesting stuff, like this new term called behavior-driven infrastructure. Just after I presented cucumber-nagios in October last year, Martin Englund from Sun piped up on the Puppet users mailing list saying, hey, I've played around with this Cucumber stuff before, and wouldn't it be sort of cool if we could take all this Cucumber stuff and apply it to the idea of configuration management or build management? And he put together a blog post describing how he was using Cucumber to verify the builds of his systems. The interesting thing that came out of the discussion was that you can actually think of Puppet as being a build tool for configuring systems, right? So Puppet is the build tool, or like a programming language, and Cucumber is a testing tool to verify that your systems are configured in the way that you expect them to be configured.
The other interesting thing about this is that it's not Puppet-centric, right? You could use CFEngine or Chef, or do your own hand-rolled configuration. And the hand-rolled configuration case is actually quite interesting, because let's just say, hypothetically, you have a bunch of machines that aren't Puppetized, that have been sort of crafted over the years, and nobody really knows what's going on with them, but you want to migrate to a configuration-managed environment. You could use Cucumber and cucumber-nagios to describe how the system is currently working, testing that all these different behaviors and interactions work the way that you expect. And once you've done that, you can build a bunch of stuff with Puppet or Chef or CFEngine or whatever, and you basically iterate in your configuration management tool until all your tests are passing. So there are a bunch of other things that are in the works, like, say, mail server tests. Let's just say I want to test a bunch of local logins for a mail server: say that when I don't have any public key set, and I SSH to this machine with this username and password, all this stuff should work. It also works for LDAP logins, or whatever sort of authentication system you're using. And then other things like mail, right? You're saying that when I'm using this mail server, and I log in with this username and password, and I send this mail to this person, then it should send correctly. And obviously the next step of this is the receiving at the other end, right? We can check that the delivery works OK, but if the user isn't receiving mail at the other end, it isn't really all that useful. So the question is then: why would I want to do this? The thing about monitoring right now is that most checks are actually asking the wrong questions. Most checks are doing some sort of ping or a TCP connect to verify that something is the way that you expect it to be.
And those things are basically asking: is my server up, or can I see my application, right? That doesn't deal with a bunch of edge cases, like a VM going down with the network stack staying up. Obviously it's still going to respond to ping, right? Or it doesn't matter if your web server is up if you're serving 404s or 500s all the time, right? And that basically means that your monitoring system is dead in the water. So cucumber-nagios allows you to ask the right questions a lot more easily. Things like: is my app behaving? Can I navigate around my website? Can I place an order? Can I sign in? All these different things. And we can actually start thinking of monitoring as being sort of like continuous integration. So a traditional CI lifecycle is something like this, where you have the checkout, the build, the test, and the notify phases, right? If we think of monitoring as being continuous integration for production apps, this is actually an interesting idea, because we can take the CI lifecycle and strike out the checkout and the build phases, because somebody's already built the software for us, and we're just doing the testing and the notification. The funny thing is that this also looks really similar to those diagrams I had earlier about what Flapjack is doing. So let's just think: in your monitoring system, what your checks are currently doing is saying, can I see my app? Can I do some sort of TCP connect, checking for a string or whatever? Now think about that check in a continuous integration lifecycle. Think about the tests that you've written for your code when you're developing, and imagine only asking: can I see my app? It doesn't make any sense at all that when you're developing the application, the only question you're actually asking is, can I see my app? Because yes, of course you can see your app, but it doesn't mean that it's functioning.
It doesn't mean that you're making any money. The other thing to keep in mind is that this is not new. Other people have done this before; you can already do this with a bunch of different checks. If you're using check X or check Y with check Z, you can get the same sort of functionality. But the thing about cucumber-nagios is that it makes all of this reuse really, really trivial. It means that instead of having to write the same checks again and again, you can reuse an existing library of checks that other people have written. And this is great, because it means that you're writing less code, which means that there will be fewer bugs, and fewer bugs mean fewer alerts. And fewer alerts at 3 a.m. is obviously what we're all optimizing for. Right. So this is a great quote that Bradley Taylor wrote, and obviously it's a bit of a jibe, but it's actually quite apt, right? cucumber-nagios is really about building bridges between sysadmins and developers, and increasing the collaboration between the two camps so that we can learn from each other. So if we take another step back out from cucumber-nagios, we go to collectd, as I finish up. collectd is a lightweight statistics collection daemon, with an emphasis on collection; sort of analogous to Ganglia, if anybody was in the previous session. It's network-aware, which means that you can collect statistics locally and send them upstream someplace else. It has a plugin interface, and it also talks the Nagios plugin format: any of the statistics that you collect with collectd, you can poke at with collectd-nagios, which means that you can plug it very easily into your monitoring system. And there's a huge list of plugins available for it, and this is expanding with every release. It's actually really, really cool. So if you're interested in any of these plugins, you should check them out on the collectd website. There is a bucketload of information there.
So if we look at some example configuration very quickly: here we have a collectd client, and you can think of a collectd client as an agent that you run on a machine, right? We're loading up a bunch of plugins, and most of these plugins don't actually need any configuration. We're saying up here that we want to collect these statistics every 20 seconds. Then we have the network plugin, and we're saying that all the statistics we collect locally we want to send up to monitoring.mydomain.org; you can also do multicast, or specify IP addresses, or whatever. Then on the server at monitoring.mydomain.org, we're saying we're collecting stats every 20 seconds, we're using the network plugin and not as many of the other plugins, and we're listening on this particular address. All the statistics that come in, we're going to write out using the rrdtool plugin to this particular directory here, and we're holding on to those statistics for 900 seconds before we flush them out to disk. You can also use other things like rrdcached, which was mentioned in the last talk as well, if you have huge volumes of statistics that you want to log to disk. The other awesome thing about collectd is that there are language bindings for the network protocol. That means that within your applications you can instrument statistics, from within your web app or your Tomcat app or whatever, and send them over the network to a running collectd instance, which is great if you need to instrument statistics within your applications without having to build all sorts of extra crazy stuff on top. So finally, going back to Flapjack: some stuff about what's happening in the next few months. Right now, Flapjack is distributed as a Ruby gem, which is really ghetto and inappropriate for a system administration tool.
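The setup being described might look something like this on paper. The hostname, plugin selection, and paths are illustrative examples, though the directive names are standard collectd configuration:

```
# Client-side collectd.conf: gather stats every 20 seconds and ship
# them over the network to the central monitoring server.
Interval 20

LoadPlugin cpu
LoadPlugin memory
LoadPlugin network

<Plugin network>
  Server "monitoring.mydomain.org"
</Plugin>

# Server-side collectd.conf: listen for incoming stats and write them
# out as RRD files, buffering for 900 seconds before flushing to disk.
Interval 20

LoadPlugin network
LoadPlugin rrdtool

<Plugin network>
  Listen "0.0.0.0"
</Plugin>

<Plugin rrdtool>
  DataDir      "/var/lib/collectd/rrd"
  CacheTimeout 900
</Plugin>
```

The client and server run the same daemon; only the configuration differs, which is what makes the fan-in topology so cheap to set up.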
There are a bunch of people, some of whom are here in the audience, who are building packages for different distributions, and to those of you who are here, I thank you. The other nice thing coming to Flapjack in the next few months is nice graphs in the admin interface. That will make it a lot easier to sell to your boss, or whoever, when they've got nice, pretty stuff to click on. So there's another project that I've been working on called Visage. And whoops, apparently this link is working. Sorry. Okay, here we go. What Visage does is take the raw statistics that collectd writes out as RRDs and render them in the browser. And not just render them in the browser: everything that you see here on the screen is actually a DOM element. So it means you can do funky things like, you know, if I put my mouse over this particular thing here, I don't know whether you can see up the back, but it sort of fades in and whatnot. That's sort of cute. And you can toggle them in and out, and all that sort of thing. Right, sort of neat. The other thing in Visage that I haven't publicly released yet is that all of this is embeddable. For all these graphs that you see here on the Visage dashboard, there's some code that I've written where you click on this embed link and it spits out a bunch of HTML that you just paste into a page, which is fantastic if you want to create dashboards of all the different statistics that are flowing around your system. The last thing is a job insertion API; if you're interested in hacking on Flapjack, you should come and talk to me about it later. So thank you very much for listening. Who here has questions? Do we have any questions at all? Or have I dazzled you all with my brilliance? About Flapjack, whoops, sorry. Can we use it in production? I have an older version of it running in production.
I've done a fairly heavy amount of brain surgery on it recently, so it's not really in a production-ready state. But that's certainly changing; I'm hacking on it quite vigorously. Thank you. Any more questions? Does anybody want to see demos of stuff? I don't know. No more questions? Oh, yes, over there. "You have a few components that talk Nagios, but where does that leave Nagios itself in the picture?" So the question was: where does that leave Nagios in the picture? And the answer is, it leaves Nagios out of the picture. Flapjack is essentially a replacement for Nagios, and that's what I'm aiming for it to be. Right now you can think of Flapjack as the infrastructure for building a monitoring system. As I round off the rough edges, eventually the aim is to be the de facto standard for monitoring in the open source world. No more questions? Okay, thank you very much.