Welcome to this talk by Katie McLaughlin. That was good. He was worried that he wouldn't be able to pronounce my name, but that's fine. I can barely pronounce it at the best of times. Greetings, humans. I'm Katie, and I'm going to be talking about a dead guy. I work for Anchor Engineering. There are a few of my colleagues in the room. Say hi, guys. Wow, okay, literal. Okay, so we had a problem that we tried to solve, and I'm going to try to describe, hopefully not too quickly, the different ways and solutions that we went about solving it. Hopefully these solutions may apply to you, and I hope that you learn something. So, we work for a web hosting company, and we collect a lot of data about servers: metrics, Nagios, all that kind of fun stuff. We've got OpenStack data, IP traffic, networking data. We have a whole lot of data, and not a really good way to visualize everything we need when we need it, because if you want to see what a server is doing, three-hour-old data is not going to help. An infographic that's been hand-tailored is not going to help you. What's also not going to help you is trying to port it into Excel. One of the miniconfs taught us that this thing should die a horrible death. So, what we want is essentially a way that we can present nice data in a usable format that we can consume quickly. Infographics don't work, Excel doesn't work, but we want something fast and reliable. So, why not both? This is not my deck. Why is this not my deck? I'll run with it. I don't know what's coming up. There we go. Okay. So, that was caching. That was going to run; I should be okay now. So, even though this really isn't a good term according to some comments I've heard around, it is really a pets-versus-cattle visualization problem, where we want something we can use that doesn't look horrible while graphing all the things.
So, it's a fairly broad thing: we just want to fix everything. And twelve months later, I think I might have worked it out. So, my solution: I named it after a dead guy. We have an entire stack that we've built; this is just the front-end talk. We have an entire back end and other systems named after other dead people, and there have been other talks about those. This guy is the Italian guy; everyone else is French. Probably a front-end versus back-end thing. But I'll just be focusing on the front-end metrics and graphing in JavaScript. So, JavaScript doesn't scare anyone? CSS doesn't scare anyone? Okay. So, all the code that I'll be showing today is on GitHub, because open source, and it's pretty much just graphing: line graphs, area graphs, all that kind of fun stuff. So, let's start off first: what is a metric? We are going to be talking about, sorry, not time series, a thyme series. So, giggle, giggle. We have a time-series database that we built, and a metric is literally a point over time. So, you have a value at one time and a value at a later time. This is a really nice format if you want to take multiple different metrics, align them all, and see how things have changed in relation to each other. There are other solutions out there for this. There's Graphite. There's InfluxDB. There's a whole lot of other things that sort of grab the entire problem set and give you a box that does stuff. But what we found is that it would be a whole lot more efficient if we could separate what we were trying to graph as a data source from what it conceptually represents. So, Machiavelli is very data agnostic. It doesn't care where the data comes from. Nagios and OpenStack data is what we are using it for internally, but Machiavelli don't care. It's a strong, independent Ruby on Rails application that can take anything. The main way that it works out exactly what a metric is, and how we've tried to describe it, is that a metric has a context and an origin.
And an origin is that decoupling of the way that the data is stored from how it's humanly represented, which is a really big distinction that we felt Graphite and other such solutions were clogging together, duplicating data and not being very efficient. I may not have explained that very well, so let me bring back a slide from before about the thyme series. This is thyme. You may recognize it from your spice rack or your garden. We have fresh thyme and dried thyme. These can both be used to make lemon thyme chicken, or you can add them to a margarita or some sort of mojito or something. So we know, as humans, what we use this thing for. A machine would not. So we've got a plant and we've got a jar. You can turn the leaves on the plant into dried thyme. You can take dried thyme out of the jar. And then you can use them in the exact same way, and you don't need to describe or re-implement those sorts of things. So they're both metrics. They've both got the same context, but they come from different stores. I thought of this and I still don't quite understand it, but it works. So: DRY implementation. I don't have to re-code anything. Actually, I think it's really nice, and I like cooking. So, thyme series. The point is that once you know how to get thyme from a jar, you can get cumin from a jar, or you can get anything else from a jar, because you've already told the machine how to get it. And then you just say, well, I use paprika for jerk chicken, or I use oregano for bolognese. So, first step: accessing the data. We need to get the information out of the database, the JSON feed, wherever it is, and into the application. This is not the way to do it. If you care about the sanity of your systems, you do not want to dynamically pull JSON feeds arbitrarily from the browser. Chrome won't let you, for good reason, because there are security implications. You do not want to casually create cross-site vulnerabilities.
You do not want to add issues with the security of your information. And sometimes you don't have the ability to say, I want my JSON feed to be queryable arbitrarily from anywhere on the Internet, which would be a bad thing anyway. So what we did is: this front-end JavaScript application actually encapsulates its own internal API that pulls all this data server side, which means that we can isolate the authentication mechanism. We can say: this server, with this IP, has the right to pull this data. And it means that we can pull from local files, which you can't do in a browser natively. And we can also have a level of store obfuscation, so users may not be able to know or see where the actual data comes from, but the server does. So there's a whole lot of nice things that happen with that. Once you get the data, you want to sanitize it, because it is external data; you do not know what's in it. But the problem with time series is that time is only consistent sometimes. You have issues where you may stop collecting data. You may have missing data. You may have irregular data points. And you want to make sure that what you're getting can just be slurped up by JavaScript, so it can just go, okay, I'm going to graph this now, because you have an entire web server that can do a whole lot of calculations on the data before you have to ingest it. And since you have control of that, the JavaScript isn't going to be horrendous. So we have the option, caveat, to sanitize any data that we bring in automatically, so we can do live interpolation on it, or backfilling with zero points. This may not be the best solution for every kind of data that comes in, because you may want to see when a server has not reported anything. So it's still a solution to the issue of insane data, but it may not always be required; you may know that your data is sane already. Part of this stack that we've developed has our Voltaire back end, and we have our Machiavelli front end.
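The interpolation and zero-backfilling option described above can be sketched roughly like this. This is a hypothetical illustration, assuming points arrive as [timestamp, value] pairs at a known expected interval; it is not the project's actual sanitiser.

```ruby
# Hypothetical sketch of the "my data is insane, please fix it for me"
# option: given [timestamp, value] pairs expected at a fixed interval,
# fill any gaps either with zeroes or by linear interpolation.
def sanitize(points, interval:, mode: :zero)
  out = []
  points.each_cons(2) do |(t1, v1), (t2, v2)|
    out << [t1, v1]
    t = t1 + interval
    while t < t2
      filled = mode == :zero ? 0 : v1 + (v2 - v1) * (t - t1).to_f / (t2 - t1)
      out << [t, filled]
      t += interval
    end
  end
  out << points.last
  out
end

# A gap at t=10 and t=20 gets interpolated between the neighbours:
sanitize([[0, 10.0], [30, 40.0]], interval: 10, mode: :interpolate)
# => [[0, 10.0], [10, 20.0], [20, 30.0], [30, 40.0]]
```

As the talk notes, whether you want interpolation, zeroes, or the raw gaps depends on the data: for an availability metric, the missing points are the interesting part, so this step is optional.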
We've also got a few layers in between, including a RESTful API that we've called CEST, because it's RESTful. We also have a whole lot of other consumers; some of them go directly to the database, and some of them use this sanitized input, based on how they want to consume the data. So in this case, when we're collecting from our own data store, we do not have to do any sanitization, because we are in control of where that data is. A series of numbers is all well and useful, but you want to know what it's about. Metadata probably isn't the right term, because it has been overloaded recently with other metadata. We're talking about servers here, not human beings, yet. So this is what I spoke about earlier. You have a number going from zero to 100, but without any context, you don't know that it's a server CPU. In the example that I gave earlier about thyme, the metric might be displayed to us as garden, herb patch, thyme. As a machine, we know that that is uniquely identifiable. As a human, that doesn't read well. So with metadata, we can dynamically say: see this thing? Split it up, put some verbs and things in there, and make it a nice title for us. So we can take what information we have and make it useful, and incorporate your business logic. We do get some information depending on where we get our data, but we can also make assumptions based on what we know about it. So we could use a relational database, and we could have views and computed fields, and we could do a lot of things. But we're not going to, because this is 2015. We can have a different solution, where we dynamically derive what we need, when we need it. We can't always add data directly to our data source, and our business rules might change. But we can add the data when we're going to consume it. Say, the brand of the thyme in a jar. We may buy MasterFoods one day. We may buy from another place that does dried herbs. Hoyt's, I think.
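The split-it-up-and-title-it idea above can be sketched in a few lines of Ruby. The metric name is the one from the talk's example; the joining words and the function name are made up here, and the real business logic would be richer than this.

```ruby
# Hypothetical sketch: derive a human-readable title from a
# machine-unique metric name at render time, instead of storing a
# duplicate "pretty name" next to every metric in the data store.
def humanize_metric(name)
  parts = name.split('.').map { |part| part.tr('_', ' ').capitalize }
  parts.reverse.join(' in the ')
end

humanize_metric('garden.herb_patch.thyme')
# => "Thyme in the Herb patch in the Garden"
```

Because the title is computed when the metric is consumed, a change in the business rules (a new brand of thyme, say) changes one function, not every row in a database.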
We don't have to then go to our database and say: see all this thyme? We are going to find and replace everywhere. Because that's just silly. It's more data that we don't need, copied everywhere. So how we've solved this particular problem is that we get the server to dynamically work out what's going on for us. We've done a whole lot of server-side stuff; now we get to the cool stuff. There's a demo coming up. So, as I said, my slides are really good once I refresh them. But we have sane, meaningful data. The JavaScript is really easy, because JavaScript is horrible if you don't know what you're expecting. A lot of this stuff has come from other projects, including Shutterstock, who do images, like Getty Images and such. You wouldn't think that any of these people would have graphing solutions that they've open sourced, but they do. And they're very nice, except when you want to extend them, and they don't accept upstream changes, because they thought that copying their content onto GitHub was enough. So I do run active forks of all of these, to add nice things like UTC support and such. I've also created nicer graphing styles, including some from Tasseo, which you may have heard of before. It's one of the things by Jason Dixon, I believe. He goes around with a Blackadder avatar of Hugh Laurie, and he made a system called Descartes, and he did a whole lot of visualization. Then he was picked up by GitHub, which is why their visualizations are really nice now. So, we have a demo. Here is a demo. Can we see this on the screen? Okay. All right. So what I have here is a list of points from the government, which has a nice JSON feed where you can see half-hour intervals of various metrics for weather stations. This example is exactly why I need to pre-sanitize my data. Last time I gave this talk, it broke, because the government had a power outage at one of their stations and missed points, so this entire thing exploded.
This one will not, because I've enabled the "my data is insane, please fix it for me" checkbox. So we have, this is a thing. Please don't go to it now; check it out later. I really do want to have this thing keep running at least until twenty past. So I've got a URL here that looks somewhat human readable. We've got a metric, our data source is the Bureau of Meteorology, and we've got, say, Campbelltown air temperature and Gosford air temperature. Put them all together, and we can see trends. We can see that, say, something was hotter and then it went colder. And this is one of our stacked graphs. So we've got things where we can say: I want to compare things on two different axes, or I want to remove things, or I want to zoom and see more data, and dynamically resize to fit whatever kind of data we're seeing. And all this is dynamic, and at any time I can copy this URL and see the exact same thing in another browser, or send it around to my friends. There are a whole lot of features in here. I'm not sure whether my slides are going to expand on this more. Hey, let's do the thing: click a thing and see a change. So we have a whole lot of different options here. Everything is going to be stored within this URL. Say this is historic data: we can click a button and remove a line, if I can use my mouse, which I can't. We can zoom. We can see all the things. So, a whole lot of server-side stuff for a whole lot of live, dynamic, manipulable things. So, as I said, that URL is really special. And since I'm still talking really fast, I'm going to have a tirade now, because I can. Because I have you in a room. And there is beer tonight if you're nice. As I said, we have this thing where we want to be able to share a URL around the place. The problem is, you saw how dynamic this thing was, and this view: if you want to say, here, look at this thing, how are you going to say, I want you to see this exact setup of my graphs?
You can't use cookies, because they're isolated on your machine. And dashboards require storage, which is just too hard, and we have explicitly tried to store the least amount of data that we can. There is no database here at all. It's all live and dynamic, and the only thing that we are storing on this particular server at this time is just the data files, the JSON files, because the Bureau of Meteorology wants to charge you if you want to pull that live, which is not very open at all. So we can store our state purely in this URL, by means of black magic and non-RFC-sane things. This application breaks the laws of thermodynamics as far as URLs go, because there is a little bit more in there than the RFC allows for. This RFC is horrendous. It is outdated, and I will prove to you why it's horrible. This is the Uniform Resource Identifier generic syntax, RFC 3986. It is ten years old, and it is not very specific at all. It's very generic. Emphasis on generic. There are clearly defined schemes and systems and qualifiers and reserved characters, but not for the query string. The query string is important, because it's the string that queries. No one has any idea what they're doing with this, because it doesn't get defined in the RFC. So, do we know what a URI is? We have a URI. We've got different sections. There are some special characters. Am I going too quickly? No? Okay. We can have query strings; we can do a query. We can have a fragment; we can have the hash. We can have the path, and special characters again. This basic functionality is defined by the RFC. But then it gets complicated. For these examples, I will have a human interpretation and a parsed interpretation. At the top, we're going to have a query string. The middle is the human response. And the bottom is going to be what Rails parses, so that's the machine response. Start off with a start and a stop time. These, as a human, we know are seconds since epoch. I am defining "human" here as someone in the room.
Because this particular date format machines are really good at, and humans will learn to be good at, given the standards about what the US does with their dates and their months and their days and the flipping and the inverted triangle. So the statistically average human in the room here can understand that. This is fine. The machine can understand this, because there is one start and one stop, and it knows that these are things. Rails also happens to know that they're numbers, which is really nice. We go a step further, level one. We started at zero, going to one. Again, something that the humans in this room understand. The problem we now have is that this store and source, we want to split it. And the wonderful tilde is an allowed character in a query string in every single web server I could find. I think this possibly dates back to when you had home directories with a tilde name. So you are allowed to use it, so we are using it. We are defining things, and I'm actually a human, so I can read it. The machine can parse this as soon as it's told what to do. The machine in this case, I'm assuming, is Rails, and the business logic about splitting on the tilde comes later. So this is all valid, this is all fine, and this will work. But then we get problems. We try to have two variables with the same key, and the machine goes beep: there's only one. The machine ignores one of the values in Rails 4.1. This will work fine in some Python web systems. This will work okay in possibly some Perl ones. This works okay in jQuery, I believe, as well. Rails and... Rack? Rack. Rack doesn't know what to do. So what does the RFC have to say about this? Everyone is interpreting it differently. It doesn't actually say anything about what to do when you want to have duplicate keys. People made their own standards here, and no one adheres to the same standard. So it's not really a standard. It's more like a set of guidelines than actual rules.
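The duplicate-key ambiguity is easy to demonstrate with Ruby's standard library. CGI.parse collects every value for a repeated key, while any parser that wants exactly one value per key has to silently drop the rest; the second half below is a naive sketch of that behaviour, not Rack's actual implementation.

```ruby
require 'cgi'

# Ruby's stdlib CGI.parse keeps every value for a repeated key...
CGI.parse('metric=foo&metric=bar')
# => {"metric" => ["foo", "bar"]}

# ...whereas a one-value-per-key parser has to pick a winner and
# silently drop the other value. A naive sketch (not Rack itself):
pairs = 'metric=foo&metric=bar'.split('&').map { |kv| kv.split('=', 2) }
pairs.to_h
# => {"metric" => "bar"}   (one value is silently gone)
```

Which value survives, or whether both do, is exactly the behaviour the RFC leaves undefined, which is why Rails, Rack, jQuery, and various Python and Perl frameworks all disagree.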
So, some systems like to add extra braces here, which confused me, and as a statistically average human, if it confused me, I think it may have confused other people the first time they saw weird brackets in a URL. But Rails thinks that this is fine, because it knows: oh, I am declaring my key as an array, and I can have multiple values. You can extend this further and get near-JSON-like syntax in a URL, using a whole lot of brackets and a whole lot of nesting. But you keep getting things like: this isn't actually a thing, it is a metric which has 0 of x and 1 of x, and I can't read that, and I wrote the thing. But then, as soon as you get really, really complex, everything is a bit weird, because now you have some singletons, some multiples, some other things, and then it all explodes as soon as, in Rails 4, you try to have disparate data types. You cannot have a string and an array under the same tree of your JSON-encoded document in the URI. jQuery can. jQuery is better than Rails at this. This is not cool. This is why we cannot have nice things. There is no standard, there's no anything. Everything is ugly and mismatched, and all I want to do is just store some data in a URI so I don't have to do cookies or anything. So, what's the solution? I made my own. Now there are n plus one standards. So, first step: those ugly braces are ugly. Brackets, braces, parentheses, whatever you call the wiggly ones. So, before Rack gets hold of my query string, I change things, because I say: see this metric key? Assume that's an array, because I know I'm going to have one or more, and as a human, an array of one is an array. That's step one. Step two: the ordering is wonderful, but what we can do is say, see that one tilde? Let's duplicate that and have some sort of known fields and tags, so then I can have, say, foo, bar, line, computed, green, and that's fine. That's safe, and a human can read and write it. And this is the thing.
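The two steps just described can be sketched like this. The field names and the query below are illustrative, not the project's exact scheme: every `metric` value is treated as an array entry even when there is only one, and each entry packs known positional fields separated by tildes.

```ruby
# Illustrative sketch of the tilde scheme: each "metric" value packs
# known positional fields separated by tildes. The field names here
# are made up for the example.
FIELDS = %i[source name style computed colour].freeze

def parse_metrics(query)
  query.split('&')
       .select { |kv| kv.start_with?('metric=') }
       .map    { |kv| kv.sub('metric=', '') }
       .map    { |value| FIELDS.zip(value.split('~')).to_h }
end

parse_metrics('metric=bom~gosford_air_temp~line~raw~green' \
              '&metric=bom~campbelltown_air_temp~line~raw~blue')
# Each metric comes back as its own hash of named fields, and an
# "array of one" is still an array.
```

Because this rewriting happens before the framework's own parser sees the string, the duplicate-key and mixed-type ambiguities above never arise, and the URL stays readable and writable by hand.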
You want to be able to write these things, because we don't like GUIs; we like to be able to type things on a command line. You should be able to curl this. I don't know how you're going to read the JavaScript output, but you can get it. The problem, though, is that once you have this standard, you then have to implement it in both the front end and the back end. So I'm not exactly sure whether they still exactly match my defined standard, but I had to write the same thing in Ruby and in JavaScript, and that did not feel good. The solution to this is Haskell, because you can write JavaScript in Haskell. I'm scared of this, but it will be an interesting experience. I'm sure Katie Miller will show me something on Friday about how awesome it is, if she's doing that talk again. So, a summary. How am I going on time? I am perfectly on time for ten minutes of questions. I think, yes. Yes, the AV guys agree with me. So, in summary: dynamic coding is fun. Making your own standards is fun. Analytics is fun. You know what else is fun? This is the last talk, and then we get to go and drink, but you still need to do lightning talks, unless you're running away. And as I have mentioned before, there is another talk on this by the good person in the back corner there. If you're disagreeing with yourself, I'm not sure how that works. Tomorrow, somewhere: it's Haskell, it's Ceph, it's all the buzzwords. And this is all online. Please. There's a thing that makes you have googly eyes on things, and then you can make them move. This is all on GitHub: forks, issues, patches, everything. Do all the things. Thank you for listening, and are there any questions? A question from the audience. Hi. Hello. First off, thank you very much for registering the typos on GitHub for your projects. Yes, context for those: I did give a lightning talk version of this in three minutes at a DevOps meetup, and somebody tweeted it, which was great.
Except they misspelled my project name, because they couldn't spell Machiavelli. So I registered the misspelled project, which literally just says: perhaps you mean Machiavelli, with one C, I think it is. They did two Cs. So I had to register that to try to get people back to the right place. So, thank you. Other than that, in terms of visualizations, I know some of this is going on to the next talk that you mentioned before, but can you use tools such as Graphviz for your visualizations, or are you sticking with pure web-canvas-type rendering? All of these use SVGs and layers on top of D3, which is really accessible, because a lot of people don't expect graphs to be in a web page. Plus, I'm not sure whether Graphviz has all the shiny things where you can dynamically change and zoom and stuff. I'm guessing Graphviz creates images. Yeah, so if you want to start dynamically removing things, zooming things, you can't really do that in an image. You could click really hard, but it's not going to change it. Plus, we do eat our own dog food, and one of the ways we are integrating this is that there is a little button in a PNP4Nagios setup that will, once I get it working, link to the Machiavelli version as opposed to the PNP4Nagios version. So that's sort of: you expect a web thing. We're a web hosting company; why would we want a static thing? Just make about ten different SVGs in advance depending on what Tom wants to do? I don't know. Yeah, that's PNP4Nagios, where you have static images all the way down. Seriously. You can get 24-hour time. You can get a day, a week, a month. You can also get lossy data, and you can have it as an ever-scrolling document. This is a thing: PNP4Nagios. It's RRD data visualization that is ten or twenty years old, some decades anyway. Yes, hello. You use the tilde to separate your... Yes. Yeah. What about the forward slash? I think we're used to it. Why couldn't you have used that?
A forward slash in a query string? Yeah, that would work fine. Just an idea. No, wait. I know the answer to this one. I have tried this. No, no, no, I've thought of these things before. Somewhere along the stack it will percent-encode it without you asking. Somewhere. I think it was nginx or something. Yeah. So those kinds of encoding things do not respect the query string, which they really should, because that's actually defined. But I wanted to not just be lazy and have, heaven forbid, an XML document encoded with percentages and stuff, because no human's got time for that. You can't go %20, %25, %2F all the way down. It's horrible. Hello. Hi. Hi. If everything is in the URL and dynamically done, what kind of size limits do you have there? What's the most amount of data you can graph at one time, or things like that? How big is your laptop's memory, and whatever browser you're using that, I'm assuming, respects the amount of memory you have in your system, to actually display it once it's finished generating. But there's no length limit, say, on the URL or anything like that? Length limits would be restricted to the web server. In Rails development mode, that is 1024 characters. nginx, I believe, is 4096 unless you extend it. Apache might be different, but it's literally based on the other layers; the application itself can take anything. So it's a lower-stack limitation, but there are always ways around that, because they're all open source and you can change the settings. Queries, questions, qualms? No one wants to berate me on the fact that I used herbs? It was thyme. It was thyme. It's fine. It's not oregano. Gosh. If that's it, I thank you very much for your time. Right. Thanks, Katie, and we have a gift for you for taking the talk. Thank you. Thanks.