Okay. Hello. Good morning, everyone. I'm Martin Ferrari. I'm going to present a system I've been working with as a Debian maintainer, and also a bit involved in upstream, but not too much. I'm presenting it here because I think it's a very interesting system, very different from what most people know about monitoring. It's based on a talk we gave together with Stefan, sitting here, at the brmlab hackerspace in Prague a month ago. I've been a sysadmin for a long time, and after I came back to sysadmin work after a gig in a company with a lot of closed-source products, I was struggling to find a good monitoring system, and this one was so interesting, so good, that I wanted to share it with you. If this works. Okay.

So who is Prometheus? Prometheus is this dude who stole fire from Mount Olympus and gave it to humanity, according to the myth, which may or may not have some relation to the project itself.

So what is Prometheus? Prometheus is not Nagios. That's the main thing to know. It's a completely different way of looking at monitoring. We are used to the Nagios states of OK, warning, or critical, which is not really useful when you want to understand what's going on, when you want to prevent problems before they appear. It's too coarse: it usually only uses very basic checks, or complex checks encoded into a script that can only give three states, and many monitoring systems are like that. It has scalability problems, which I hear it is trying to improve, but it still has some issues. And as I was saying, you don't understand what's going on. You get an alert at three in the morning, the web server is returning 500 errors, but it's difficult to track down the cause, so maybe you go to another tool that shows you graphs, try to correlate, go to the logs. There are many different things you usually need to do when you're dealing with an alert, and they are not integrated.

There are some new systems, like New Relic, that are very much in vogue these days: cloud-based automatic systems that do a lot of things for you and give you nice graphics. They get a lot of data from applications; the idea is that you install this more or less proprietary library in your project and you automatically get information about the health of it, which is called instrumentation. You get these nice consoles, but then when you want to do more with it, it's not so easy. The data is not controlled by you. You have to assume they will always be up. Getting the data out of it is not easy. So it's not the kind of system I'm comfortable with.

I mentioned instrumentation; for me it was a new concept a few years back, maybe for other people it's better known nowadays. The idea is that you put hooks everywhere in your application or your service. They measure stuff: maybe the number of requests to an HTTP server, the time spent in a specific function, you name it; that's up to you. The nice thing about systems like New Relic is that they will gather that data automatically for you: some kind of black magic happens when you install the library, and it will instrument everything, for some value of everything. Java has the JMX framework, which does something like that. And it's very interesting when you get into this idea, because you're not only doing black-box monitoring, where you only check from the outside whether it's giving errors or not; you can also see and understand the health of your service a lot better.
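(To make that concrete, here is a minimal sketch of this kind of instrumentation using the Python client library; the metric names, port, and functions are invented for the example.)

```python
from prometheus_client import Counter, Histogram, start_http_server

# Hypothetical metrics for some HTTP service.
REQUESTS = Counter('myapp_requests_total', 'Requests handled')
DB_QUERY_TIME = Histogram('myapp_db_query_seconds',
                          'Time spent querying the database')

@DB_QUERY_TIME.time()      # one line: measure time spent in this function
def query_database():
    pass                   # ... the real work would happen here ...

def handle_request():
    REQUESTS.inc()         # count every request
    query_database()

# Expose everything at http://localhost:8000/metrics for Prometheus to scrape.
start_http_server(8000)
```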
For example, you can see how much time you are spending querying the database from a specific function, things like that which tell you exactly what's going on when something goes wrong.

So what does Prometheus do? It's a combination of different things that you might be used to: gathering data for graphs and for analysis of errors, providing nice graphics, providing alerts. And it's quite integrated, which is pretty nice. The main thing about Prometheus is that it's a very efficient store for time-series data, that is, data that is collected every few seconds or minutes. And it encourages you to get data from everything. The idea is that collecting data is very cheap, so you can say "scrape all my targets every 15 seconds", get like a thousand different metrics from each target, and it will cope. I've been using it on pretty small instances, and it copes with thousands and thousands of metrics per second with no problem. Because data is so cheap to collect and store, it encourages you to do the kind of instrumentation I was talking about before: put metrics everywhere, then collect everything and store it on disk, even if you're not using it for any alerts. The idea is that you don't start collecting data only when you have an alert that needs it; you collect everything, and then you see what you can do with it. And you can also look at what happened in the past: you can analyze data from weeks or months ago, depending on your configuration. And as I said, it has really nice graphs and consoles, and it's very customizable, so you can sell it to your boss.

Intermission: since this is DebConf, I have to justify giving this talk. I started packaging this very soon after I learned about the project, because I fell in love immediately. But it's written in Go, and Go is a new language everywhere, very new to Debian, so there are many problems with that. Go produces only static binaries, so library packaging is a bit weird: you have to package the source code, and the policies are still being decided. Then some day you realize what you have been doing for the past few months has been wrong, so you have to change everything. So it was a lot of work. I had to package something like 15 to 20 dependencies, I don't remember; it took me months. And the problem is that upstream is so active that every time I had a version almost ready to upload, there would already be a new one. So it took quite a while. But now it's in unstable, since about a month ago. It still has an RC bug, so it hasn't migrated to testing, but it's usable, I promise. The Go team is small, though, so if anybody is interested in Go or in Prometheus, you're very welcome to join.

So let's go back to Prometheus. The architecture is a bit like this. This diagram is based on the one on the Prometheus web page, which is very good because it has a lot of documentation and examples, and they also have a blog talking about best practices, so I really recommend reading all that. Prometheus is separated into different services, so you don't need to use everything; this is an overview of the whole system. On this side you have the services that expose your data; the main Prometheus server scrapes them and keeps its own storage. It uses LevelDB for the indexes and, I don't remember exactly what, something homegrown, for the actual data.
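(As an illustration of that "scrape all my targets every 15 seconds" idea, here is a minimal prometheus.yml sketch in today's configuration syntax; the job names and targets are invented.)

```yaml
global:
  scrape_interval: 15s        # scrape every target every 15 seconds

scrape_configs:
  - job_name: prometheus      # Prometheus monitoring itself, as it does by default
    static_configs:
      - targets: ['localhost:9090']
  - job_name: node            # node exporters on two hypothetical hosts
    static_configs:
      - targets: ['host1:9100', 'host2:9100']
```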
In the Prometheus server you have query interfaces: an API using JSON over HTTP, and consoles. And you can use this from PromDash, which I will show later, which is a very nice way to create shiny graphs and consoles. From Prometheus you push alerts, ideally, to the Alertmanager, which is a service that is still a bit rough, but that allows you to manage your alerts better: silencing, defining timings for different kinds of alerts, and so on, so it doesn't wake you up if it's not important. And on this side we have the main ways to get data in, which are three: using the client library inside your code, in your project, in your service; using the node exporter, which is basically the equivalent of Nagios checks of host health; and the push gateway, which is an interface between Prometheus and services that can only push, because Prometheus pulls; so cron jobs and things like that. I'll talk a bit more about this.

One of the key things about the way this is done is that the protocol is very simple; it's very easy to implement. It's just an HTTP GET on a URL, usually /metrics, and it's plain text. You can use protocol buffers as an optimization, but it's not mandatory. This comes from former Googlers, so everything is a bit biased that way and they use protobufs everywhere. But it's just text saying metric name, value, nothing else. And as I was saying, it's pull-based. This is a source of flamewars and such, but it has been decided, so it's not up for discussion. We can discuss the merits of each approach, but Prometheus does this because it allows it to be very efficient and to have some regularity in the metrics ingestion, which is important for many calculations. And also because it's pretty good to have different instances of Prometheus monitoring the same thing: you can create new instances, like development instances, without changing any configuration on your servers, because they're just HTTP endpoints, so you can scrape them from anywhere.

About ingestion: as I was saying, it's very efficient; it can ingest hundreds of thousands of metrics per second. It depends on your server size; it can be CPU-bound or network-bound, but mostly it's CPU-bound. The storage is very efficient too. There are some benchmarks on the website comparing it with other time-series databases, and with terabytes of data you might be able to keep the historical data for months or maybe years. The retention is tunable, so you can say "I want to keep a year", or ten years, or just a week, depending on your needs, and that will define your memory and disk constraints. And the defaults are sane, which is pretty nice and uncommon. So basically you "go get" the source code from upstream, it builds and runs, and that's it: it's actually running and monitoring itself by default. And Debian does the same thing; in fact, I added some recommended packages, like the node exporter and the CLI tools, so by default it will monitor itself and the machine without you doing anything. And you get a nice web interface for that.

So let's talk about the ingestion sources; on the left side of the diagram I was showing these four boxes. The most common one is the node exporter, which provides basic host metrics. It's basically scanning /proc for stuff: disk devices, network devices, RAM usage, CPU usage, et cetera. And it's very easy to extend by just writing your own metrics to a text file, which it will read and export.
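(For illustration, the text format looks roughly like this; the last line is the kind of metric a cron job could drop into a text file for the node exporter to pick up. The HELP text and the custom metric are made up.)

```
# HELP node_cpu Seconds the CPUs spent in each mode.
# TYPE node_cpu counter
node_cpu{cpu="cpu0",mode="idle"} 362812.53
node_cpu{cpu="cpu0",mode="user"} 23973.04
# A custom metric written by a local script:
backup_last_success_timestamp 1440000000
```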
So if you want to add some stuff without much work, you just add a cron job that writes to a text file, and that's it. Then there is the push gateway, which is the interface to the push world. For pull, you need the service to be running all the time and to keep state, so pull is not suitable for cron jobs, for example, or for scripts that run in batch for a few hours and then die. These kinds of jobs update their data in the push gateway, also over HTTP; they POST instead of GET, but it's basically the same thing.

Then we have the exporters. There is already a long list of exporters for different systems that provide metrics ready to be consumed by Prometheus. On the left you have the official ones (well, official as of a couple of weeks ago; they keep growing all the time), and on the right the unofficial ones, contributed by the community. Some of them are not very usable, some are production-ready. You have very powerful things like the JMX exporter, which will take the data from your Java application and export it, or the StatsD bridge and the collectd bridge, which let you migrate data from other systems easily. It's already working for Django; Django applications are very easy to monitor with this. MongoDB, a New Relic adapter, et cetera.

And then the differentiator, I would say, is the instrumentation part: they provide client libraries for all these languages, and more are being developed all the time. They usually provide decorators; it depends on the language, of course, but in Python, for example, you just add one line to your function saying "measure the time this function spends" or "count the number of times this function is called", as in the sketch earlier. That generates a metric that is then automatically exported by a web server created by the same library. There are also a few applications that come instrumented for Prometheus, in particular etcd and Kubernetes, which is a Google application; Google is actually embracing this system for their public stuff. And you can also do it on your own: since it's just a text protocol over an HTTP server, it's very easy to implement whatever you want.

So this is, I think, the most powerful part of Prometheus: the way you can process the data. So far I only talked about how to get a lot of data into Prometheus from different things, very fast, very often. But then you have a bunch of numbers that you need to do something with. So they have this query language, which has some basic algebraic properties and functions, and allows you to do pretty complex calculations in real time: for exploring the data, for creating graphs, for creating alerts, or for creating synthetic metrics. Say you have a metric that is not very useful by itself, or maybe you need to combine two metrics, and you want to save a new metric that is a combination of them after some calculations: you can do that too, and it is stored like any other metric. And, yes, of course, the idea is that these calculations will trigger alerts and wake you up at 3 a.m.

Oh. I don't think you can see anything from that, and I cannot increase the size. Maybe. Well, you will have to believe me or look at the slides later when I've uploaded them. These are some metrics coming out. Thank you very much. Much better. I hope you can see something. Still too small. So, these are some metrics coming from the node exporter.
What it says here is node_cpu, curly braces, and then a lot of labels: one defining the CPU number; the instance, which is basically the host name and port of the exporter; and a label called mode, which comes straight from /proc: idle, iowait, system, user, et cetera. These are real metrics taken from some of my servers. And on the right-hand side you have the value, which is the number of seconds spent in that mode. Oh, milliseconds, sorry. And this is the data model: metric name and then labels. It's not just one opaque string; that's something different from other systems. It's very powerful, because the labels are treated as different dimensions, so you can perform operations based on the labels without using regexes, which are very annoying when you have to do complex stuff. So you can say "give me the first CPU in mode idle", and that's easy to do.

So this is an example query that takes the data from the previous slide and does some calculations. It says: sum by (instance, mode) (rate(node_cpu[1m])). What does this mean? It means: take the node_cpu metric I showed in the previous slide, take ranges of one minute, and from each minute calculate the rate of increase. Since the CPU counters are seconds, or milliseconds, spent in that state since boot time, the raw number is not useful; you want the rate of change. The rate function gives you that: it basically tells you how much the counter increased per second. In the end, that means how many milliseconds were spent in this state per second, per CPU. And then I have this "sum by" operator, which is an aggregator, which allows me to take the different metrics and group them, in this case by instance (the host) and CPU mode, and get sums of those rates. So what I'm getting here is that this host in the Czech Republic spent 0.89 idle seconds per second, which means 89% idle. And I get the same thing for my other hosts running Prometheus. As you can see, it's pretty powerful, and this is about the most basic calculation you can do; you can do crazy stuff with these numbers.

And from this data you create this graph. This is an actual graph created from that data: the different CPU states on my server. You can stack them and give them nice colors. And as you can see, there is a pop-up window there: the graph is all rendered client-side, using some fancy JavaScript library. It just gets the data from the server and creates the graphic locally, so you can explore it very easily with your mouse and get the exact numbers at each point in time, and all that. It's pretty nice; it's useful when you're exploring things. This is an example from a production server where Prometheus is running: a graph of traffic on the first network interface of a bunch of hosts, where you can clearly see the daily spikes and, at the beginning, the weekend differences. And you can combine as many metrics as you want in one graph, to correlate things and deduce sources of problems and behavior. One thing I forgot to tell you: all these queries I was talking about can be written in that box at the top. Usually you get the numbers, as you saw in the previous slides, but then you click on another tab and you get a graph automatically from the same query.
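(To give a flavor of the language, here are a couple of variations on that query; these are illustrative, not from the slides.)

```
# Average idle fraction per host over the last minute:
avg by (instance) (rate(node_cpu{mode="idle"}[1m]))

# Turned into percent CPU used per host:
100 * (1 - avg by (instance) (rate(node_cpu{mode="idle"}[1m])))
```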
This is the Prometheus server interface; this is something you do interactively. But then you can have consoles, which are pre-made HTML pages that you write once and that automatically query different things and give you an overview, or whatever you want to create; in fact it's just HTML, so you can do whatever. They're handy because you can keep them in version control and deploy them easily, and you don't need to do anything interactively for your dashboards. You can include graphs, values, labels, alert status, et cetera. There's another way to create consoles, PromDash, which is a separate application, a separate project, written in Rails. So it's awkward to install on a node, but they provide a Docker image, so you can just get that image and run it somewhere; it doesn't even need to be on the same server, so it's fine. And it's very shiny. This is one with a black background, and again the projection is not great, but you can see some really fancy graphs and colors monitoring ingested events per second, rate limits per second; this is actually Prometheus data, because Prometheus can monitor itself, which it does by default, so you get all the metrics from the Prometheus server to see how it's performing.

So, alerting. A monitoring system without alerts is not very useful, so Prometheus provides alerts. The idea is that you take a query, as I showed you before, and you add some keywords around it, and that converts a true/false status into an alert. In this very simple example, you define an alert called InstanceDown. (Is this the one? No. Never mind.) It says that if the query "up == 0"... up is a metric name that is in fact a synthetic metric, defined as one every time the target can be reached and scraped for data; when it's unreachable, up goes to zero for that instance. So this query returns a value for every target you have; it doesn't need to be one alert per host, you just define one alert for all your hosts, or all your targets. And if this value is zero for five minutes (the third line says "for 5 minutes"), then fire the alert and add some labels, in this case the severity label, which is "page": don't just send an email, wake me up. And then you can add a summary and a description, and interpolate values from the alert, like the instance name, so you know which host is down; you can also add the value of the metric, not useful in this case because it's just zero, but in the next example you can get more.

This one is checking the latencies of HTTP requests. That metric is a histogram created by the client library; you have some functions to create histograms easily. It says that if half of my request latencies are higher than one second, so more than a thousand milliseconds, for one minute, then something is wrong with my service and I want to be alerted. And here the summary and the description will give you the instance name and the value of the latency, so what you get on your pager, or your cell phone or whatever, is a really informative message. There are very interesting blog posts and documentation on alerting, written by these people and some other very intelligent people, about how this is much better than trying to monitor each one of your daemons. Because you don't really care if one of your, I don't know, SQL instances is down, as long as the service is up.
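(For reference, the instance-down rule looks roughly like this in the current rule syntax; the talk used the older rule language, and the annotation text here is invented.)

```yaml
groups:
  - name: example
    rules:
      - alert: InstanceDown
        expr: up == 0        # the synthetic "up" metric, per scraped target
        for: 5m              # must stay down five minutes before firing
        labels:
          severity: page     # wake somebody up, don't just send email
        annotations:
          summary: "{{ $labels.instance }} has been down for 5 minutes"
```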
But knowing that the latencies are high is something that users can see, so alerting on that is a lot better than just monitoring up/down states for all your daemons. And with this query language you can do things like compare the current value with the average for the past week, taking into account a standard deviation or two, stuff like that. I will not go into that, because it gets pretty long and doesn't fit on the screen. I guess with all this you get an idea of what Prometheus can do. This is the end of my slides; I was talking very fast, so I have time for questions and maybe even to show you a small demo.

Thank you for your talk, pretty interesting. It actually seems to be a pretty good alternative to Graphite. How does it store data on disk, in what format? I'm not sure of all the details, but as I was saying, it uses LevelDB, which is a pretty efficient embedded, non-daemon database, for the indexes, that is, mapping the label names to the actual storage; and then it stores, if I remember correctly, one file per time series. I don't know the internal format; it was pretty complicated. So it's efficient in terms of size needed? It's very efficient in terms of size. They have done some comparisons with OpenTSDB and, I think, Graphite and some others, and it was actually performing very well in comparison.

On the other hand, what I don't find really fair is comparing it against Nagios, because Nagios has never been about metrics; all the perf-data stuff is basically an add-on, and I'd love to use something like Graphite beside it. The benefit of Prometheus would be, for sure, to have an active collection tool that does the collection itself and doesn't rely on applications submitting to it; so it would be a pretty nice addition to an active monitoring tool for everything. Yes, it would replace Nagios, and Graphite, or Munin, or whatever you're using; but the big difference is also that the alerts you can create are much more intelligent. Yeah, sure. Because you have access to all the data. So yeah, it's not fair to compare it to Nagios, it's true; but the thing is, many, many people are still stuck with Nagios because there aren't many alternatives, and that's why I started with that comparison. But yeah, it's a different thing.

But it can't really replace it all? Sorry? It can't really replace Nagios. Why not? For example, checking the state of the hardware on my IPMI card. On your...? IPMI card. The underlying hardware state, the RAID controller, everything else. But how does Nagios check it? By reading a lot of stuff via SNMP, via tools, and comparing it, and that's much more than just calculating a bunch of numbers. Well, first, there is an SNMP exporter that was released a few days ago, still in beta, so you can actually get all your SNMP data. And also, you can just take your Nagios script and change its interface, so that instead of writing to standard output and returning an exit code, it pushes to the push gateway. You push the same value you gave to Nagios, and you have the same checks. Migrating a Nagios setup to Prometheus is not hard, because it can do everything that Nagios does. Okay, thanks.

One of the monitoring systems that we use, OpenNMS, can receive syslog information, so if a syslog message comes in with a certain priority, you can alert on that. Which alerts, sorry? Well, if there's a log message in syslog with an error state, for example, you can alert on that. There's an exporter called logM or something...
I don't remember; it's a Google tool that was also released some months ago that can check logs, run regular expressions on them, and export metrics to Prometheus based on that: how many errors you got, information about the errors, everything. So yes, you can get alerts and data from logs.

So it seems like this is a very cool tool for someone who has to monitor hundreds of servers and thousands of metrics, but what if I have five servers and all I want is to get an email when one is down? Is it overkill to use Prometheus, or do you think it's the right tool for someone like that? That's a good point. If you want to use it to its full extent, it's complex; I will not lie to you. You have to code your alerts and create your consoles and whatnot. There is work under way, in fact I was one of the people working on that, to create ready-to-use alerts and expressions that you can just import into your configuration. Monitoring your five servers is very easy: you just install the node exporter, the Debian package, on all of them, add one line to the Prometheus configuration, and they're monitored. But then you need to define what to do with the data. Yes, it takes a bit of effort. But other systems take effort too; you have to configure Nagios or Munin or whatever. And the learning curve here is probably steeper. So yeah, it's probably easier to go with Nagios, but I think the extra effort is worth it, especially in the long term.

Any other question? Yeah, I was wondering, how new is this project? How long has it been around? Well, it was only announced publicly in January, so it's very new. The people working on it have been writing it for, I think, two years. They were using it internally at SoundCloud, and then they asked SoundCloud, and SoundCloud agreed, to make it free software; and in fact the developers have a day job that is 80% working on Prometheus, so that's pretty good. And it's being adopted quite fast by many new companies, so I think it's something you can rely on. There are some parts that are still not production-ready, like the Alertmanager, but the main server is stable and works well. I've been using it for a few months and didn't have any major problems, except that I forgot to limit the memory usage and it killed my machine; I'll get to the details.

And how good is the documentation, especially for creating the configuration? The documentation is pretty good. If you go to the main website, it has quite a lot of documentation about every aspect of it: the query language, all the definitions, the data types, and how to apply operators. The documentation is pretty complete. There are things that are changing fast, but the main things are stable already: the query language is stable, and the API changed recently, but I think it's going to stay stable from here. So I think it's something you can start using; maybe you want to keep your old monitoring system for a while, but I think it's worth starting to work with it.

A quick one: what kind of server do you need to run this, what are the system requirements? They have some benchmarks somewhere, I think, but it doesn't require much. I'm running my test instance on an Atom computer with two gigs of RAM, and it's not killing it, except for the time I forgot that you should not store everything in memory: it ate all my two gigs of RAM, and there was no swap, so everything went to hell.
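(For what it's worth, in the 0.x/1.x-era server those knobs were command-line flags, roughly like this; later versions renamed them, so treat this as a sketch.)

```
# Keep 30 days of data, and cap the number of sample chunks held in memory:
prometheus -storage.local.retention=720h -storage.local.memory-chunks=524288
```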
But basically, you can fine-tune how much to keep in memory, and that will define the memory usage; and then the CPU usage will be defined by the amount of metrics you collect and how many calculations you do on each collection. So it's difficult to say what kind of machine you need, because it depends on the size of what you're doing.

I forgot to talk about something; I was supposed to add a slide about it. It has been released recently and I'm still testing it, but it seems to work well: federation. The idea is that you can run different instances of Prometheus, each monitoring a subset of your servers, or you can even have duplicate setups, two Prometheus servers monitoring the same thing. But then you can add another layer of servers, one or two or more, that monitor the other Prometheus servers and get only a subset of the data. So, let's say you collect everything in the first level of Prometheus servers, then you run some calculations and extract the metrics you are most interested in, and the top server only gets those. That way you spread the load, and you scale horizontally pretty well.

I just went quickly through the docs, and it looks like the configuration for Prometheus is stored in one place, basically the main server. So when you are in a dynamic environment, the so-called cloud, and servers are coming and going, how do you update this configuration? There is support for, I don't remember what it's called, service discovery. The configuration is just a YAML file, so you can write it by hand and add all your targets by hand; but then you have support for DNS records, to get the list of hosts to monitor from DNS, or for files that you include and that are reloaded automatically, so you can adapt it however you want; or you can also use Consul to get the list of hosts and targets, and I think etcd and ZooKeeper can be used too. And plain files are basically the roll-your-own integration for whatever you want.

Does this all involve modifying files on the server itself? Well, it depends: if you are using Consul or ZooKeeper, it will just query your ZooKeeper server and ask which hosts it needs to monitor; otherwise, you have a cron job that writes to a file that the main configuration imports. That's the easy integration, if you want it. You wanted to say something about this? Well, basically, what I'm using right now is something called Sensu, which is software that has been designed for dynamic environments. Its main communication bus is RabbitMQ, so whenever a host comes up, it basically signs up to the queue and sends a message, and that communication channel is used for everything: for signing up, for delivering information, everything.
For something like that, you can either extend Prometheus to talk to your service, you could do it yourself or ask the developers, maybe they will be interested if it's widely used; or otherwise you write a cron job that, every minute or five minutes or whatever, queries this and writes a file. It's a compromise in that sense.

We have four minutes; I'd like to answer more questions. When you did the initial packaging, which criteria did you use for all these Go dependencies, whether to leave one embedded or to create a separate package? Because I want to package something in Go, Packer, and it also has these 15 dependencies, and I don't know where to start. That's a big problem with Go: upstreams tend to vendor everything; not every Go upstream does it, but the Prometheus people want very stable dependencies, so they vendor everything, or almost everything, into the source tree. My criterion was: everything that was reusable by other projects got separated into its own source package, with a new Debian package created from that; but some dependencies were just too small, or too specific, to be useful to other projects, so I left them embedded. I guess you have to evaluate how much use other people would have for it, and also how big the dependency is, how often it gets updated for security issues or whatever; some dependencies are better packaged separately. Also, Prometheus has internal dependencies, I don't know how many repositories of different libraries and parts of the system, and I packaged them all separately, so there is code reuse; otherwise every Prometheus tool would have to vendor the same core libraries. So yeah, mostly I left embedded only the things that were one or two files.

Any more questions? We don't have much time, so I will not be able to show you much, but this is an actual Prometheus server monitoring my home instance. It's running on a very small server, and these are actual metrics from right now. This corresponds to HTTP requests to the Prometheus server, the push gateway, and the node exporters. In fact, from the sample slide here, if I can reach it: this query here is something I didn't show you, which is using regular-expression matching on the labels. In this case, what I'm trying to do is get only the error codes for the HTTP requests instead of all the counters, so I'm saying: all the codes that start with 4 or 5, 400s, 500s, et cetera. And here... this is a mixture... I got it wrong, sorry; executing the query right now, it failed; demos must always fail, so forget it. The idea was that I could show you some nice stuff here: I get some data, process it; I won't do it now, but once you have the data you want, you click here on Graph, and you automatically get a graph. And the graph is dynamic: I can change the range, one hour, two hours, one week, two weeks; I can go back in time and see what happened before. You cannot see much of the graph here, but I have the labels for all the data; I can select one and show just that and nothing else. So the graph explorer is pretty nice. And I'm out of time, so I just wanted to give you proof that this works, because all the other content was static. So, well, I think that's it. Thank you for coming, and I hope you will dare to try it, even if it takes some effort.