Just start the exporter. Exactly. Thank you. Hello, everybody. We have plenty of time. I submitted this as a lightning talk and was allocated an hour. So if you have questions or comments or anything, please feel free to interrupt me.

Does anybody use Prometheus already? Or is this new for everybody? A few people? OK, you've heard of it? Great. We got a Prometheus exporter in FreeBSD 12, so somebody in the FreeBSD world at least knows and uses Prometheus apart from me. So this was a few months ago: I wrote an exporter to make disk utilization metrics available to Prometheus from FreeBSD, I did a blog post about it, and I was encouraged to come here and talk about it.

Since many people probably don't know Prometheus, I'll start out by talking a little bit about what it is and what it does and why it's awesome, then the concept of exporters, and then we'll get to gstat and the actual exporter that I wrote for it. I also have a Grafana dashboard to visualize the data collected. And if everything works and my internet connection keeps working, I have a live demo of some servers at my workplace using this. It's always nice to see some actual real data, but I'm not making any promises, because my connection has been flaky. So let's see when we get to it. Live demos are always, as you know, a bit tricky.

My name is Thomas Rasmussen; I go by Tykling on the internet. I was born in and live in Copenhagen in Denmark. I do system architecture, and I build and administer FreeBSD servers. I do some programming as well. I was recently laid off with three months' salary, so at the moment I'm just relaxing and enjoying myself and looking for something else to do. I've run a DNS service called UncensoredDNS for 10 years now, 11 years, which predates Google DNS and all the other public recursive DNS services. I started it as an alternative to the often censored ISP DNS servers, and it's been running since. I also help organize BornHack, which is an annual Danish hacker camp. I highly recommend going; it is a great time: a week with tents and laptops and hacking around with all kinds of fun things. And I have a bunch of minor open source projects on GitHub.

I use FreeBSD exclusively on servers. FreeBSD 5.2.1, I think, was my first installed server. And on my laptops I run something called Qubes OS, which is an awesome operating system. I wish it were FreeBSD-based, but it's Fedora- and Xen-based. The concept is that you have different virtual machines for different contexts. So I have one for work and one for playing with the BornHack stuff, and there are disposable browser VMs for surfing, where the VM just disappears afterwards. So if you happen to click something bad, nothing really happens. An awesome concept.

So I'm here to talk about this gstat exporter. This is actually the first time I've been in the BSD room; there's never been room, but one of the perks of being a speaker is that they have to let me in. I wrote a tweet about it, someone picked up on it and recommended I come to FOSDEM and talk about it, and here we are.

So Prometheus, in their own words, is an open source monitoring system: dimensional data model, flexible query language, efficient time series database. And that is pretty much it. The whole ecosystem around it is very unixy, in that they do one thing and they do it well. Prometheus in itself is a time series database.
It really doesn't do anything except collect metrics and stick them in a time series database. And then it has an API to make them available to Grafana, for example, so you can pull the data back out. It doesn't do alerting. There's something called Alertmanager, from the same team, but it's a separate piece of software with a separate configuration file and so on, which handles alerting and alert dependencies and which team should get what alert, PagerDuty support and all that stuff.

Monitoring has, I think, for many years been a bit of a void, left when Nagios kind of stopped being modern. I've been using Zabbix and a few other things, but nothing has really stuck; it's like they all tried to do too much somehow. It's a big job to gather the data and visualize it and do alerting and everything, and I haven't really been happy with anything. You know, like the first time I tried ZFS and thought, this is what a file system should have been like all along, where have you been all my life? That is how I feel about monitoring with Prometheus. It is an excellent system.

So it is based on what they call dimensional time series data. It means that your metrics can have labels, as they call them. So if you have a metric called http_requests_total, for example, for a web server, and you get a number when you scrape the data, say you had 100 requests, then that data can have multiple dimensions: it can have labels saying what the path of the request was or what the HTTP status code was. So you can ask to get only the HTTP 404 requests, for example, and then make a graph of those in your Grafana. This is incredibly powerful. It's actually a relatively simple concept, but it enriches the data greatly.

Prometheus and Alertmanager and all the accompanying software are written in Go. It was started at SoundCloud some years ago and has been open source since 2015. There's no company that owns it now; it's a regular open source project with many contributors, and a great community, a great IRC channel and so on. So if you have questions, they're very happy to help.

This is the diagram of the ecosystem from their website. Central to the whole thing is the Prometheus server. It's a time series database, and it stores the metrics, of course, on some sort of disk storage. Prometheus is a pull model; that's one of the things some people have to get used to. It means that Prometheus connects to the thing being monitored and pulls the metrics over HTTP. There are various ways around that if you absolutely cannot do it like that, but that is the idea. For example, the FreeBSD sysctl exporter that arrived in FreeBSD 12 is supposed to be run under inetd, and Prometheus connects to it and pulls out the metrics. So that's what the exporters make available: they open an HTTP endpoint, and it's just a plain text key-value list of lines. I'll have examples later.

There's something called Pushgateway: if you have a job that runs for an hour and then has some metrics to submit at the end, you can use Pushgateway for that. It has a very extensive service discovery system, so if you are an AWS or Kubernetes user, or you use one of the many, many cloud services available, there's probably a service discovery mechanism for it. At work, I worked at an ISP until I got laid off, and we used the file service discovery mechanism and just exported a list of customers and IP addresses.
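To make that concrete, a file_sd target list is just a small JSON (or YAML) file that Prometheus watches and re-reads when it changes. A minimal sketch of generating one from Python, with made-up customer names, addresses and file paths, might look like this:

```python
import json

# Hypothetical customer list, as it might come out of a provisioning system.
customers = [
    {"name": "customer-a", "ip": "192.0.2.10"},
    {"name": "customer-b", "ip": "192.0.2.11"},
]

# file_sd_configs format: a list of target groups, each with "host:port"
# targets and optional labels attached to every metric scraped from them.
# Port 9100 is the usual node_exporter port.
targets = [
    {"targets": [f"{c['ip']}:9100"], "labels": {"customer": c["name"]}}
    for c in customers
]

# The path just has to match the file_sd_configs entry in prometheus.yml.
with open("customers.json", "w") as fp:
    json.dump(targets, fp, indent=2)
```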
And that was automatically ingested by Prometheus. So when we add a new customer in our central provisioning system, it automatically shows up and Prometheus starts pulling data from it.

To talk to Prometheus, to query the data and get it out, they invented something called PromQL. It doesn't really have anything to do with SQL; it is a custom query language specifically for this, and there will be examples of it later. It is a very powerful and easy to use language.

And like I said, Alertmanager is a separate component. When you add some threshold or something you want to be alerted on in Prometheus, it pushes an alert to Alertmanager, and Alertmanager handles all the nitty gritty about who is on vacation, who is on duty and who does what. So all that complexity is kept out of Prometheus.

OK, so we generally speak of black box and white box monitoring. Black box monitoring is when you ping your whatever, or just poke something from the outside, check if it's listening on port 80, or whatever you might do. White box monitoring is when the thing being monitored is in on the game. The FreeBSD sysctl exporter is a good example, or an exporter for gstat data: you're inside the system, exposing the metrics to the outside. And that's a much better way to do it than black box monitoring. Prometheus has support for black box monitoring as well; if you need to, like we did at work, ping 10,000 customers every five seconds, you do that with something called the blackbox exporter. But mostly and primarily it is white box monitoring; it is supposed to be used with exporters built into whatever you are monitoring.

So if you have a system like our provisioning system at work, it's easy to export, for example, a metric for the queue length for provisioning new customers: the router and the switch ports need to be configured, and there's sometimes a queue if there are many changes at once. It's very easy, when you're in the code, to export this metric; you just add a /metrics endpoint to your system and export it. It's over HTTP, like I said, and it is designed to handle everything you can throw at it. You shouldn't be stingy: you should instrument all parts of your system, anything that might someday make sense, and even the things you think won't, because sometimes these subtle little changes in whatever can turn out to be important, and then it's very nice to have the history.

It is very efficient. It is designed to handle hundreds of thousands of metrics on a single server on commodity hardware. When you're exploring the data in the Prometheus web interface, it can draw a simple, crude graph, but it's not meant for graphing; it's meant to be used with Grafana or something else, and I highly recommend that. It integrates excellently with Grafana.

There are client libraries in many languages. The gstat exporter code I wrote is written in Python. You start out with the ten lines or so from the Prometheus example client, and you just define your metrics, saying in gstat we have a disk IO busy percent metric or whatever, and then feed data into them. The library takes care of listening on HTTP and serving the data up in the right format. It is easy to use, and of course they have client libraries in Go and many other languages as well. So it is easy to get started writing your own exporter. And it is, like I said, very efficient. They say about three and a half bytes per data point.
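To give an idea of what those ten-odd lines look like, here is a minimal sketch using the Python client library. The metric name and the placeholder value are made up for illustration; the real gstat_exporter feeds in values parsed from gstat instead:

```python
import time

from prometheus_client import Gauge, start_http_server

# One gauge with a label for the GEOM provider name; the metric name here
# is illustrative, not necessarily what the real exporter calls it.
busy = Gauge("gstat_busy_percent", "Disk busy percentage from gstat", ["name"])

if __name__ == "__main__":
    start_http_server(9248)   # the library serves /metrics on this port
    while True:
        # A real exporter would parse gstat output here and set one value
        # per disk; this placeholder just keeps the endpoint alive.
        busy.labels(name="da0").set(0.0)
        time.sleep(5)
```

That, plus a loop feeding in real values, is essentially all the exporter-side code there is.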
And it is remarkably efficient when pulling in data. We replaced our Zabbix installation, which was thrashing the disks using Postgres as a backend. I love Postgres. I don't love Zabbix, but it was a decent setup. Still, we could barely handle polling them once per minute, the 5 or 8,000 customers or however many we had back then. When we switched to Prometheus, we were polling them once every five seconds, collecting additional metrics beyond what we did with Zabbix, and the server is bored now; it's doing almost nothing. So we went from absolutely thrashing the drives to the server being almost idle. It is amazingly efficient.

If you go beyond what one server can handle, it supports sharding and federation, so you can have many Prometheus servers working together, and it does that very well. Alertmanager supports high availability as well, so you can send an alert to any number of Alertmanager instances and only one of them will actually alert you. At work, the Prometheus installation I was managing had about 750,000 time series and about 13,500 samples per second going into the database. And that's small in Prometheus terms; that's not an issue at all. You can do that on a laptop, or on whatever. In a jail; all my stuff runs in jails. And you can easily have one for each team. It's not like managing an ELK instance or something like that, where spinning up a new one is a big deal; if you want one for the network team and one for the operations team or whatever, you can easily do that.

OK, so the query language, PromQL. I brought a few examples, and it's easy to look up; the documentation is a page or two, and if you understand basic statistics and math, it should be easy to use. First things first: http_requests_total is the example metric we're using here. (So that was the internet connection that stopped working.) That query just returns the metric http_requests_total. If, for example, I want the metric, but only if the job label is api-server and the handler is /api/comments, then I put those in the curly brackets and it only returns the number of HTTP requests that had those labels. It can also return a range vector, so you get all the values for, say, a five-minute interval if you need that. You can use regular expressions for the labels, and you can do matching and so on on the label values as well. I use those labels a lot in the gstat exporter to store things like the disk serial numbers; we'll get to that later.

A bit more advanced: it has functions to, for example, calculate the rate over five minutes for a metric, or even do a quantile; the 95th percentile is used a lot when charging for bandwidth in hosting operations and the like, or wherever you need it. It is very flexible. It can also show the number of alerts firing, and when the installation grows, that's the sort of thing you want on the front page of your Grafana instance: whether there are zero or five alerts firing or whatever.

OK, so Prometheus connects to an exporter, gets fed metrics and ingests them. And these metrics are exported by exporters. Machine metrics, the normal metrics like RAM and IO and network traffic and so on, are exported by something called node_exporter, which we have in ports, and it works well. But on FreeBSD, it doesn't have disk utilization metrics.
It does on Linux, of course, but not on FreeBSD, which is why I started on this: when you're troubleshooting a database server or something, it is very nice to have metrics for disk utilization.

There are exporters for almost every type of software you can imagine, hundreds of them. Each is allocated a port on a list; you just go to the wiki and allocate a port, and for the gstat exporter I got port 92-something. It's a good list, and you can search for FreeBSD on it and find a few others. There's a jail exporter, which is very nice, that somebody made; it uses rctl to get IOPS and memory use and so on for jails and exports it in a way Prometheus can ingest. So you can get graphs showing how much RAM each jail is using and how many IOPS and so on. That is very nice if you're using jails at all.

OK, so I guess you all know gstat. It looks like this. It is a top-style thing that shows the GEOM devices, and read and write operations by default; it can show deletes and other stuff as well. I actually have... no, wait, I don't, because I'm not online. OK, so this is gstat, and these are the numbers I was interested in getting into a graph, and that's why we're here today.

There's a -C flag for gstat that makes it output CSV, as it's called in the man page, instead of the top-style display. And, a bit of a complaint: unfortunately, when you enable -C, it also enables endless mode, so it keeps running. You can't just make it output the values once and exit, so that should probably be added. Usually in exporters, when Prometheus connects, you scrape the data and then return it, but here I've had to keep gstat running endlessly and just stream its output into the collector.

Also, it would be nice if gstat could somehow export the raw counters it bases the top-style display on. This is the difference between counters and gauges, and we lose a bit of precision because I'm reading the gstat values as gauges. There's an update frequency in gstat, every five seconds or every one second or something, and stuff that happens in between, let's say every five seconds, is kind of lost. If you had the counters instead, it wouldn't matter whether Prometheus reads every five or ten seconds; you would still get all the information. I'm not a C person, but looking at the code, it looks like they are counters when they come out of the kernel, and somewhere they get converted into the human-friendly display that we see in gstat.

OK. So with gstat, with the -C mode, we really have everything we need to write an exporter. Like I said, the Prometheus client library is easy to get started with. The gstat exporter is 240 lines; that sounds like more than it is, because it's formatted with these code-beautifying tools, and they are very generous with the newlines. It's on GitHub. And yeah, port 9248 is the gstat exporter. They don't add exporters to services files everywhere, because they're so dynamic and there are so many of them, but they do have a page on the wiki on GitHub to keep track of which ports are used by what.

OK, so running the exporter locally, fetching the metrics and grepping for da0 and grepping for write, this is the output of the gstat exporter. As you see, it's just plain text; this is just fetch outputting to stdout, so it's a plain text response.
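If you want to do the same thing from a script instead of fetch and grep, a small sketch, assuming the exporter is listening on its allocated port 9248 and the usual /metrics path, could be:

```python
import urllib.request

# Fetch the plain text exposition from the exporter and keep only the lines
# mentioning da0 and write, mirroring the fetch-and-grep above.
with urllib.request.urlopen("http://localhost:9248/metrics") as resp:
    body = resp.read().decode("utf-8")

for line in body.splitlines():
    if "da0" in line and "write" in line:
        print(line)
```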
And it's just lines with a metric name, then labels, and then finally a space and a number, which is the actual value for that metric, with that combination of labels, at that specific time. As you can see, I've stuffed in all the info I can get from GEOM, the GEOM ident and so on; there's a geom command that can return RPM and all these nice attributes, so I can get, for example, the RPM and the serial number and the size of the disks. That is very nice because, as we will see later in Grafana, it makes it easy to filter and show, for example, only the NVMe drives, or only the spinning drives, or only the 12 terabyte drives, or whatever you need.

OK, so as I said, Prometheus doesn't visualize stuff by itself, but the integration with Grafana is really good. Once you add the Prometheus instance as a data source in Grafana, when you start typing a metric name it autocompletes it, and it really works very, very well. And you can filter a dashboard by label values; I'll show you what I mean by that later. That means, for example, that I can easily select just one disk in Grafana if I suspect that one disk is faulty, or select only the 5,400 RPM drives across all servers if I should, for some reason, have the need for that. I published the dashboard on the Grafana dashboards site. And if you don't know Grafana, I highly recommend it. It is very easy to use and it makes pretty graphs for you; you don't have to decide on colors and stuff, it uses colors that look good together and generally just works extremely well.

OK, I'm going to need to fix my internet connection. So talk amongst yourselves for 30 seconds while I get this working... just work, for the love of... sorry. There's a long and boring explanation for why it doesn't work; usually the phone just works, but of course, right now, it doesn't. Yeah, thanks guys. But it's not going to work. This is a great advertisement for Linux. It is a bit. But it really isn't. I'm glad I warned that this might happen ahead of time. Yes. OK.

So maybe we skip the live demo, since it's not working. I'm sorry about that. I'll talk a bit more and then try again in a few minutes, when it's had time to recover. I do have some screenshots in the beginning that I can go back to. But oh, yeah, thanks.

It was a pretty quick project; I did it in a day. I started in the morning investigating how I would do it, then I wrote the code during the day, and in the evening I did the dashboard, and then I wrote a blog post and posted it. So within the scope of a day you can write an exporter for something, including doing a nice dashboard and making everything work.

If gstat gets support for outputting just a single line and then exiting, I'll probably change the way the exporter calls gstat, so it doesn't keep it running perpetually but just calls it when it is scraped. And maybe add flags: gstat can show GEOM consumers as well as producers, and some people might need that for reasons I can't really think of. The Grafana dashboard needs some fine tuning, and if anybody feels like helping with that, you're very welcome.

I'm going to give this one quick final attempt. Well, I have it open, fortunately, so I can show you some of the things I was talking about. The labels I added to the metrics can be used in Grafana to make this nice little table at the top.
At the top we also have the filters; Grafana makes it possible to filter by label values. So this shows server names, and this shows the different GEOMs, and the different disk sizes we have, and sector sizes and RPM and so on. This is where, for example, I found out we still had a 5,400 RPM drive; I didn't know that, but after I installed this I got a rather better overview of the whole fleet. And it's a nice way to see, for example, the serial number: if you know a disk is performing poorly, you can easily look up the serial number and ask your on-site hands to fix it.

Other than that, there are sections for each of the metrics the exporter exposes. So there's a latency graph, with read, write, and delete latency. And there's a bandwidth section; of course it can't show anything right now, but trust me, there's a bandwidth graph behind this if I were online. There's an IOPS section and queue depth. This is showing 30 days; I've had this running for a few months since I wrote it.

One of the fun things: if you've run very large ZFS pools, you probably know that scrubbing can take almost forever to complete, which means that many people with large pools end up doing a nightly scrub job that runs through the night and pauses during the day, in working hours. This is the busy percent, and the scrubs are these. This is 30 days, so you can see we scrub all weekend and then every night, and then all weekend and every night again. It's a fun pattern to observe. And we found a bad disk in one of the servers using this. I can't find the data now, of course. But it works really well to get an overview of the disk utilization. And that's actually all I wanted. So that's very nice. Any questions? We have plenty of time. Yes?

"My OCD screams at me that the media size should be just a number without the string." Yeah, but it's exported directly from the string that the geom info outputs. I could cut it off and then tell Grafana that it's a byte value, and it would humanize it for me. You're right, that would be easy. Grafana knows units: for this graph, I've told it that the value is a number of seconds, and then it scales the y-axis to show that this is milliseconds, of course, not seconds, fortunately. And it understands megabytes per second and things like that. So it is very lovely to work with. Yes?

"Have you had any trouble with Prometheus or Grafana because you're on FreeBSD instead of Linux, things like dependencies on systemd or stuff like that?" Apart from it not exporting disk utilization? No. Oh, right, the question was: have I had any problems running Grafana or Prometheus on FreeBSD because it's FreeBSD and not Linux? And no; apart from the node_exporter, that's the thing that exposes system metrics like memory and disk IO and so on, apart from it not exporting disk IO and not exporting ZFS data, then no, everything works very well. The ports work perfectly, they are updated in a timely manner and run very well. The latest version, or the upcoming version, of node_exporter does have FreeBSD ZFS support, and I'm very much looking forward to that. Combined with this, you can get a great impression of how your storage is doing. Yeah? Yes?

"The first one is, you were touching on keywords like alerting and federation and this kind of stuff. So my question is about these servers you were scraping: were they only scrape endpoints, or were they scraped and processed at the same site? Or did you have one central Prometheus server scraping all the endpoints?"
Yes, one central server that connects out; you configure it, tell it which targets to scrape, and then it just sucks up all the metrics available at each endpoint. No, I mean, you would think so, but no, it really works very well. It's a bit radical, both the pull model and the idea that the measurement happens when it scrapes: if Prometheus doesn't connect for a while, then no data is collected. But it really works very, very well, and it turns out not to matter too much as long as you use counters. It would matter for the gstat exporter, because you would lose precision; if you reboot your Prometheus server, you lose the five minutes or whatever it takes. But if you have counters available, then you don't, because it doesn't matter if Prometheus is gone for a while; it'll figure out how to, what's the word I'm looking for, extrapolate the change. Yeah, exactly, it'll smooth it out. So it works very well. And people are doing some crazy big installations, like the big hosting operators who have some 500 Prometheus servers doing millions and millions of data points per second. For my use, with 100 servers or something, it's always nice to know that it can handle way more than you need. Well, the Alertmanager is a separate jail; I mean, it could be a separate server if you wanted. Yes?

That's right. Yeah, it's stuff like this; it's infosec, it's also hardware, open source. If you've been to the German CCC camp or the big Dutch hacker camps, it is very much inspired by those. So it reminds a little bit of FOSDEM, except that people live in tents, and there are not as many people, of course; about 400 people, I think, this year. And it's slowly growing; it's the fifth year this time. So it's an annual event, it's getting traction, and it's a lovely little event.

So, any more questions? Yes? Absolutely. Have I used it? No, not really. I haven't used it enough; I haven't set it up myself. I've only briefly been exposed to it when somebody else had configured it, so I don't know enough. But you know how you can sometimes sense that something has momentum? Everywhere I look, Prometheus exporters are popping up natively inside whatever, nginx or whatever suddenly has a way to export metrics. And that tells me that I'm not the only one who feels that this is the right approach. And that's very nice, because I've been looking for something nice for years and years and never really been happy. Even back when Nagios was the thing to run, I was never really very happy with it; I think everybody really kind of loved to hate it. So I am so, so happy that finally something works. And the aesthetics are also important, because even though technically it's the same numbers whether it's an ugly graph or a pretty graph, it still makes a difference. It's nice to work with.

One response to that: just looking at Influx, while it's nice to work with in at least some aspects, Prometheus has much, much more advanced math available. For example, in Influx it's not possible, in a single query, to take the return codes of HTTP requests and get what proportion of requests were, say, errors. It's possible in Graphite, it's possible in Prometheus, it's not possible in Influx. In general, what's the difference between them? Sure. Any more questions?
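Before the next question, an aside on that last point about Influx: the kind of single-query ratio that is easy in PromQL looks roughly like this when asked through the Prometheus HTTP API. The metric and label names (http_requests_total, status) are just the common convention, not from any particular setup:

```python
import json
import urllib.parse
import urllib.request

# Fraction of HTTP requests over the last five minutes that were 5xx errors,
# computed by Prometheus itself in a single PromQL expression.
promql = (
    'sum(rate(http_requests_total{status=~"5.."}[5m]))'
    ' / sum(rate(http_requests_total[5m]))'
)

url = "http://localhost:9090/api/v1/query?" + urllib.parse.urlencode({"query": promql})
with urllib.request.urlopen(url) as resp:
    answer = json.load(resp)

# The result is an instant vector; each element carries the ratio as its value.
print(answer["data"]["result"])
```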
I have a quick question myself, because I'm rarely surrounded by a lot of FreeBSD people, so I just... oh damn, I went offline. Oh, and I can't even show it. OK, well, the question, or the problem, is this: you know the jail exporter I was talking about, which uses rctl to export jail metrics, so you can do graphs of, for example, memory use inside a jail. I use that, of course, because I use jails for all my stuff, and it's very nice, when a disk is being thrashed, to be able to see which jail is causing it: is Postgres doing an auto vacuum, or what's going on.

But rctl counts shared memory twice in jails, or rather as many times as there are processes sharing it. What I have in the tab I could switch to, if I weren't offline: a server with 128 gigabytes of RAM, two Postgres jails, and rctl tells me that one jail is using about 130 gigs and the other jail is using 80 gigs, which is about 200 gigs, and there are 40 more jails, so clearly something is wrong. I haven't been able to track it down further; I don't know what the fix is, but it is absolutely misreporting memory use in jails. If I have 100 Postgres workers and each of them is using two gigabytes of RAM, and most of it is shared, what you want rctl to report is how much memory would be freed if I shut the jail down. If 100 workers are sharing two gigabytes, it shouldn't report 200 gigabytes of memory use. Does anybody know whether this is just a bug, or whether what I'm asking for isn't possible?

"If memory is shared between jails, are you even able to tell?" Not between jails; between 100 Postgres workers in the same jail, for example. Each of those has access to the same shared memory space, and it's being counted 100 times because there are 100 workers. So it tells me that one jail is using 140 gigs of RAM on a 128 gig server, which clearly is not true. It would, but we can talk about it later, after. It's just been bugging me, because I finally got some nice jail graphs, and the memory usage is just way, way off. I was like, what the hell is wrong? How can it be using 300 gigabytes of RAM when it has 128? rctl reports the virtual memory use as well, and that's, of course, an even higher number; 1.2 terabytes of virtual memory, I think. But the memory use, we agree that it should be however much would be freed if the jail was shut down, right? Otherwise, what's the point of reporting it? You see what I mean? OK. I think I'll open a PR and write to a mailing list and see if I can catch somebody's attention with that.

But other than that, the jail exporter is really awesome, and stuff like IOPS works very well, so you can see which jail is thrashing the disks or using a lot of CPU.

I think that's it for me. We're done with a quarter of an hour to spare, so you can use that time for self-reflection and self-improvement. Thanks for having me, and I hope you enjoyed FOSDEM and my talk. And yeah, thank you.