All right, welcome. This is the Prometheus 2023 ecosystem update. My name is Ben Kochie, I'm a principal engineer at Reddit, and I'm also a member of the Prometheus developer team. With me is Julien. Do you want to give yourself a quick intro? Yes, I maintain the Prometheus server, and my work is supporting Prometheus customers all day long.

So, cool. Who here has used Prometheus 1.0? Oh, wow, a lot of 1.0 users. That's pretty cool. Who's not using Prometheus yet? Cool. Nice. Well, welcome.

What is Prometheus? Prometheus is a metrics-based monitoring system, designed to pull data from all of your servers and all of your software: infrastructure, applications, pretty much anything. It provides a basic built-in web server, data storage, and extremely powerful data queries with PromQL. And it can scale anywhere from sitting on a Raspberry Pi to global hyperscale.

Prometheus started in 2012 back at SoundCloud, where we had already started experimenting with container platforms before Kubernetes even existed. We had the problem that we had all these containers, and we could get data out of them with StatsD, but StatsD didn't actually tell us that the containers were running. So we needed an active, polling-based monitoring platform, and there really wasn't a good polling system at the time. Prometheus was created to actively monitor the containers running in the container cluster. And it's been fully open source the entire time. I don't know why the slide says 2015, but Prometheus was open source from the beginning in 2012. We joined the CNCF and released 1.0 in 2016. Prometheus 2.0 came out with a completely new TSDB in 2017, and we became a graduated CNCF project in 2018. We've got a cool documentary if you can scan that QR code. Oh, that's a little off the screen. But yeah, there's a Prometheus documentary up on YouTube; it's pretty great. We have lots and lots of maintainers, lots of contributors, and a big and growing community of people.

Do you want to talk about the community? Yeah. First of all, one month ago Prometheus crossed 50k GitHub stars, so that was a really nice achievement for the whole community. And in the last year we have expanded the Prometheus team: we have added a lot of new members across the different client libraries that we have, we have welcomed new maintainers, and we now have an official Rust client library. So we are continuing to expand the team. In the coming years we are also working more on the governance of Prometheus, to be able to really scale up the team and the decision-making processes inside the Prometheus team. So that's all the new members we have welcomed in the last year.

So now we can go into a bit more detail on the Prometheus architecture. The cool thing with Prometheus is how it communicates with your targets: it's an agentless monitoring system. Every application can be instrumented directly, so you don't have to have an agent in the middle to talk to. The Prometheus client libraries expose the metric data for those apps.
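As a rough illustration of what that direct instrumentation looks like, here is a minimal sketch using the Go client library; the metric name and handler are made up for the example:

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// A counter tracking handled HTTP requests, partitioned by status code.
var requestsTotal = promauto.NewCounterVec(
	prometheus.CounterOpts{
		Name: "myapp_http_requests_total",
		Help: "Total HTTP requests handled, by status code.",
	},
	[]string{"code"},
)

func main() {
	// The application serves its own metrics; no middle agent needed.
	http.Handle("/metrics", promhttp.Handler())
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		requestsTotal.WithLabelValues("200").Inc()
		w.Write([]byte("hello\n"))
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Prometheus then simply scrapes /metrics on whatever port the application listens on.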
Prometheus can also grab data from your systems and other services, cgroups, your data stores, and so on, and those have exporters. An exporter converts internal system data, like all the data the Linux kernel exposes in /proc, into Prometheus metrics. If you've got a MySQL database server, the exporter will make MySQL queries against the internal information schema and performance schema, and expose that as Prometheus metrics. And similarly, you can get the data from cgroups and all the underlying infrastructure.

All of that gets collected into your Prometheus time series database, so you get all of the data about all of the running systems in a single view, which allows you to make much smarter monitoring decisions. You can say: hey, I'm getting a lot of requests per second and I'm using a lot of CPU, so why don't we normalize how much CPU per request per second we're using and decide whether that's good or bad, instead of just looking at "oh, the CPU is really high", which isn't a super useful thing to alert on. Whereas if the CPU per request per second goes way up, then you might want an alert that something has gotten less efficient. (There's a sketch of that kind of query below.)

Prometheus also has a concept called service discovery. Instead of having to program your targets to know where they are and where the monitoring system is, Prometheus can reach into your inventory databases, whether that's Kubernetes or Azure or GCP or AWS. It can read all of your target lists dynamically and then collect all the data. So you have a positive source of truth for what should be installed, and Prometheus knows to go and collect that data and provide you not just the metric data, but also the availability of your targets.

Then when you've got all that data, you want to visualize it. There are external display platforms like Grafana and the web UI, and other automation like KEDA: if you have a Kubernetes cluster, you can do automatic scaling based on your Prometheus data. And of course, Prometheus is not just a time series database and metrics collection; it's also designed to be a full monitoring platform. You can write expressive PromQL alerting rules based on your service level objectives, or however you want to write your alerts, and those will go out to the Prometheus Alertmanager, which will notify your on-call that you've got a problem. So Prometheus is watching your systems while you're sleeping.

So, the Prometheus data model. Do you want to go through that? Yes. Basically, Prometheus is made to store time series. Each time series has an identifier, and then we have times and values for that identifier. What is really nice about the Prometheus model is that the identifier is actually a set of labels. There is a metric name that tells you what this number is about, but you also have labels that help you categorize: this is coming from that path, this is that HTTP code. So you are really free to use the hierarchy you want in your own data model, the one that matches your topology. And that data model is built into every part of Prometheus, from service discovery to the query language.

One of the cool things with the Prometheus data model is that metric names are a great identifier, which allows you to associate a metric name with a piece of source code. So it's really easy, as a developer, or even as somebody who's not familiar with a code base, coming in as an SRE and asking "what is this thing?", to find that metric name for your service in your code base and go: oh, that metric is moving because this line of code changed.
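Picking up the CPU-per-request idea from above, here is a sketch of what that normalization can look like in PromQL. The metric names are illustrative: process_cpu_seconds_total comes from the standard client libraries, while http_requests_total stands in for whatever request counter your application exposes:

```promql
# CPU seconds spent per HTTP request, averaged over 5 minutes.
# A sustained rise means the service got less efficient, even if
# absolute CPU usage still looks unremarkable.
  sum by (job) (rate(process_cpu_seconds_total[5m]))
/
  sum by (job) (rate(http_requests_total[5m]))
```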
And now, when you want to get data out of Prometheus, there's a very high-powered query language that allows you to do complex vector expressions and complex data analysis with a very simple language. And unlike an SQL language, it's designed to deal with all of those labels, and all that label manipulation and matching, so you don't have to do it yourself. It makes joins of data much easier than an SQL language does. The example on the slide is a Prometheus query that finds the partitions in your infrastructure with more than 100 gigabytes of capacity. So you can see that in a very simple query, you can express quite a complex question. And this goes across your complete infrastructure, because we only care about the mountpoint label in this case, and then we do a quite easy numerical comparison based on that.

And then similarly, you can take those expressions and write alerts. Basically, any result crossing that threshold will create an alert for some kind of problem, for example 500 errors coming from your application.

And there are a lot of things out there that don't speak Prometheus, so there's a huge ecosystem of, I think, at least a thousand or two thousand different exporters that bridge from other systems into Prometheus. There's a huge number of these out there, maintained both by the Prometheus team and by the Prometheus community, as well as people just writing a thing that says: hey, I've got my DSL router, how do I get the current DSL state out of my DSL router? Well, there's an exporter for that. So it's really easy to get data into Prometheus from a variety of sources, and the Prometheus client libraries also make it easy to write your own. It's also important to know that to expose a metric to Prometheus, you just need to be able to serve HTTP. Just text over HTTP is enough to serve a metric to Prometheus.
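For reference, this is roughly what that text exposition looks like; the metric here is the made-up counter from the earlier sketch:

```text
# HELP myapp_http_requests_total Total HTTP requests handled, by status code.
# TYPE myapp_http_requests_total counter
myapp_http_requests_total{code="200"} 1027
myapp_http_requests_total{code="500"} 3
```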
So, what's new in 2023? Let's look at the new stuff in Prometheus that we have worked on this year. The first one is native histograms. For those who are not familiar with the concept, a histogram is a way of measuring the distribution of data, and Prometheus used to model a histogram as multiple time series, one per bucket. The first image is what we used to have to show the latency of a service: when it's red, there are a lot of requests going on, and you can see at this stage there are a lot of requests below 5 milliseconds. But the classic histograms, as we call them now, are very limited, because each bucket time series has a significant cost. So we have worked on the new native histogram system, which gives you much more granularity. The data you see in both pictures is the same data, once with the classic histograms and once with the new histograms. With the new histograms, you can clearly see that, yes, the requests are below 5 milliseconds, but they're actually between 125 milliseconds and 9 milliseconds. So you get a much better reading of the story of what your services are actually doing, thanks to that new approach.

And because we made them native in Prometheus, instead of being a set of time series, a histogram is a new object that we store in the time series database, so it is very, very efficient. It enables us to have many values at really fine granularity in the histogram, while keeping the cost almost, I think, even lower than the classic histograms. You can get about 10 times the resolution in your histogram buckets for the same cost in terms of memory and CPU usage on the server. So it is a huge, huge improvement: if you've ever run into "oh, my histogram buckets have too much cardinality, I have to reduce the number of buckets", well, now you don't have to think about that anymore; it's just automatic. There's a bunch of material up here if you want to catch up on the new native histograms. They are available natively in the Go client library, and they are also now GA in the Java client library. And there are cool things you can do with that additional histogram resolution.

Then, another thing we have been working on a lot in Prometheus, and this is quite technical, but the highlight is that we continue to dig into the performance of the Prometheus server itself. What we have done this last year, for example, is the stringlabels build: we took the label representation in Prometheus memory, which was something like a third of the memory consumption of Prometheus, and turned it into a different data structure that gives us a lot of memory gains. So if you are running a Prometheus from one or two years ago and you just upgrade to the latest releases, you will find a lot of performance gains just by upgrading the server itself. And we are continuing to work on this: we are introducing string interning in the coming months, which will further reduce the CPU and memory usage of Prometheus. Yeah, over the last year we've basically done a lot of performance optimizations, and if you went to Brian's talk yesterday, or even if you didn't, we've basically cut the amount of memory Prometheus uses in half over the last couple of years, for the same number of metrics.

We are also adding features to the regular Prometheus user experience. In alerting, if you have alerts that cross the threshold by just a bit, or you see that they are flapping, there is a new rule field called keep_firing_for, which will help you reduce the flapping of your alerts and the noise from the resolved notifications you might get. So we are still working on the basic user experience of Prometheus, to improve it and make it a lot easier for you to write your alerting rules. Before this, you could approximate it with complex PromQL expressions, but we also try to keep Prometheus simple and easy for the people who are new to the community, so we bring in nice features like that.

Next to that, we have also introduced a way to split your Prometheus configuration file. We saw that people using Prometheus traditionally had one big prometheus.yml file, and almost everything was in that file. Now you can actually decide to have multiple files and use scrape config files to split your configuration. This can also help you delegate parts of your Prometheus configuration to different teams, if you want to do that, and it also helps with the reliability of your configuration.
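A minimal sketch of that split, using the scrape_config_files option; the file names here are made up, and I'm assuming the included files carry their jobs under a scrape_configs key:

```yaml
# prometheus.yml: the main file keeps the global settings and pulls in
# per-team scrape configs via a glob.
global:
  scrape_interval: 15s

scrape_config_files:
  - "scrape-configs/*.yml"

# scrape-configs/team-payments.yml (a separate file, owned by one team):
#
# scrape_configs:
#   - job_name: payments-api
#     static_configs:
#       - targets: ["payments-api:8080"]
```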
And what we have also added, but this is very experimental, is that we have started working on OpenTelemetry, and we have added support for a native OpenTelemetry receiver inside Prometheus, so you can send OTLP data to it. At the moment we don't do any transformation of that OTLP data, but we have added support for it. So the story is that we are really experimenting with OpenTelemetry natively in Prometheus, because we want to have a really nice user experience for OpenTelemetry users with Prometheus. We are also working on naming compatibility, like supporting UTF-8 in metric names, so you can have the same metric and label names across your OTLP metrics, your Prometheus metrics, your traces, and your logs. And we are working on a new way to support the target_info metadata that OpenTelemetry provides us. Again, we are looking at how to integrate all of that into PromQL and the complete Prometheus ecosystem, to get a really nice user experience.

And besides Prometheus itself, we've also been doing a lot of cool work on the exporters in the community. For example, in the SNMP exporter, one of the biggest complaints and problems was that the authentication portion of the SNMP connection was mixed in with the actual walks and metric translation. So we've split that up, and now you can specify a list of walks and a separate list of authentications. In addition, you can also do multiple walks in the same scrape, so you can compose your SNMP configurations much better. Who here actually has to use SNMP? Nice. It's great.

Same thing with MySQL: the MySQL exporter now supports multi-target mode, so if you have MySQL databases hosted in a cloud platform, say a managed MySQL database, you can now point your Prometheus at a single MySQL exporter, and it will talk to all the cloud databases.
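That multi-target mode follows the same pattern as the blackbox exporter: Prometheus scrapes the exporter's /probe endpoint and hands it the real database address as a parameter via relabeling. A rough sketch, with placeholder hostnames:

```yaml
scrape_configs:
  - job_name: mysql
    metrics_path: /probe
    static_configs:
      - targets:              # the databases we want metrics from
          - db1.example.com:3306
          - db2.example.com:3306
    relabel_configs:
      # Pass the database address to the exporter as ?target=...
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the database address as the instance label.
      - source_labels: [__param_target]
        target_label: instance
      # Actually scrape the one shared exporter.
      - target_label: __address__
        replacement: mysqld-exporter.example.com:9104
```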
Same thing in Java: we've also released a new Java client with OpenTelemetry support and native histograms. The Alertmanager has a bunch of new receivers, so you can send your alerts to other services like MS Teams and Discord. The Windows exporter is now an official Prometheus project, so if you've got Windows machines, the Windows exporter is great for that.

We are also now doing the bug scrub again in Prometheus. The bug scrub is a weekly meeting happening every Tuesday at 11:00 UTC, so if you want to join us and work with us on Prometheus, that's a really nice place, because that's where we review the pull requests and where we triage the different issues. Every week we have Prometheus team members and community members joining that online call, and really, that's how I actually learned a lot of what I know about Prometheus: by joining those meetings and seeing the trade-offs that we have to make in the project. We've also been trying to revive the Prometheus ecosystem call. Time zones are hard, but join us in the community. We've also adopted an Ansible collection, so if you have bare-metal nodes and you want to use Ansible, you can now use an official Prometheus community Ansible collection to deploy.

So there's a bunch of other things coming in the Prometheus ecosystem, and I hope we will be able to be on stage next year and tell you that we have done all of that: we are working on metadata, on a new UI for Alertmanager, we are improving the native remote write of Prometheus a lot, and we are adding even more OTLP support.

And lastly, next year we will also release Prometheus 3.0. 2.0 was seven years ago, and basically you can take your Prometheus 2.0 configuration and upgrade straight away to the 2.48 that's coming out these weeks, and it should all work. We have not made any breaking changes for users, but we want to fix some mistakes and improve the user experience a lot, so we will release a 3.0. So if you have any feedback you want to offer the team, open an issue; we are really listening to users, and this is a good time for some bigger changes.

Cool, so, questions. Come up to the mic. There is a microphone on that side. There is only one, over here. Do we have a hand mic we can hand out, or do you have to go to the podium? Anybody else want to queue up for questions?

Yes, and I guess if there's not many questions, I can ask. These are minor pet-peeve things; I just want to get your thoughts, and other people's thoughts. The config: in a lot of cases we're running these in agent mode. So now we've got the stripped-down one on clusters that are, say, kind of hard to reach, and it works brilliantly. With agent mode, we feel like the remote write receiver works really well, but we want to push config map updates out, you know, we want to tweak changes. Sure. And we can't reach those Prometheus instances, and there isn't a native "reload the config, and if it's bad just go back to the other one" scenario. You've always got to use another product. Is that something you think might get addressed in the future? Because I'm sure it bugs a lot of people.

That's an interesting idea. For example, the Prometheus Operator has a config-reloader sidecar, and it's mostly focused on Kubernetes. So if you're running the Prometheus Operator, you already have this, and you'll get those automatic remote reloads. But if you're not on Kubernetes, yeah, there's not really a universal sidecar for that, because, you know, what would the options be? We'd have to think about what data sources we could pull from. Would we pull blindly from an HTTP endpoint, download that, and then reload? That would be relatively easy, but do we also want to read from NFS? So, you know, there are lots of options. There is a consensus to have automatic reload of the Prometheus configuration, but someone from the community has to write it. There's a proposal that if you just touch the file on disk it reloads, but somebody needs to implement the inotify watching.

And that's it, that's what I'm asking. We're using Kubernetes without the Operator, and it's just: that file's changed, and it doesn't bother looking.

Pull requests welcome. Yeah, and then if you push a config, you will be able to just reload it. Okay, cool.
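For reference, these are the manual reload mechanisms that exist today, which any external reloader sidecar ends up driving. This is a sketch, and the HTTP endpoint only works if the lifecycle flag is explicitly enabled:

```bash
# Option 1: send SIGHUP to the Prometheus process.
kill -HUP "$(pgrep prometheus)"

# Option 2: the HTTP reload endpoint. Prometheus must have been started
# with --web.enable-lifecycle for this endpoint to be active.
curl -X POST http://localhost:9090/-/reload
```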
And I've one more quick question, and then everyone else can go. This is really just a thought, and everyone else would probably hate it. Sometimes some of your labels have a value in them, and it would be really nice if, in a query, you could declare that a label is a value and say "sum up these label values". I know that would probably be a horror to do, but is that something that gets talked about? Because that would be really nice.

We get that request every few months. I don't think we've actually had any approved proposals for doing that kind of data manipulation at the label level. Every time, we ask for use cases, because if you have a use case and we understand what people want, it will be easier for us to accept something. You know, when you start touching label values, there are a hundred ways to represent a float value, so if we knew what people want to do, we could find the correct solution for you.

All right, thank you very much. No problem. We've got plenty of time.

Hey, my question is more about the evolution on the client side. We were using some legacy tools for infrastructure metrics, then replaced them with the Prometheus client side. And now we have OTel emerging for our logs, traces, and metrics. So on the client side, we seem to have Prometheus on one side and OTel on the other. Do you see Prometheus evolving more toward being the server side that stores metrics in the long run, with the OTel library replacing it as the standard client? I'm just a deployer, right? I just deploy to hundreds and thousands of nodes, and I'm just looking at that.

Personally, I think the OTel metrics collector is an anti-pattern, because it ruins the active monitoring that Prometheus provides. But there are lots of people doing it, and, for better or worse, re-implementing Prometheus exporters in OpenTelemetry. We support both. What we are also seeing is that some client libraries, like the Java client library, already support OTLP, and we really want people to buy into the Prometheus client libraries knowing that if they need to switch to OTLP, they can also do it without instrumenting the application again. So we still see a lot of usage for the Prometheus client libraries, and we are not giving up on the Prometheus client libraries at all in the coming years.

I recently spent some time implementing both a standard Prometheus client exporter and the OpenTelemetry version of that exporter, and I was curious, and I think you might have touched on this in the last answer: are there any particular things that you feel OTel does really well? You're adding support for a number of things in the Prometheus time series space, right? Not the logging, though. Are there any particular concepts they do really well, or perhaps that you disagree with and think you do better? What would you recommend for when to consider one or the other?

Sure. A long time ago there was a project called Metrics 2.0, and it had a lot of cool ideas around a more well-defined structure for metric data, including native support for metric types and things. OpenTelemetry has a bunch of the same ideas, and it has this flexibility, but that also makes it so much more complicated. Worse, you end up with things in OpenTelemetry like label values that can be multiple different data types. In Prometheus we very strictly define that label values are strings, but OpenTelemetry doesn't have that restriction, so you end up with too much flexibility. And I have this horrible prediction that developers are going to make this just as bad as SNMP, where you're going to have random values that are numbers-as-strings or strings-as-numbers, and your HTTP status code could be a float64 and we don't know. So it's going to be interesting to see how that develops. Thank you.

Thanks for the talk. I'm Tomei, with Sony. Do you happen to have a working group or community meeting for Edge and IoT use cases?

Sorry, Edge IoT, like Edge devices? Edge devices are a really interesting subject.
I don't think we have a really big community around that. I would love to see it, but nobody on the Prometheus team has really taken up the Edge and IoT mantle. What we do have now is the Prometheus agent, which is a way to run Prometheus on the Edge without the time series database. You have a smaller memory footprint, so you can deploy it on a lot smaller devices, and then you can push data to a central Prometheus or Mimir or Cortex. But that's still not as lightweight as, say, an MQTT collector. Yeah, thank you. Yeah, no, I'd love to see an official IoT story, like an MQTT-to-Prometheus ingestor as part of the project, but somebody in the ecosystem needs to maintain that. Okay. Yeah, thank you.

Thanks for the talk. So I know high-cardinality metrics are something you hear about all the time and don't like to hear about, but are there any plans to improve that, so that maybe a cardinality of a million label values is okay?

So, the issue nowadays, and this is what I see with customers, is not really high-cardinality metrics as such; it's really the churn that you can have in time series. If you have a lot of time series that each live for just one or two minutes, and you have thousands of them, then you have an issue. But we are optimizing a lot for the label values and everything, so with high cardinality, if you have fixed time series, it's not an issue to have multiple thousands of label values for one label. We see that a lot, actually. We are really working on that, and what I said about the memory reduction, and the fact that we will soon intern strings, will also help a lot with that. So we are working on that issue actively, and it's less of a problem now than it was one year ago, and I expect that in one year it will be even less of a problem.

Yeah. I generally see people running single Prometheus instances with tens of millions of series these days, but millions of values on a single label is not something a lot of people are doing. There have been some discussions about having a different data model for storing very high cardinality, you know, a single label with a million time series, by writing a completely new TSDB based on something more like Parquet files, but those are all just theoretical ideas right now.

And one more question: do you hear reports that when you have high cardinality that generates a lot of time series, the scrape performance gets worse on the client side? Putting that data together on the client side can lead to a severe drop in performance for the thing being scraped.

That usually depends on the client library involved. For example, the Node.js library had a bunch of performance problems, but Reddit and a few other organizations actually did a bunch of work to improve the Node.js client library, so it now has significantly better performance with hundreds of thousands of metrics per scrape. But yeah, it tends to be a problem with the language's library involved, and not Prometheus itself.

Okay, so your goal is to keep it an HTTP pull, text-based model only, and not evolve into streaming or something like that, which is a bit easier on the client?
We pretty much want to stick to the scrape model, because it provides a nice atomic unit. Prometheus itself is actually an ACID-compliant database, and we want to keep that transactionality of a scrape being one insert, to avoid PromQL transaction issues: in a lot of the very detailed SLO calculations, you want to make sure that you're not seeing partial scrapes and partial data, because that can make your SLOs wrong.

Great answers, thank you. Yep.

Great talk, by the way. So, as we get to this point where a lot of companies are getting bigger and are starting to want to gather more metrics, sometimes people get a little overzealous with what they want to collect. Do you have any guides or resources that you would recommend to people on, a, how not to overdo it, and, b, how to handle augmenting what you collect as you grow, so it doesn't become unbounded?

That's a long conversation. The Prometheus docs have some good guideline documents, and things like Thanos and Mimir also have some very good guidelines on how to scale things. For example, if you're in Kubernetes, one of the patterns that I've been working on at Reddit is a controller that creates a Prometheus per namespace, to do application-team isolation. So we have one Kubernetes namespace, one application, one Prometheus, and that way, if a team causes some giant cardinality explosion, their explosion is limited to their namespace and they only hurt themselves. That's helped us a lot.

Yeah, and what you can also do is enable the Prometheus query log, so you can see which metrics are actually used in queries. If you enable that for a couple of days, you can then look and say: okay, we only use 300 metrics, so what else are we exposing as metrics? And you can start analyzing more precisely what you have.

Cool. Thank you. That's all the time we have. Thanks. Thank you.