Hello, everyone. OK, sorry about that. We had a bit of an echo here. Hello, everyone, again, and welcome to another episode of Open Observability Talks. I'm your host, Dotan Horovits. And here at Open Observability Talks, we talk about anything DevOps, observability, and open source. This is actually the closing episode of 2022. And I just got the end-of-year podcast statistics from Spotify, so I'm glad to share that we have listeners from over 40 countries, with around a 270% increase in followers, and several hundred listeners have us in their top 10 podcasts. So first of all, thank you so much for joining me on the show, for following us and for ranking us so highly. The show is available on all the popular podcast apps, Apple, Spotify, Google, and more, as well as on YouTube. So if you are not yet a follower, go ahead and join us. I'd also like to thank our sponsor, Logz.io, the cloud native observability platform. Logz.io takes the best-of-breed open source projects, such as Prometheus, OpenSearch, and Jaeger, and offers them as a unified observability platform built for scale. For those joining the live stream on YouTube or Twitch, feel free to share questions and comments in the chat. We'd be more than happy to take them into the fireside chat. And with that, let's move on to today's episode. When I was at KubeCon in Detroit, I was amazed by how many updates there were in Prometheus and the Prometheus ecosystem, and then again a few weeks later at PromCon in Germany. So I decided to ask Julien Pivotto to join me to discuss all these updates. Julien is a maintainer of Prometheus. He's also a co-founder of O11y, a company that provides support for open source observability tools, such as Prometheus, Thanos, and Grafana. And he's definitely an authoritative source for all things Prometheus. So let me invite Julien into the live stream. Hey, Julien. Hello, hello. Glad to have you here on the show finally. Yes, I'm glad to be here for the very last episode of this year. Yeah, a perfect ending for the year. As an anecdote, two years back, we had Julius Volz here on the show, back then also giving an overview. But so much has changed that, as I told you before, we can just treat it as a clean slate and share with everyone the latest and greatest. So maybe you can share a bit about you and your background, so the audience can know where all that Prometheus knowledge comes from. Yeah, so I have been working since 2011, and I have always focused on open source in my work, because that's the way I like the software I'm using. I have been contributing to several open source projects. I have also been involved in the automation world, with tools like Puppet and Foreman. And since 2017, I have shifted towards Prometheus and monitoring. I already did some monitoring before, but I started with Prometheus when I saw the potential. From 2017 to 2019, I was a contributor to Prometheus, then I joined the Prometheus team in 2020 and became a Prometheus server maintainer in 2021. And that's what I'm working on now. I'm working most of my time on Prometheus, supporting customers and developing the software further. I'm also involved in the community around several exporters. So it's a really fun project to be in, and that's where I am now in my career. Amazing. And actually, this is also a very special time for the Prometheus project. It's a 10-year-old project. In fact, just a few weeks ago, in November 2022, we celebrated 10 years since the first commit in the Prometheus repo.
So that's a significant milestone. The project itself is actually a bit older than that, but you have to pick a date, and that's the date that we have, because the other dates are lost in time. So yeah, we celebrated 10 years since the first commit of Prometheus, even if the world has only known Prometheus since 2015. But yeah, Prometheus is quite an old and mature project now. As you know, it's the second project in the CNCF, right after Kubernetes, which is also a great achievement. So yeah, a 10-year-old project, and one of the very first projects in the CNCF. It's quite nice to see all of that history happening. Yeah, it's amazing. So maybe before diving into the latest updates, let's zoom out a bit and look at Prometheus as a project. We don't need to do the historical review, but maybe as you perceive it, what's the current, let's say, mission statement? Where is the project heading at the macro level, what are the major directions you see the project heading towards and evolving in? So Prometheus is a monitoring solution. It has always been and it will continue to be a monitoring solution, which means that when we have to make trade-offs, we always choose the monitoring way. And we want it to be like that. I also want the project to continue to be very simple to use, which means that, for example, when I'm giving a training about Prometheus, the same training should work over the years, because I want the basics to always work the same. And it's very important that when we add complexity, we can still have newcomers coming in and using the project quite easily. And that's a key point of Prometheus, that it is really easy to set up. You just need to start the binary, you have your server, and then you configure it the way you want. Now, in the way that it is evolving, we are actually getting a lot of users from many different areas. We used to be monitoring for infrastructure, and now we see that we are monitoring a lot of different things, because we just take in a lot of data and we enable you to query it the way you want. So we have seen people monitoring wind turbines. We have seen people doing their own home monitoring, like electricity consumption and everything around that. So we are really versatile in what we can offer and in what people are doing with the project, which is also really nice, because as an open source project, you should just be free to use it the way you want, whether you want to use it in production, in your kitchen, in your basement. Anywhere you want, you can run Prometheus. It's available for everyone, for every purpose. What we are also seeing for the future is that we are evolving with the community and the world around us. At the last dev summit, we decided that we would also move towards ingesting OpenTelemetry metrics natively. So it's not like we are evolving in our own corner without looking at what's going on around us; we want to continue to stay relevant and to stay a best-in-class monitoring solution. Yeah, it's funny, when I saw the update on adding the trigonometric functions, it just showed me how many different use cases there are that we might not have imagined originally for Prometheus, that are maybe less common in IT infrastructure but much more common in other domains. Just to show you that you get requests for features that expose a whole different domain of use cases.
So I think one of the most impressive things for me in Prometheus, as an open source project and maybe even a de facto standard in many instances, is the rich ecosystem. The fact that many frameworks and many tools today expose their metrics in this format out of the box, and they provide all sorts of integrations. That's, I think, the charm, because essentially, whatever you have in your stack, if you drop in Prometheus, it's very easy to start collecting metrics. So let's start. I think it's a good point maybe to start covering where we stand with Prometheus and the updates there. Can you share with us maybe about the service discovery elements that have been added and where you're heading in this respect? Yes. The first point that we had regarding scraping and service discovery was the agent mode. It's not quite recent; it was added, I think, two years ago. But it enables you, if you only want to use Prometheus to integrate with projects like Thanos, Cortex, Mimir, and you just want the remote write, you can do it, and you don't need the local time series database, which means that your Prometheus server will use only a fraction of the RAM, the disk space, and the resources that you would need otherwise. So we listened to the users. They say, OK, we have a global solution, we have a provider that will just ingest the Prometheus data. And we agreed that, OK, let's do something upstream so that you can continue using Prometheus with a lower footprint. You have to make some trade-offs, but we can lower the footprint of Prometheus. If you need to run it only on one node, if you need to just run it on the edge of your cluster, you can do it with the agent mode. So that's on top of the overall improvements that we have made to service discovery: we have a completely new mode of operating Prometheus. Then, regarding the ecosystem, we have added a lot of service discoveries. The last one is the OVHcloud service discovery. We also had a lot of different cloud providers coming to us, wanting to contribute support. Off the top of my head, Linode and Vultr joined the project to integrate their discovery into Prometheus. We are working with Oracle as the next one, Oracle Cloud. And moreover, when discussing with the community and looking at, OK, what are people using as sidecars for Prometheus that we could integrate upstream, I came across NetBox. NetBox is written in Python, so it was not easy to build that into Prometheus, because there is no real Go client for it, and it's not Prometheus's goal to maintain a client for NetBox. So instead, we went the other way around and said, OK, we will have an HTTP service discovery. It will be generic, and we can just query any HTTP endpoint and get the targets out of those endpoints. Before that, we had only the file-based service discovery, so you needed to write a file on the Prometheus server, which is not super handy. And that's why we added that new HTTP service discovery. And that's a very important point in how we are trying to improve Prometheus: if you are using a sidecar next to your Prometheus server, let's look together at whether we still need that sidecar or whether we can integrate it upstream. As the project is growing, we want to meet users' needs. We need to remove the barriers to using Prometheus. We really want it to work better for you, work better for the users. So we are really working with the community to integrate all those different use cases directly in Prometheus.
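For readers who want to see what that generic HTTP service discovery looks like in practice, here is a minimal sketch, in Go, of an endpoint that returns targets in the JSON shape Prometheus's http_sd_configs expects. The addresses and labels are made up for illustration; in a real setup they would come from your inventory system (NetBox, a CMDB, and so on).

```go
// Minimal HTTP service discovery endpoint sketch for Prometheus http_sd_configs.
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// targetGroup mirrors the JSON shape Prometheus expects from an HTTP SD
// endpoint: a list of target addresses plus labels applied to all of them.
type targetGroup struct {
	Targets []string          `json:"targets"`
	Labels  map[string]string `json:"labels"`
}

func main() {
	http.HandleFunc("/sd", func(w http.ResponseWriter, r *http.Request) {
		groups := []targetGroup{
			{
				Targets: []string{"10.0.0.1:9100", "10.0.0.2:9100"},
				Labels:  map[string]string{"datacenter": "eu-west", "team": "infra"},
			},
		}
		w.Header().Set("Content-Type", "application/json")
		if err := json.NewEncoder(w).Encode(groups); err != nil {
			log.Printf("encoding targets: %v", err)
		}
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

On the Prometheus side, a scrape job would point its http_sd_configs url at this endpoint (for example http://sd-host:8080/sd) and re-fetch it periodically, so new targets show up without touching the Prometheus server.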
We want to reduce the sidecars. We really want to make it easier to use Prometheus. And it's also always better, when you run a monitoring solution, to have fewer moving pieces, you know, because if you have a sidecar, you need to monitor it, you need to take care of it. So we are integrating a lot of things in Prometheus to make it easy for you to use it. I think that point you made is also very important: the emphasis of the project on productization and on making it as easy and as friendly as possible to integrate in IT environments, with all their constraints. I think another example, related to the rich suite of service discovery options available, is that there are a lot of them, but obviously, for a specific environment, you don't need all of them. And the new plugin system lets you compile Prometheus without any unnecessary service discovery plugins, so you can compile only what is relevant for you and keep it lean. Maybe you can say a word about this new plugin system. Yeah, so back in 2021, or maybe 2020, a user came in and said, OK, I want to help on one issue in Prometheus, which is that instead of the config package of Prometheus depending on all the service discoveries, the service discoveries should depend on config and register themselves. And that enables us now, when we add a new service discovery, to let you decide to opt in or opt out of that service discovery when you compile Prometheus. It is also possible to use out-of-tree service discoveries. So if you really want to use service discovery in Prometheus in your own way, and you don't want to share it publicly, or we cannot integrate it upstream, you can also just compile your own plugin and seamlessly integrate it into Prometheus. So we have a plugin system at build time that enables you to build service discovery plugins in or out. Some tests have shown that if you only pick one or two service discoveries, you can reduce the size on disk by up to 60%. Wow, that's amazing. That's really impressive. You know, we dove right into the details of service discovery, but maybe we should give a quick introductory word for those who are not familiar with the mechanism. Essentially, service discovery is the way that Prometheus discovers its targets, the components in the architecture, for scraping, for pulling the metrics. Can you give us just a quick word about the context of the service discovery mechanism? Yeah, so service discovery, actually, is a big selling point for Prometheus, because when you have a monitoring solution, if it is not aligned with your infrastructure, you get a lot of false alarms or you don't see some services. And service discovery means that Prometheus can automatically connect to the source of truth in your infrastructure to know exactly, okay, what is in the infrastructure, what should be there, what should I monitor? And when you register or unregister services, the monitoring will be adjusted. So, for example, we natively connect to Kubernetes, to Consul, to a lot of different cloud providers, to Docker, so we can connect to a lot of different sources that actually know your infrastructure. And so you only need to change those sources, and the new targets will directly be scraped by Prometheus.
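As a concrete, deliberately simplified illustration of that build-time plugin idea, here is a rough Go sketch of an out-of-tree service discovery mechanism that registers itself with Prometheus. The mechanism name, target, and labels are invented, and the interfaces are based on the discovery package in the Prometheus source tree (github.com/prometheus/prometheus/discovery); check the version you build against for the exact signatures.

```go
// Sketch of a custom (out-of-tree) service discovery plugin for Prometheus.
package mysd

import (
	"context"
	"time"

	"github.com/prometheus/common/model"
	"github.com/prometheus/prometheus/discovery"
	"github.com/prometheus/prometheus/discovery/targetgroup"
)

// Config is the YAML block for this mechanism (e.g. a hypothetical
// `my_sd_configs` section in the scrape config).
type Config struct {
	RefreshInterval model.Duration `yaml:"refresh_interval"`
}

func init() {
	// Registering the config makes the mechanism known to the Prometheus
	// config parser when this package is compiled in.
	discovery.RegisterConfig(&Config{})
}

func (*Config) Name() string { return "my_sd" }

func (c *Config) NewDiscoverer(opts discovery.DiscovererOptions) (discovery.Discoverer, error) {
	return &discoverer{cfg: c}, nil
}

type discoverer struct{ cfg *Config }

// Run pushes target group updates until the context is cancelled.
func (d *discoverer) Run(ctx context.Context, up chan<- []*targetgroup.Group) {
	ticker := time.NewTicker(time.Duration(d.cfg.RefreshInterval))
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			up <- []*targetgroup.Group{{
				Source:  "my_sd",
				Targets: []model.LabelSet{{model.AddressLabel: "10.0.0.1:9100"}},
				Labels:  model.LabelSet{"datacenter": "eu-west"},
			}}
		}
	}
}
```

To compile it in, you would reference the package from the plugins list in the Prometheus source tree and rebuild; conversely, trimming built-in discoveries from that list is what produces the smaller binaries Julien mentions.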
And Prometheus comes out of the box with a lot of different service discoveries, and we added more than 10 in the last two years, I think, so it's really growing a lot. We see a lot of demand, and it's really fun to manage and to personally work with all those cloud providers to test everything out, so that when you come to Prometheus, we already have support for your cloud provider. And on top of that, not only do we fetch the targets and decide which targets we should monitor for you, but you also have the flexibility to add labels to your targets. So if you monitor virtual machines, you can have a label with the data center they are in, you can have a label with anything you want, basically. If you have tags applied to your virtual machines, like many cloud providers enable, you can select the ones you want to monitor based on those tags. So we really offer a lot of flexibility in the way that you fetch your targets and in how Prometheus knows, okay, I need to monitor that target. Yeah, that's amazing. And actually, one of the extra pieces of flexibility that you released recently in the project is the ability to even have the scrape interval adjusted, because maybe the same interval doesn't fit all types of targets, so we can actually have an interval defined per specific labels, right? Yeah, so what's important is that we want to make it easy for you to decide how you want to monitor each target, and we also don't want you to make 10 different API calls to your cloud provider. So we are bringing more capabilities to the feature called relabeling, which means that based on the initial set of information you get from the cloud provider or from your orchestration layer, you can decide, okay, that target I want to monitor at a certain interval. So now you can change the scrape interval and the scrape timeout for each target. That's amazing. And you mentioned the agent mode just before; I want to say a word for our audience that is less familiar. Essentially, since Prometheus has so many facets to it and so many capabilities, the auto discovery, the scraping capabilities we just talked about, and then also the time series database that we're going to talk about, and others. So the agent mode is sort of a lightweight mode for Prometheus that takes away the database piece, which is obviously very heavyweight, for cases where there is a backend database, whether Thanos, Mimir or Cortex or any other Prometheus-compatible one, so that you need the Prometheus piece solely for the scraping, for the fetching of the metrics. That's the agent mode that you mentioned, that was released one or two years ago and is gaining a lot of popularity for this type of architecture, right? Yes. Great. And people are using a lot of different backends to store the metrics that Prometheus is fetching; it's really amazing to see all of these backends working together to get the best out of the Prometheus protocol. Actually, that's why I think the title of Prometheus ecosystem update is adequate, because a lot of the innovation, the new stuff, happened also in the surrounding, peripheral elements around Prometheus, which is also enabled by Prometheus's capabilities for remote write, remote read, and the other APIs that allow for this integration. So I think it's definitely a point of strength for the project. And speaking of which, we mentioned the time series database.
So maybe first, if you want to say a word about what the time series database means and specifically what Prometheus offers, and then we can go on to discuss some new stuff that was added there. Yes. So Prometheus stores its data in a time series database. It is basically blocks of data written every couple of hours, and those blocks are basically immutable, and then we compact them together to optimize the disk space and the query time and that kind of thing. This is all managed by Prometheus; you don't have to maintain anything. The only input that you have is: I want to keep that amount of data for that much time, and then Prometheus will just deal with it all on its own. So it's a database you can only query with Prometheus, and yeah, it's very much tailor-made for Prometheus. It's built on top of the Gorilla paper from Facebook, so it's very efficient at compacting and encoding the data. And we are still looking into ways to improve that further. So the way that we create those blocks should still improve in the future, to make them smaller and easier to query. And then one of the nice things that you added recently is the ability to backfill data, to do out-of-order ingestion, in case we need to fill in some gaps of information, right? Yes. So there are many use cases. The easy way to do that is by using the remote write protocol, which you can basically see as a replication protocol for Prometheus. It is the way that Prometheus will write to Thanos, to Mimir, but you can see it just as a replication protocol: if you use it between two Prometheus servers, they will just replicate the data. We have added backfilling, which means that if you want to fill in data from the past, you can do it with the TSDB now, and then the data will be nicely put into Prometheus blocks later on and can be queried. So this is still very early, but it brings new use cases and new migration paths. If you move from another monitoring solution, you could use that to backfill some of your data and generate native Prometheus blocks in a different way. So this is a nice feature to have, which means that you can start monitoring and then maybe fill in data from the past more easily. Previously, you had to create blocks with the adequate tools and put them in the TSDB; now we are bringing that directly to the API. Next to that backfilling, we are also adding support for native histograms. Basically, it means that you can now calculate your percentiles more easily, with more accuracy, and it means that you don't need to define in advance the buckets that you want. So a histogram, basically, is: you have buckets, and you see, okay, one of my queries is coming in, it is taking nine seconds, so I put it in the corresponding bucket. And then you can ask, okay, how fast did I reply to 99.5% of my requests? And then you get a number. And the new histograms enable you to get a very precise number, relative to the old way histograms were working. And it's really nice to see that improvement coming, and that the Prometheus TSDB can store more complex types than just plain samples, which is also more efficient, because we don't need to repeat all the labels 10 times in the time series database. Now we just have one object coming in, and it's a lot easier to handle.
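To make that a bit more tangible, here is a hedged Go sketch of instrumenting request latency with a native histogram using the Prometheus Go client library, assuming a client_golang version recent enough to ship the native histogram options; the metric name and values are made up.

```go
// Sketch: exposing a native histogram with client_golang.
package main

import (
	"math/rand"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var requestDuration = promauto.NewHistogram(prometheus.HistogramOpts{
	Name: "myapp_request_duration_seconds",
	Help: "Request latency in seconds.",
	// Setting a bucket factor turns on the sparse, native histogram
	// representation; no fixed buckets need to be declared up front.
	NativeHistogramBucketFactor: 1.1,
})

func main() {
	http.Handle("/metrics", promhttp.Handler())
	http.HandleFunc("/work", func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		time.Sleep(time.Duration(rand.Intn(100)) * time.Millisecond) // simulated work
		requestDuration.Observe(time.Since(start).Seconds())
	})
	http.ListenAndServe(":2112", nil)
}
```

At the time of this episode the scraping side also had to opt in, with Prometheus started with the native-histograms feature flag, since the feature was still experimental.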
And it's really important when you want to calculate SLOs, when you want to really know what's going on; it is always better to have the closest estimation of the result, if that's available. And it comes more or less fluently to Prometheus now: you need to use the right client library, and then the rest will just work automatically. There are also some adjustments to the queries that you make, but at the end of the day, this should work pretty well for most users. So I really recommend you have a look at those native histograms, because they will really help you and let you avoid things like calculating averages, that kind of thing; they are a lot more practical than just trying to guess your latency. You can get the correct number now. Can you maybe, just for those who are not familiar, explain what we had so far, to make clear what the improvement is? Yeah, so currently, when you define a histogram in Prometheus, you need to say, okay, I want these buckets, and then you need to define the buckets. Like, I want a bucket at one second, two seconds, four seconds, and so on. So you need to know upfront what's going on. The thing is that when you have an actual incident, well, who knows which bucket you would have needed, right? Because some incidents are clearly outside of what you could expect. And then the only thing your alert will tell you is that, oh, the requests take more than 10 seconds, but you don't really know if they are blocked forever or if they take 11 or 30 seconds. So it's really important that we remove that kind of prerequisite, that you need to define your buckets at some point, because also, when you respond pretty fast, you don't need that 10-second bucket, right? So those native histograms make a big change in that regard. And it's really a great added value for everyone in the community, to just have better data. And that's the key point here: we want you to have better data. And it's pretty early on with the native histograms, right? In terms of maturity, do you want to say where it stands and what's the timeline for releasing that? So they are now an experimental feature, but there are multiple people working on them full-time, or at least there were, last time I checked. It came out in the latest release, 2.40.0, and there have been quite a few bug fixes since that release, so now we are at 2.40.7. So a couple of the releases fix some bugs with the native histograms, which means they are already used by people who noticed some issues. It's really nice to see that the community is already engaging with them. I don't know when they will be generally available for everyone, but we are working on that, definitely. Sounds good. And another thing that I found very, very exciting is the topic of exemplars. Maybe just for our listeners: we talk about observability a lot here on the podcast, and a lot of the aspects of observability go beyond just monitoring, with the ability to correlate the monitoring and the metrics with additional types of telemetry to get a broader picture. And exemplars are the tool to actually correlate from the metric to respective exemplars, or samples, shall we say, of traces or others. Would you like to say a word about what we're aiming to achieve with Prometheus and where we're starting with this feature? Yes, so exemplars, basically, when you have...
When you see with your histogram that you have a spike in your response time, you have a "show exemplars" option that you can enable in the Prometheus UI or in the Grafana UI, and then you can see, okay, at that moment I have an example of a request that took a long time, and when you click on it, you are brought to Jaeger, for example, which will tell you, okay, this is the full trace of a query that took a long time. And then you can start seeing, okay, where did the query spend its time, why did the query take so long? And then you can actually see, okay, it was in this piece of the span, and then you can really investigate directly. So you don't need to jump from one tool to another and try to guess, because you have the correct exemplar attached to the metric, with the correct timestamp and value. Yeah, so that's great. And obviously we see that not just with Prometheus; we see it also with OpenTelemetry, which adopts it as part of the specification, and others. I think this is an essential paradigm for getting these signals together, enabling the correlation. And the exemplar, by the way, we should say, is essentially an identifier. The trace ID is a classic example, but it doesn't necessarily have to be a trace ID, right? It doesn't have to be a trace ID. It's very often used as a trace ID, and there are also some technical limitations on the number of characters allowed, but basically you have an identifier, you have a timestamp, and you have a value. So essentially, I can also use that to correlate to sampled logs, for instance, right, or other types of telemetry or reference data. You could use that, but I think there is a 128-character limit on the exemplars. No, I mean an ID that will help me. Oh, yes. To correlate, obviously, not the raw log, but again, using an ID to jump to the relevant sampled logs. Yes, but ideally, when you look at the trace, you should also get the logs embedded into the trace for that specific span. That's true, if we follow the best practice of embedding the contextual logs within the spans; many organizations, unfortunately, are not there yet. So yeah, that's an important point to make. The next part that I would like to talk about is PromQL. This is the query language that Prometheus developed natively, and back in the day it was actually pretty novel, because everyone was used to Graphite or similar tools at the time, which used a hierarchical model, and PromQL came with a labeling system that was much more flexible, a better fit for high-cardinality metrics, for slicing and dicing with ad hoc queries along different dimensions. So could you give us a bit of the background on PromQL and where we're standing with this aspect? Yes, so PromQL, basically, well, you have the Prometheus data model, as you have said, which is metric names plus labels, which are key-value pairs. And then you need something to query the data that you have, and PromQL is really simple to start with. You don't need to know a lot to start getting the rate of your HTTP requests, for example. It sometimes takes a while to get to know the details when you want to be very specific, which is why we have some tools to help you, which we'll speak about just after this. But the idea is that it is a powerful query language to get your data out of your server, and not just the raw data: you can do relevant calculations so that you extract what is really important to you and your infrastructure.
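To ground that, here is a small hedged sketch of running such a query against the Prometheus HTTP API with the official Go client; the server address, metric name, and label are assumptions for illustration.

```go
// Sketch: running a PromQL query through the Prometheus HTTP API.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/prometheus/client_golang/api"
	v1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	client, err := api.NewClient(api.Config{Address: "http://localhost:9090"})
	if err != nil {
		log.Fatalf("creating client: %v", err)
	}
	promAPI := v1.NewAPI(client)

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Per-datacenter request rate over the last five minutes.
	query := `sum by (datacenter) (rate(http_requests_total[5m]))`
	result, warnings, err := promAPI.Query(ctx, query, time.Now())
	if err != nil {
		log.Fatalf("querying Prometheus: %v", err)
	}
	if len(warnings) > 0 {
		log.Printf("warnings: %v", warnings)
	}
	fmt.Println(result)
}
```

The same query string works as-is in the Prometheus UI, Grafana, or any other frontend that speaks the Prometheus query API.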
And those labels are also fully integrated into the way PromQL works. So you can indeed select based on some data centers, based on some nodes, and also do aggregations, like: I want the HTTP request rate per data center. This is all possible in PromQL. What is changing in PromQL, now and in the future, is that we adapt to new use cases. So we have new functions used by the scientific community. For example, we have functions like the sgn function, which gives you one, minus one, or zero, depending on the sign of your data. We have the trigonometric functions, as we have discussed. So we are bringing new functions to PromQL that we think are useful for the users. And it's really important to us to continue expanding the language without adding everything that's possible, of course, because we still want to be able to know the language and to support the users. So it's not like we've added hundreds of functions, but we really try to keep up with the usage that people are making of Prometheus and to integrate what is useful for the users. And that's, I think, something that is very prominent in the way that the project has been run: it is very driven by the community, the actual use cases, and what's being put to use. Even things that, as you said, you and I maybe would not have anticipated in advance; when you see the same request again and again, for example the trigonometry, then yes, there is a demand, there are valid use cases that apparently are more frequent and popular than we assumed, and the community serves that. I think that's amazing. So do you want to say a word, by the way, about the support for offsets? Yeah, so we also added support for negative offsets. Offset means that when you select your data, you can offset the selection by an amount of time, and now we have added support for negative offsets. It's not really a big thing, but it's the kind of small quality-of-life improvement that we make over the years. Because when you start using Prometheus and you see something you want to try, and you try it out and it doesn't work, it's quite frustrating. So we also try to bring those small improvements to the community. The next change is the @ modifier, which means that you can have more complex queries. So if you want to, for example, graph over time the top five applications that are consuming the most CPU right now, you can use more complex queries to do those kinds of things using the @ modifier. You can do a selection of metrics based on a calculation that you would do at, for example, the end of your query range. So it's really a lot of different improvements that we are making to the language. They are completely optional, so you don't have to use them; you can still do your perfectly simple rate function if you want. But we are also there for the power users that need more advanced queries, and that don't want to rely on features of, for example, Grafana to get that behavior. So what the @ modifier does was possible in combination with some complex things on the Grafana side, but now we have the same capabilities directly in PromQL, so it's in one place. And whether you use the Prometheus console, Grafana, or another tool, you will still be able to run the same function and it will give you the correct answer. Yeah, sounds good. And we started mentioning the ecosystem around Prometheus, like Grafana and others.
I think one very important recent update in the Prometheus ecosystem is the contribution of PromLens to the Prometheus project. We talked about the power of PromQL, but sometimes it can be complex to build elaborate queries just textually, and having some visual aid and UI can be significantly beneficial. And that was the project, PromLens, that Julius Volz from PromLabs developed for quite some time. And now it's been open sourced and contributed to Prometheus as a native citizen, by Julius Volz, PromLabs, and also Chronosphere. So first of all, kudos to them for this very significant contribution. Maybe you can help us with some information about it. Yes, PromLens is a query builder and a query explainer. So basically, you type your PromQL query and it will run it and explain to you what it is doing. One of the nicer features of PromLens, and a lot of people using PromQL have had that issue, is for when you have a query and it does not return anything. If it is a complex query, it's not always easy to figure out why it's not returning anything. And that's where PromLens can help you, because when you type your query, it will actually not just run your query as a whole, but it will run all the different parts of the query and tell you, okay, at that step of the query I still have 20 results; when I compare with that other thing, oh, now I have zero results as the output. So you can easily find which part of the query is generating output and at which point the output is dropped. So that's the main use case for PromLens: it's really understanding, okay, why don't I have anything coming out of my query? And I think it's kind of a game changer in the way that you can work with PromQL queries that you don't understand, or that are more complex, because you directly see, okay, I have a zero there, I know that's where I have to look: what's on one side, what's on the other side, what doesn't match. So it's really nice to see that coming upstream into Prometheus, and some parts of PromLens will probably be directly available inside Prometheus. Not everything, for example, in PromLens you can share your queries, you can do a lot of things, but the main ways to debug your PromQL queries are probably coming in the following months inside the Prometheus server. Nice, that's important, and it will be available as part of the Prometheus UI. Yes. Excellent. So actually, we haven't talked about the Prometheus UI itself. Everyone knows Prometheus from its infrastructure elements, the scraping, the time series database, but Prometheus does offer a UI. Many use Grafana and others, but I think there are quite a few changes and enhancements that were made, whether ones that make it friendlier, like a dark mode, or something more significant, like auto-completion of the PromQL syntax. So do you want to tell us a bit about things people might not be aware of that are available in the Prometheus UI?
Yes, so the Prometheus UI has seen a lot of changes. We completely changed the UI in the last few years to go to a React-based UI, and what is really important is that now we are working on the performance of the UI. For example, the targets page is now not loading everything in one go, but we are loading every job separately, because some people are using Prometheus with crazy numbers, like thousands of targets, and we want to provide them with a fluid way of using the UI. The new UI also brings direct support for completion of PromQL queries based on the metrics that you have in your Prometheus server, so you can start typing your query and it will give you suggestions or complete your metric names. So you are really helped a lot when you write a query in Prometheus now. We have also seen some third parties using the same library that we put together to do the auto-completion, because Prometheus is supported also by some cloud provider tools now, so it's really nice to see that used more and more and to see that it's becoming easier to write PromQL queries. We also have the dark mode, as you mentioned, which is an interesting story, actually. So yeah, okay, there is a dark mode button, but it was contributed by the community, not by someone from the Prometheus team. And it's the kind of request that, when you see it the first time, you are like, yeah, this will be too much work, it will never be finished, or it will never even start. And then the community works together with the Prometheus team, and at the end it's the kind of feature that happens, even if at the beginning you really wonder whether there will be someone to maintain it, because you think it's easy, but it's actually quite difficult to get everything right. So it's really nice from the community to have support for that kind of feature, because, you know, Prometheus is still a very small project when you compare it, for example, to Kubernetes, because we have a much narrower focus, on monitoring only, and we try not to go in every direction. So it's really a small project when you look at the big open source projects out there, and it's really nice to see that the community is mature enough to produce that kind of output. And we mentioned Grafana a few times before. Grafana is, I think, maybe the most popular visualization front-end tool for Prometheus. It's not part of the CNCF stack; it's provided by Grafana Labs as AGPL-licensed open source, and I know that many use it, but on the other hand, maybe people wonder what is going to be the CNCF-native solution for that. So do you want to help us understand where things stand and where we're heading, like the Perses project and others?
Yeah, so basically, when you look at Grafana, it's a really nice open source project, but it is not part of the CNCF and it is managed by a single company, which decides what they want to do with their trademark and what they want to do with their software. And just like Prometheus is a multi-company project, I think it is nice to see open source alternatives for that visualization layer. And yeah, we have seen the rise of Perses recently. It is still early on, it's still a work in progress, but once again we are speaking about Chronosphere, because they are partly sponsoring the project, and it's basically a visualization front end, at the start for Prometheus, but it will expand to other backends as well. So basically, the idea is that they want to address some of the issues of Grafana, for example dashboards as code, or the GitOps story of Grafana, which is at the moment not completely figured out. They want to start from a clean sheet and say, okay, how do we design a solution that enables native GitOps integration, that works well with Prometheus, and that is under open governance, completely transparent and managed by multiple companies, and maybe at the end donated to the CNCF. Currently it is under the Linux Foundation directly, but who knows what the future will be for Perses. So I think it's really important to have some choice when you are using Prometheus: that you can not only rely on one visualization solution, but that you can work with many of them, and maybe even arrive at some sort of agreed open specification that can enable moving between tools. I think that's one of the things that was mentioned; I was exposed to Perses primarily at the recent PromCon when I saw the talk, and although it was a short one, it got me intrigued. So I think there is a lot of potential there and it is definitely a project worth following up on. I think the key thing is that you should have the choice of the solution that you use, and it's also something that is always in my mind and has always been in the mind of the Prometheus maintainers: I think we have never designed a feature specifically for one visualization solution in particular. We have always tried to be agnostic to that, and really tried to avoid some kind of caching mechanism that would be very specific, that kind of thing. So we really try to keep an open API on Prometheus, in the sense that if you want to integrate with it, you can just do it. And we now see that the API of Prometheus itself is replicated in Thanos, Cortex, Mimir; they use the same API, which means that it is very flexible and it can bring you a lot. And I think this is the power; that's why I said that Prometheus for me goes beyond just a very successful tool and is also a de facto standard in many ways, because of this approach that turned the Prometheus APIs into some sort of specification that others can implement. So you don't need to have Prometheus per se; you can work with other open source projects or even closed source alternatives, as long as they provide the same API support and obviously the query language support. And I think another very successful example that emerged from the Prometheus community is OpenMetrics, the exposition format that is now its own standalone open source project under the CNCF, which standardizes the way to expose metrics irrespective of Prometheus. Obviously Prometheus is a big consumer of it, but it's no longer just with Prometheus in mind. So you need to remember that in the world we live in today, every company is a software company, everyone is running
software, everyone is doing some kind of development, everyone is relying on this stuff. And if we can bring the same patterns, useful patterns, useful ways to use your data to monitor your infrastructure, whether it is with Prometheus or another open source solution, it is a win for everyone. Because even if Prometheus has thousands of users, there are still many, many more that are not using any monitoring at all at the moment, or very basic monitoring from the 90s, and if we can bring those patterns to the users, whatever the project, it's a win for everyone. Yeah, definitely. Another very important element in the Prometheus stack is the Alertmanager, which provides, obviously, the alerting function on top of Prometheus. Can you give us a bit of background on the Alertmanager, its uses, and where it currently stands? So the Alertmanager is a core project of Prometheus, and it is the part that will actually make sure that you know about the alerts. It is the part that will knock on your door and say, oh, you know, you have a server that's down. Well, it cannot actually knock on your door, unless you wire something up with IoT to do that, but it can use Slack, PagerDuty, VictorOps, and a lot of different integrations, and we try to bring more of those integrations natively. The last integrations that we have added were Telegram, Discord, and now we are adding support for Cisco Webex, which is used by an Alertmanager user who wants to get rid of their third-party integration and be natively integrated into the Alertmanager. So it's really nice to see that we are now integrating all those new receivers directly into the Alertmanager. We are also adding support for time-based muting for receivers, so if you have an on-call team that only needs to receive alerts overnight, you can also do that natively in the Alertmanager now. And we are also adding a quality-of-life improvement with negative matchers, where you can apply a negative matcher, so instead of saying, I want those labels to go to that receiver, you say, okay, everything that's not dev goes to the on-call team. So the Alertmanager is getting a new release this week, actually. At the same time, we are also working on memory usage, so if you have a really short repeat interval, for example, the upgrade this week will also bring you a significant memory reduction if you are using the Alertmanager. So we are also working very hard on the Alertmanager itself, to make it more useful and to adapt to what users are actually doing with it. Sounds great. I want to say a word maybe about exporters as well, if you want to share the new Windows exporter and others, but let's first maybe get a bit of background about exporters in general, and then we'll talk about the new additions. Yeah, so one of the strengths of Prometheus is that the exposition format is plain text, so you can use curl or your local browser and see the list of metrics, and any software that can expose an HTTP endpoint can expose metrics. That has not been done for MySQL yet, so MySQL cannot expose metrics natively. So in some cases you need a small piece of software that can, on one side, talk to your software, and on the other side, talk to Prometheus. That's what is called an exporter. And it's also the case for operating systems. For example, the Linux kernel did have an HTTP server back in Linux 2.4, but that's long gone, and it could not expose metrics anyway. So you need those kinds of exporters, which are very knowledgeable about
the business side, and on the other side can convert that to Prometheus metrics that you can rely on. In the exporter world, we have a community organization where people can share their exporters, and then they will be taken over by the community: instead of being under a single user's GitHub namespace, we put them under the prometheus-community organization, if there is a need for maintainers, for example, so it's a common place for the exporters. One of the most popular exporters that we have there is the Windows exporter. It has gone through a long journey, from the WMI exporter to now the Windows exporter, and it can actually do more than Windows, because it can monitor your MS SQL, for example, that is running on Windows. So if you have Windows machines or Windows services, have a look at that exporter. It has been worked on by a large community of users, and we are working towards making it an official exporter, so it will be completely under the Prometheus governance, it will be better aligned with the goals of the Prometheus project, it will be easier to talk to the maintainers, and it will also be easier for the users, because they will directly find it on the download page of Prometheus, next to the exporter for Linux, which is the Node exporter; they will directly have the Windows exporter available for them. Another exporter that we are improving is the MySQL exporter. We are bringing multi-target support to the MySQL exporter, which means that you will not need to run a separate instance of the MySQL exporter for each database; you will be able to run just one for ten different databases if you want. So again, with the maturity of the project, we can do those kinds of quality-of-life improvements for the administrators that are using Prometheus. Sounds good. And again, I think it's the mindset of supporting production use cases: someone who runs a cluster with many databases and still wants to use Prometheus needs an easy way to aggregate, and this is a very useful one. Speaking of supporting production environments, and maybe enterprise-grade use cases, there's been a need for long-term support, and I think you and your team jumped into this gap and took charge of it. Can you say a word about the long-term support?
Yeah, so as we have mentioned, we released 2.40.0 with native histograms, and then there were seven different patch releases in less than six weeks, because the release was less than six weeks ago. When you make that kind of deep change in the code, it touches a lot of different code paths, and even if you have a lot of tests in Prometheus, with new features there is always new risk coming in. And Prometheus is releasing every six weeks, and, well, everyone is a software company, but not everyone is an agile company, so upgrading every six weeks, or every week, is not that easy for everyone. So we have decided to take a certain Prometheus release and say, OK, instead of moving on every six weeks, we will keep that release maintained for a year. It was six months at first; we have extended that to a year. So we have an LTS version of Prometheus that enterprises can use, and we are going to support it completely for one year, so they will get bug fixes and security fixes for a year. You don't get all the shiny new features, but when you are in an enterprise, anyway, it's not always about new features; it's more about the reliability of the release that you take. And it is completely upstream, so we work with the community, under the project's governance; you don't have to pay to use it, of course, it's right there on the Prometheus website, and it's a completely supported release of Prometheus for one year. If you want the new features, you will still have to wait for a few months, and then we publish another LTS release, and you will be able to jump from one LTS to the other quite easily. Even upgrading Prometheus in general should always be worry-free, it should be quite simple to upgrade, but still, having an LTS release is really helpful for those users: they don't need all the shiny features, they need a working Prometheus all the time. So that's the target, and there are quite a few users that are using it already. I'm quite happy with the outcome of the LTS release, and we are working to bring a new LTS in the coming months. So it's really nice to see, with the maturity of the project, that we can also listen to the users who say: what, upgrading every six weeks? You know, in my company it takes two months just to get a VM, so what are you talking about, changing the release? Yeah, it is important to remember that not everyone is the cool kid on the block that moves in an agile way. There are large organizations, and we want these large organizations to feel comfortable using Prometheus, and these capabilities and this support are very important to them, so it's really great to hear that. Even what I noticed as a maintainer is that sometimes we have some quite impressive bugs, and they are only reported one or two months after the release, and you are like: oh, many people are hitting that bug and just not speaking up in the GitHub issues, not telling us about it. So yeah, there is a really long time between when you publish a release and when people actually use it in production. That's where the LTS also helps: we know that this is a quite stable release, we know all the bugs we have fixed, and hopefully, if you cannot upgrade easily, just pick the LTS version, it will really be best for you. Yeah, that sounds like something
many have dreamed about for some time. It's great to have that as part of upstream, and way to go to you and your team for actually taking this on, that's amazing. In the bit of time that is left, I want to talk about another important part of the ecosystem around Prometheus, which you mentioned a few times over the discussion: Thanos, Cortex, Mimir, sort of the ecosystem of long-term storage, let's call it, that provides a scalable backend for Prometheus. I think the original intent, or the original constraint in a way, was that Prometheus by design is a single node; you mentioned simplicity as a core value in the community, so obviously it has its limitations with vertical scaling, and when people started looking for solutions for horizontal scaling and clustering, this is where these solutions came to be. Could you help us understand a bit this space and the players in it? Yeah, so basically, you can scale Prometheus vertically; we have seen people actually scaling a lot, like Prometheus servers with more than 1TB of RAM and many millions of time series. But still, as I said, we are a monitoring solution, and if you introduce some kind of distributed system, there are a lot of new trade-offs you need to deal with, and it's a lot more complex. So we have the remote write protocol that can write to a lot of different backends. Some of them are based on existing databases: you have Promscale, which will put your data into a PostgreSQL storage engine. You have Cortex, Mimir, and Thanos, which follow the Prometheus way of dealing with the data, which means that they will write more or less the same blocks as Prometheus writes, and they will use an external solution to store them, like S3-style object storage, which takes a lot of maintenance away and makes it a lot less costly with regards to the storage that is needed. But there are a lot of trade-offs. And inside the CNCF, there used to be two major players there, Cortex and Thanos. They are still there, but now there is also Mimir, which has been forked out of Cortex by Grafana Labs, under an AGPL license. So the future is quite interesting for the long-term storage solutions. We are seeing a lot of customers running Thanos, some of them running Mimir, and we also have a few that are running Cortex, so it's really interesting to see how all of that will evolve in the future, because every solution is a kind of trade-off and is aimed at a certain public. And in this case, these are actually downstream projects, so we are just seeing what's going on in the ecosystem, and we try to work as best as we can with the other players. So it's interesting to see the evolution that will come out of that. You mentioned the community, and I think it's important to mention for our audience the different places: the Slack, GitHub, events, IRC. Do you want to mention where people can join the conversation around Prometheus?
If you want to join the conversation, you can go to prometheus.io; you have all the information there about the Slack and the IRC channel. We have some events, the conference as well, and if you need help with Prometheus, you can join the mailing list or the forum that we have, so just join the discussions, we are always open to discussion. Also, importantly, if you are interested in developing, we have developer summits, which are completely open and online, so you can also join the conversation there. We really want to expand Prometheus, also the team and the community: more diversity, more people, more voices. That's very important to us, to know exactly how we can improve the project and stay relevant, because we don't want to live in our own bubble; we really want to expand and to welcome everyone. So prometheus.io, the Slack, the discussion forum, those are great sources. I highly recommend PromCon; the one that was in Munich in Germany was excellent. And around KubeCon, we had Prometheus Day at KubeCon North America. We'll need to see the format for the next KubeCon North America, maybe it will unite with some others around observability, but generally speaking, if you are at one of these conferences, it's an excellent opportunity to see people face to face and get the vibe of the community, a very vibrant community with lots of things going on. Also, maybe, Julien, if you want to mention how people can reach out to you after the show. Yes, you can find me on different social media or on GitHub, and you can also check out my company at www.o11y.eu. I think there will be a link somewhere in the description where you can just find us, and if you need some Prometheus support, we can help you with that. So, we ran out of time. I actually decided to skip the breaking news for this episode, because we had so many interesting things to cover with you in the chat that I didn't want to waste any of the time. So apologies to our listeners, but I'm sure it was worth the while. I will also add to the show notes the PromCon EU 2022 talk that Julien gave; I'm also putting it here for the YouTube and Twitch audience, so it will be in the show notes as well, along with other references that will help you get going, both with the updates that we mentioned briefly and with getting in touch with the community and getting involved, because it is an amazing community. Julien, thank you very, very much for joining me on this episode. Thank you for the invitation. I already wish you the best for next year and for the new year, so thank you, Dotan, for welcoming me on the show. Thank you very much. We need to do more about Prometheus and not wait so long before the next update, so I'll definitely keep in touch and make sure that we cover it more adequately. And with that, I'd like to also thank our listeners for joining our episode. As always, all the episodes are available on your favorite podcast apps and also on YouTube, so do check it out. If you're not on the live stream, if you are listening on the podcast, then do know that we stream the episodes live on Twitch and YouTube Live, so find all the details on our Twitter page at openobserve, where the live streams are announced, or follow me at @horovits. And obviously, also share your comments and your suggestions, especially for the new year, for 2023, if you have any suggestions and things that you would like to see, or if you have something that you would like to talk about, that you're a subject matter expert
in and can share with the community, feel free to reach out in whichever way. Thank you very much for listening, and see you in 2023 for the first episode of the new year. Thank you very much, everyone.