Welcome, everybody, to my talk on lessons learned from running Prometheus. Sadly, Lily was supposed to join us today, but she couldn't make it. But we'll go ahead anyway. I'm Frederic, and I've been one of the maintainers of the Prometheus project for almost four years now, and even beyond that, I've been a longtime user of Prometheus.

Before we dive deep into all the operational aspects of Prometheus, there may be some folks who aren't familiar with Prometheus, and the talk is suited for those folks as well — people who are just trying to understand Prometheus as a technology and trying to decide whether to buy into it. Hopefully with this talk I can make everyone who is already using Prometheus more comfortable with it, and convince those who aren't that Prometheus is the right choice.

On a very high level, Prometheus is an open source monitoring system. One of the first things people always talk about with Prometheus is the pull-based model. A lot of monitoring systems are push-based, so your application pushes metrics to some central system. Prometheus turns this around and gathers metrics from your application on a periodic basis. We'll keep coming back to this pull-based mechanism during this talk, and we'll see why a truly pull-based monitoring system is a really great fit for operational purposes. As an interesting side fact, Prometheus was originally created at SoundCloud by folks who had previously worked at Google and knew a lot of the systems inside Google, which is why Prometheus is very heavily inspired by what they had experienced there.

On a super simple level, let's do a very quick example of how Prometheus might work in a real system. We have our Prometheus server, and as I said, it's a pull-based system. We have our target, let's say an HTTP application that serves some API. Prometheus comes along every scrape interval, as we call it, collects metrics from our target, and keeps doing that indefinitely. Every time it scrapes the target, it takes the metrics and inserts them into its internal time series database. And that's really Prometheus in a nutshell. Obviously there's a lot more to it, and we'll talk about that in this talk as well, but to level the playing field and make sure everybody's on the same page, let's dive into a couple more basics.

The way scraping works is, as I said, Prometheus goes and scrapes every 15 seconds, for example, but it does a little bit more. Because Prometheus knows which targets it's scraping, it can automatically add some metadata about those targets. In this case, I have a web server that exposes metrics about the HTTP requests it has served, and these are counters: ever-growing, monotonically increasing counters. Every time Prometheus collects the metrics, it gets a snapshot of them, so over time this counter grows, and Prometheus sees it growing over time as well. Once Prometheus has scraped this — that's the representation presented to Prometheus on the left side here — it adds this metadata. In this case I chose to just add the instance label, but you can attach other, arbitrary metadata represented as labels.
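To make the pull model a bit more concrete, here is a minimal sketch of what such a scrape setup might look like in Prometheus's configuration file; the job name and target addresses are made up for illustration:

```yaml
# Minimal sketch of a scrape configuration (hypothetical target addresses).
# Prometheus pulls /metrics from each listed target every scrape_interval and
# attaches an "instance" label identifying which target each sample came from.
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "web"
    static_configs:
      - targets: ["web-1.example.com:8080", "web-2.example.com:8080"]
```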
So Prometheus adds this metadata, and in this form the data is inserted into its internal time series database. Why this is really powerful is because this is essentially what people mean when they say Prometheus has a multi-dimensional data model. And for this data model, PromQL was created: PromQL is the query language that was specifically designed to work with time series, and ultimately for alerting purposes, because a monitoring system is ultimately there to make sure that our systems are reliable and actually behave in the way we expect.

I have another example on my left here, and you can note that these sets of time series are coming from different targets. So this could be an example where I have two processes that Prometheus scrapes every interval and collects these metrics from, and we can see the instance label changes here. Why this becomes powerful with the multi-dimensional model and with PromQL, we'll see with a couple of examples now.

Let's say this is my PromQL query: just http_requests_total with code "200". What this does is return the time series that match this exact slice of the data. We can now, in a very powerful way, slice and dice our data and get exactly the answer to the questions we're asking. In this case, we get results from all the instances, because we asked for all the time series that match the http_requests_total metric name and the 2xx code. But we can take this a lot further. We can sum these, and because sum is an aggregation, there's actually no resulting label set — we just get a single number, which in this case is 72, because that's the sum of the samples we had seen before. We can do even more powerful things, and I won't dive into everything you can do with PromQL — I'm just trying to show you how the multi-dimensional model works. One more example that I think is cool: you can sum by some specific label and group by it, essentially. In that case the result set is keyed by that label, so we have 2xx, 3xx, 4xx, 5xx. It essentially looks for all the label values that exist and sums their values. This way we can get a global view of our entire infrastructure and how it's doing, because ultimately monitoring is about understanding the way your users experience your system, and that's most powerful when we have a holistic view over the entire infrastructure and not just individual processes. We really care about how our users are holistically experiencing it.

And ending our short Prometheus intro: alerting is also super simple in Prometheus. We've covered the basics of how data is gathered and inserted into the time series database, and how querying works, and now alerting becomes almost trivial, because alerting doesn't work the way a lot of traditional monitoring systems worked, where you wrote a check and if the check failed, you alerted. Prometheus works by evaluating queries against the data in its time series database. And again, there's no magic in Prometheus: these rules are just evaluated at some interval — 15 seconds in our examples, just like the scrape interval.
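Here's a hedged sketch of what such a rule could look like in a rules file, tying together the PromQL aggregations from a moment ago with the alerting mechanism; the metric name, threshold, and labels are illustrative rather than a recommendation:

```yaml
groups:
  - name: example
    rules:
      - alert: HighErrorRate
        # Fraction of requests answered with a 5xx code over the last five
        # minutes, summed across all instances for the holistic view.
        expr: |
          sum(rate(http_requests_total{code=~"5.."}[5m]))
            /
          sum(rate(http_requests_total[5m])) > 0.05
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "More than 5% of requests are currently failing"
```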
So this is a very basic introduction to Prometheus. There's obviously a lot more to it, but with this we can now get into the operational experience and understand why Prometheus is so awesome. I want to show you a very quick example of the prometheus.io website. When you visit it, this is one of the things you see on the front page, and in this talk, from now on, we want to focus on this part: simple operation. We're going to dive deep into why Prometheus is so popular for its simple operational aspects. In some way we could also have called this talk "what makes Prometheus simple to operate" instead of "lessons learned from running Prometheus".

One of those things — and this is literally something the prometheus.io website says in that tile — is a very simple thing. Some people don't even think about it, but I've come to appreciate it a lot, and a lot of people get hooked on Prometheus because it's so easy to get started with: Prometheus is a single statically linked binary. So all you do when you start trying out Prometheus is download that binary for your architecture and your operating system, and Prometheus compiles for pretty much everything that Go compiles to, which is almost everything out there. That makes it absolutely trivial to get started, you get super quick wins, and you start experiencing Prometheus very quickly. I think that's really powerful, and one of the primary reasons people get hooked on Prometheus: it has all the functionality in this one binary and you can get started with it super quickly.

Another thing that is often overlooked, and that I've only started appreciating over time, is that it has very, very strict rules about how it's configured — but that also makes it incredibly consistent. When you want to configure something, because of these consistent rules you always know where to look. Anything you can reconfigure at runtime is in a configuration file, and you can reload these configuration files either by sending a signal to the process or by calling an HTTP endpoint, and it will automatically reload everything. That can be alerting rules, that can be all kinds of configuration. And configuration that lasts for the entire lifetime of the process is configured via flags. Those are really the only two things you need to think about when you configure Prometheus. There are no environment variables, nothing like that; everything is either in a file or in a flag. As a side effect of that, there's a really awesome and concise reference documentation on the Prometheus website. Because it's so consistent, it's very easy for us maintainers to keep it up to date, precise, and concise.
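As a small illustration of that file-or-flag split, here is a hedged sketch of a configuration file with the corresponding command line shown in a comment; the flag spellings reflect recent Prometheus 2.x releases, so double-check them against the reference documentation for your version:

```yaml
# Everything in this file can be reloaded at runtime, either by sending SIGHUP
# to the process or (with --web.enable-lifecycle set) by POSTing to /-/reload.
# Flags, by contrast, are fixed for the lifetime of the process, e.g.:
#
#   prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=15d
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "rules/*.yml"   # alerting and recording rules are reloadable as well
```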
So those were the simple things about the operational aspects of Prometheus, and a lot of other systems could replicate them. But ultimately — and this is where we keep coming back to the pull model — this is why I think Prometheus is so popular: resource usage. To understand why I think that's the case, we need to understand the resource usage of Prometheus. Now, this is literally the Prometheus that I, among others, operate at Red Hat, and as you can see, this is a relatively big instance: we have thousands upon thousands of time series in this Prometheus. And we can see that CPU usage is actually quite low — somewhere between 0.1 and 0.2 CPU cores — but we have something between five and six gigs of memory. One interesting thing to note here is the sawtooth pattern, and we want to understand why that is. This is one of the first things people notice when they run Prometheus: memory usage seems to be continuously growing, so maybe there's a memory leak, right? But this is totally intentional, and I want to explain why that is and why it's actually a good thing.

To understand it, we need to dive a little bit into the time series database that we introduced in Prometheus 2.0, back in 2017 already — almost three years ago now; time flies. The way this time series database works — and again, this is compiled into Prometheus, part of that single statically linked binary — is very much modeled the way a lot of other modern databases are. There's a write-ahead log, conceptually sitting next to the head block. The head block receives all the inserts into the time series database, and we accumulate those for two hours. We always append all these writes to the write-ahead log as well, so that if Prometheus, or the entire node it runs on, crashes, we can recover and pick up just where we left off.

The interesting thing I mentioned is this two-hour block: we accumulate two hours' worth of data and then flush it to disk. This is really important, because once we flush it to disk, it's no longer fully in memory — it's memory-mapped. mmap is a kernel system call, and a lot of you are probably much more familiar with its intricacies than I am, but for those who aren't: essentially, mmap is a way to tell the kernel, "make it look like all of this data on disk is in memory for me, and you, the kernel, take care of transparently loading it from disk into memory and evicting it as necessary." Why this is super powerful is that we can essentially assume all of this data is in memory — even though our disk tends to be much larger than the amount of memory we have available, it looks like it's in memory — and the kernel can make optimal use of whatever memory is available. What this ultimately means is that for queries, if there's a lot of memory available, we tend to reply super quickly because the data is already in memory; and even if there isn't much memory available, it just takes longer, because the kernel keeps loading and evicting pages, but it will still eventually reply. That's really, really powerful.

Most useful of all, we can roughly assume that this entire part of the database — everything we've flushed to disk — has roughly constant memory usage. That makes the write-ahead log and the head block the interesting part for memory usage, because the memory-mapped part stays roughly the same the entire time and we can largely ignore it. So memory usage actually comes from this two-hour pattern, and now we can already see where the sawtooth pattern in the earlier graph came from: it's precisely because of adding these samples and time series to the head block, which is in memory. Once we hit that two-hour mark, we flush it to disk and start over, and that's exactly where the memory drop occurs.
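For reference, here's a hedged sketch of the kind of queries behind a graph like that, written as recording rules so they're evaluated continuously. The rule names are made up (following the usual colon convention), the job="prometheus" selector assumes a self-scrape job with that name, and the metrics themselves are the standard ones Prometheus exposes about its own process and TSDB:

```yaml
groups:
  - name: prometheus-self-monitoring
    rules:
      # Resident memory of the Prometheus process (the sawtooth in the graph).
      - record: job:prometheus_memory_bytes
        expr: process_resident_memory_bytes{job="prometheus"}
      # CPU cores used, averaged over five minutes.
      - record: job:prometheus_cpu_cores:rate5m
        expr: rate(process_cpu_seconds_total{job="prometheus"}[5m])
      # Number of series currently held in the head block.
      - record: job:prometheus_head_series
        expr: prometheus_tsdb_head_series{job="prometheus"}
```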
But we want to understand even better what within this head block actually causes the memory usage. Because we have this multi-dimensional model, there are essentially two things that make up any block, and this is the same for head blocks as well as on-disk blocks — though, again, on-disk we can largely ignore for resource usage, because it's practically constant. Within a block, we have roughly an average of 1000 bytes per series, meaning per unique combination of labels. We can see four theoretical time series here, and into each of these time series we insert samples. Every series lives in our index — because ultimately this is roughly a search engine, an inverted index — and a series is wildly more expensive than a sample. Obviously we add far more samples than we add series, but the whole point is that a series is much, much more expensive than a sample.

One thing I skipped over: I said samples are really, really cheap, 1.3 bytes on average. The way that's achieved is with a very specific compression algorithm designed for metrics and time series. It originated from a Facebook research paper called Gorilla, and it's very popular among other monitoring systems as well. There's a really fantastic talk that I recommend everybody watch if you're interested in the nitty-gritty details of this compression algorithm, which I don't have time for in this talk. The talk is called "16 bytes at scale", because a timestamp is 8 bytes and a sample value is 8 bytes uncompressed — so 16 bytes per sample — and it covers doing this at scale and how we can compress that down to 1.3 bytes on average.

The whole point, though, was to understand how memory usage works, and all of what I've talked about so far is about understanding memory usage conceptually. If you really want something close to a formula for calculating your memory usage, I recommend checking out the blog post by Brian Brazil from Robust Perception, where he essentially builds something you can plug your numbers into and it spits out your worst-case memory usage. I don't want to dive into that formula in particular; I just want to give everybody a conceptual understanding of memory usage in Prometheus. What I wanted to get across is that the index is so much more expensive. But as I said earlier, even though a series is very expensive, we tend to insert thousands of samples into a series. The compression algorithm I mentioned groups samples into so-called chunks, and for the compression to be optimal, we need on average 120 samples per chunk. Ultimately this is a trade-off between grouping samples and keeping them quickly addressable in the index. At 120 samples per chunk, with a two-hour block — because, remember, our blocks span two hours — and a 15-second scrape interval, that makes 480 samples per series, or, grouped into chunks, four chunks. And completed chunks are, well, complete, right? So we can reuse a technique we've already seen in this talk: anything that is complete, we can memory-map to disk.
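To make the "series are expensive, samples are cheap" point concrete, here is the back-of-envelope arithmetic implied by those numbers — rough averages, not exact accounting: even with a full two hours of samples, the per-series index overhead is on the same order as all of that series' sample data combined.

```latex
% Per-series, per-block arithmetic with the rough averages from above:
% 15 s scrape interval, two-hour head block, ~120 samples per chunk,
% ~1.3 bytes per sample, ~1000 bytes of index overhead per series.
\begin{aligned}
\text{samples per series per block} &= \frac{2\,\mathrm{h}}{15\,\mathrm{s}} = \frac{7200}{15} = 480 \\
\text{chunks per series per block}  &= \frac{480}{120} = 4 \\
\text{sample bytes per series}      &\approx 480 \times 1.3\,\mathrm{B} \approx 624\,\mathrm{B} \\
\text{index overhead per series}    &\approx 1000\,\mathrm{B}
\end{aligned}
```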
And we can see here that the samples we previously held individually can now be memory-mapped to disk, and only the active chunks — those that haven't been completed, that haven't reached 120 samples yet — are still in memory. How much this reduces memory depends on the way your Prometheus ingests data and what data it ingests, but it was a huge win when it was introduced. This was actually released just a couple of weeks ago, in Prometheus 2.19, and I want to give a huge shout-out to Ganesh, who did all of this work and wrote a fantastic blog post about the nitty-gritty details of how it works. The win was cutting another 10-40% of memory usage from Prometheus, so all of this is reduced even further.

What I want you to take away from this section of the talk is that, conceptually, the head block matters much more for the resource usage of Prometheus than all the blocks that have already been flushed to disk, and that series take up significantly more resources than samples. Ultimately that means more series means more resource usage, because samples are relatively cheap, whereas series are comparatively expensive. That's really, really important, and that's where the majority of resource usage comes from. That's why, essentially, when we talk about the load on a Prometheus, we talk about the number of series that the head block has to manage.

This entire philosophy results in something people often refer to, when working with Prometheus, as "set and forget". It means that since Prometheus continuously scrapes all this data, and targets tend to be relatively stable in the series they expose, resource usage — while it initially seems relatively high — actually stays pretty much the same over time. It can be daunting when you set up Prometheus for the first time and memory usage feels relatively high, but remember that this is roughly the maximum load that Prometheus will ever see. So I encourage you to stick with it and understand that this is actually what makes Prometheus so powerful. As I already said, we keep coming back to the pull model: it's exactly because of the pull model, and because targets tend to be relatively stable in the number of series they expose, that we get this characteristic, and ultimately that leads to very predictable resource usage.

Now, this next part also plays into the set-and-forget mindset, though it's less about memory usage and more about disk usage. It's a concept many of you are probably familiar with from other monitoring systems: retention, as in how long I keep my data for. By default this is 15 days in Prometheus. In theory there's no limit to it, but there are practical limits, obviously, like how much disk you want to use. Here again we fall into the set-and-forget model, because once you've reached your retention for the first time — once you've started hitting that 15-day period, say — that is effectively the maximum disk size you'll need for your Prometheus. Obviously it can be slightly more nuanced, but typically that's exactly how it is. Again, this makes it incredibly predictable, and again you get that set-and-forget characteristic.
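A rough rule of thumb for that maximum disk size — a back-of-envelope sketch rather than an exact formula — multiplies the retention window by the ingestion rate, with bytes per sample typically in the one-to-two byte range thanks to the compression discussed earlier; your own ingestion rate can be read off Prometheus's prometheus_tsdb_head_samples_appended_total counter.

```latex
% Rough disk sizing once retention has been reached for the first time:
\text{needed disk} \;\approx\; \text{retention time (s)} \;\times\; \text{ingested samples per second} \;\times\; \text{bytes per sample}
```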
You can set it up once, and that's a really important characteristic of a monitoring system: you don't want to have to think about it when you don't need it, but it must be there and functional when things are on fire and you really, really need your monitoring system. The way retention works is simple, because everything is managed in blocks: we can just delete an entire block once it falls outside the retention window. This is really nice from a capacity planning perspective, and — maybe a small detail — it's also a really simple operation for Prometheus to perform: it's just deleting a couple of files on disk, and that's it.

Now, I've mentioned a couple of times that the pull model is so powerful because it makes things stable. But how does Prometheus actually know what to scrape? There's a mechanism called target discovery, or service discovery, in Prometheus. There are a couple of built-in integrations, but you can also just specify a static list. Kubernetes is a really popular integration these days, but other things like Consul, or just plain DNS discovery, can be used as well. You can use all of these to automatically discover your targets as your infrastructure changes. The interesting thing about this is that Prometheus scales exactly with your infrastructure as it scales — so it's very predictable in that way, again.

I want to give you a very quick example of how this looks with Kubernetes. In Kubernetes we have a concept called a pod, which is essentially a unit of processing. There can be multiple containers in a pod, and a container typically runs one process, so we can vaguely refer to a pod as a group of processes. In Kubernetes you typically group your pods with services. This isn't just a logical grouping — it's also used for network routing, essentially — but that part isn't super interesting for Prometheus, because Prometheus wants to go to each individual process and scrape it. Still, the service is really useful to Prometheus, because Prometheus can discover the service, and through the service discover which pods sit behind it, and scrape them that way. When you create a third pod, a fourth pod, Prometheus automatically discovers them and starts scraping them. And this isn't just powerful for dynamically keeping up with the infrastructure: Prometheus can also add additional metadata because of it. We now have the logical grouping of the Kubernetes service as well, and all of this can become labels attached to your targets — like the instance label I showed at the very beginning, but also arbitrary metadata you know about your infrastructure. That makes querying really powerful, and again, Prometheus just scales exactly the way your infrastructure scales.
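Here's a hedged sketch of what that Kubernetes discovery can look like in the scrape configuration. The prometheus.io/scrape annotation is a common convention rather than anything built in, and the label names chosen at the bottom are just one reasonable way of attaching the discovered metadata:

```yaml
scrape_configs:
  - job_name: "kubernetes-endpoints"
    kubernetes_sd_configs:
      - role: endpoints   # discover the pods behind each service
    relabel_configs:
      # Keep only endpoints whose pods opt in via a (conventional) annotation.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Attach infrastructure metadata from discovery as labels on the series.
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        target_label: service
```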
So we keep coming back to the predictability aspect of Prometheus. And let me just say this: Prometheus scales incredibly far without having to do any sort of distributed sharding, but there's always a limit. It may be that you can't get larger instance sizes from your cloud provider, for example, or that you can't just magically increase the machine size of your own hardware — it can be all sorts of circumstances. Prometheus scales to an incredible size, but only vertically, and ultimately there's always an end to vertical scaling. When that happens, you can make use of hashmod sharding. And again, there's nothing magic in Prometheus here; it's quite a simple mechanism. We literally just take a hash of some label — that can be any metadata obtained through service discovery — and take it modulo the number of shards. A theoretical example with our pods discovered in Kubernetes: shard zero of our Prometheus keeps the targets whose hash mod two is zero, so it would get pod zero and pod two, and shard one gets the exact inverse. This works to whatever scale you need, so once you get to that scale, even that is possible with Prometheus.
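Here's a hedged sketch of what that looks like in relabel configuration, for shard zero of two. Each shard would run the same config except for its own number in the regex; hashing on __address__ is just one reasonable choice of label, and __tmp_hash is a throwaway label name:

```yaml
scrape_configs:
  - job_name: "sharded-pods"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Hash the target address and take it modulo the number of shards.
      - source_labels: [__address__]
        modulus: 2
        target_label: __tmp_hash
        action: hashmod
      # Keep only the targets belonging to this shard (shard one would use "1").
      - source_labels: [__tmp_hash]
        regex: "0"
        action: keep
```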
And why I mention this is that Prometheus is so incredibly powerful because it's really simple to get started with at a really tiny scale, but on the other hand it can also scale to an incredibly large scale. People are literally monitoring some of the largest services on the internet with Prometheus. If you're interested, there are a lot of user surveys on the Prometheus blog about various users of Prometheus and how they're using it; I highly recommend checking those out.

This is the end of the section of this talk about the predictability of Prometheus. Hopefully I've convinced you that Prometheus is highly predictable in its resource usage, and that it's still actively being improved in that regard — and hopefully you now understand how to evaluate it and where all of this comes from.

Now for my last topic of this talk. It's less about operational matters and more about making Prometheus fit your exact use case, and that's why I like to refer to Prometheus as a platform. What do I mean by that? Prometheus is a platform to me because it has really strong stability guarantees on its APIs and on the data model, and the consequence is that you can build super opinionated models on top of Prometheus to fit exactly your organization's needs and workflows. One example of that is the integration of Kubernetes and Prometheus. I mention it because it's a really popular combination and some of you may be interested in it in particular, but I just want to show how much is possible with Prometheus; there are lots of other integrations I could have elaborated on here, I've just chosen this one because it's really popular. One of the teams that I work with at Red Hat and I created the Prometheus Operator, which essentially brings a self-service monitoring aspect to Kubernetes, in a Kubernetes-native paradigm, while integrating tightly with Prometheus. That makes for a really nice experience for people who may be super new to Prometheus but very familiar with Kubernetes — or even not familiar with Kubernetes — because it all feels like the same system. This is only possible because of Prometheus's consistent configuration and the strong stability and consistency of its APIs, which let us build around Prometheus and integrate it really tightly with Kubernetes. One example I want to elaborate on a little further: because of that, you can essentially build tenancy models.

In Kubernetes, every pod is somewhat "isolated", quote unquote, in a namespace, so namespaces are often used as a means to isolate workloads. That's not isolation in the container sense — it's more of a logical grouping — but as a consequence, people often use it as a tenancy model: namespace XYZ is my tenant XYZ. And because we know this, we can build an opinionated system around Prometheus. There's a project that my group also created, called prom-label-proxy, which is an upstream project in the Prometheus community. It essentially allows you to enforce some particular label — and you can configure it to be that namespace label from Kubernetes — on any query that someone runs against Prometheus. Just like that, we've created an opinionated model around Prometheus without having to modify Prometheus itself. Because Prometheus is so consistent and stable in its APIs and configuration, we're actually able to do this in a reasonable manner. I think this is why you can refer to Prometheus as a platform and why it's such a powerful one: you can truly make it yours and fit it to your organization's needs. There's a lot more I could talk about regarding the integration between Kubernetes and Prometheus. There's a project we maintain called Kube Prometheus, which essentially packages all of this up — how to monitor Kubernetes with Prometheus — and I recommend you check that out and go from there.

And then, last but not least: leverage the ecosystem. The Prometheus ecosystem has grown into one of the most incredible ecosystems I've ever seen. There are so many things out there, and people are sharing so much experience. Two of the most amazing things that have happened are these. One, a lot of people have built so-called exporters for almost anything you can think of in the open source world. Exporters are the shim from some bespoke metrics format — say, the way Postgres lets you extract metrics via SQL — into the Prometheus exposition format, and that allows those systems to integrate into the Prometheus ecosystem. For almost anything out there, there's a Prometheus exporter, if the application isn't already natively instrumented with Prometheus. The second thing is something I actually created with Tom Wilkie, now VP of product at Grafana: monitoring mixins. These are essentially packages of alerting rules and dashboards that the community maintains together, and they enable us to share experience — share the way we monitor all of these systems — and collaboratively build better and more reliable systems. Best of all, we can do it together as a community and learn from each other. You can check it out at monitoring.mixins.dev. We have mixins for a bunch of components already, but we'd love to see even more, for just about anything out there. So I highly recommend leveraging this ecosystem: there's so much already there that you can make use of and give back to.
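As a tiny illustration of how little it takes to pull one of those exporters into Prometheus, here is a hedged sketch for the Postgres case; the exporter address and port are illustrative placeholders:

```yaml
# The exporter translates Postgres's own statistics into the Prometheus
# exposition format; Prometheus then scrapes it like any other target.
scrape_configs:
  - job_name: "postgres"
    static_configs:
      - targets: ["postgres-exporter.example.com:9187"]
```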
So that's it — that's the conclusion of why I think Prometheus is simple to operate, summing all of this up: it's because of its consistent configuration, and its really efficient, reliable, and predictable resource usage, thanks to how the time series database works and how target discovery works. It provides really strong APIs and guarantees that you can use to make Prometheus truly your own. And last but not least, make use of the ecosystem, share, and make this ecosystem even more awesome than it already is. With that, thank you — and if you have any questions, I'd be more than happy to answer them.

Okay, great. Great job, Frederic. Thank you.

All right. So we had a couple of questions in the Q&A already — feel free to add more there or ask in the chat. One that I haven't answered yet I'll try to answer here live, and since we have a couple of minutes, I'll reiterate the other questions as well. Do we get a downloadable PDF of the presentation? Yes, I believe this will all be attached to the schedule.

So let's go through the questions live — we had a couple of really awesome ones. The first was from Josh, about kernel tuning and controlling the flushing mechanisms within Linux for the storage. I think this is an excellent question, and I could do an entire talk just on this topic, so I recommend looking at everything around fsync if you're not already familiar with it. Essentially, what can happen is that some of the data Prometheus has written may not be entirely on disk yet. I don't know all the details myself either, but the kernel has an expiry time — which defaults to 30 or 60 seconds depending on the Linux distribution — at which it forces these writes to be flushed. In the Prometheus world, we say that roughly two failed scrapes, so at the default 15-second interval, is acceptable. That's why the defaults in most Linux distros are sufficient: at worst we'd lose about two samples, and that's still fine for monitoring purposes.

The next question was whether discovery runs against the main Kubernetes scheduler or on the node. This is basically done at a global level, but you can limit Prometheus to only watch certain namespaces, so you can limit the permissions that Prometheus requires. But it is either global or per namespace.

The next one — also a really, really great question — is whether scraping is just a snapshot of the system in its current state, and how you capture spikes. So yes, a scrape is a snapshot of the current state. This is typically not a problem with counters, because counters are monotonically increasing and you do things like rates, or the increase over some time window, so in those scenarios you will still see the spike. And if you really need to narrow it down to the very second or so, you can just reduce the scrape interval. But if this is, let's say, memory usage or something like that — something that isn't monotonically increasing — then it would need to be tracked within the application, because Prometheus can't reason about values it hasn't seen in between. So that's something that needs to be handled within the application itself.
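To illustrate that counter-based answer, here is a hedged sketch of a recording rule that surfaces short request spikes; the metric name and window are illustrative:

```yaml
groups:
  - name: spikes
    rules:
      # Because the counter keeps growing between scrapes, the increase over a
      # short window still reflects a spike even if it happened mid-interval.
      - record: job:http_requests:rate1m
        expr: sum by (job) (rate(http_requests_total[1m]))
```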
The next question we have is for Kube Prometheus: is the plan to continue using jsonnet? Yes — there's no plan to move away from jsonnet. We have really great experience with it, we work with it every day, we ship it in our product, and we use it for various other projects, so we're really happy with it. That said, we're not opposed to the community maintaining other versions of this; there is, for example, already a Helm chart. We just don't maintain those ourselves, because we don't work with those other mechanisms on a day-to-day basis, so we couldn't maintain them to the level we'd want to maintain a project. But yeah, we're totally happy with jsonnet and we're going to continue with it. Let me just mark that as answered live.

Okay, the next question is whether it's possible to export OpenStack internal network project metrics with the OpenStack exporter. I'm not an expert on the metrics available in the OpenStack exporter, so that's something you'll need to ask the maintainers of that project. Prometheus can certainly ingest those metrics if they're exposed — it's conceivable — I just don't know whether that exporter offers them.

And last but not least: what's the best way to plan for persistent data with Prometheus, say three months of data? This is a really interesting one, and something I didn't actually cover in much detail. Roughly, it behaves similarly to memory and retention. Prometheus goes through a process called compaction, which means these two-hour blocks don't just stay two hours: after a certain amount of time they get compacted into larger blocks, and the biggest block size that exists is 14 days of data. Once it has reached that state, it's pretty stable in terms of the data you'll be gathering, so you can essentially extrapolate from there: take those 14-day blocks times the amount of long-term data you want to store, plus all the other data that you have. So it's a little more complicated to plan for, I agree, but you can still extrapolate in the same way as you do with memory usage.

Okay, that was the last question we had. So if there are no other questions, thank you for checking out this session, and have a great rest of your day. Bye-bye.