Hello, everyone. We hope you are having an amazing day at KubeCon, and that you have some spare energy to learn something new and exciting about the Prometheus project. Before we start, though, I would like you to know that the maintainer team and the community have already created many materials about Prometheus that you can read. You can find those on our website, prometheus.io, in blog posts, on YouTube, and at past KubeCons. And to be honest, it's kind of hard to prepare a deep-dive talk for KubeCon that doesn't repeat some of that knowledge. But fear not, we found a cool area that might be a little bit different for you this year.

So what we want to do is give you an overview of all the newer features and deployment possibilities we have created to accommodate the many, many varied requirements of the cloud native community that adopts this technology, the cloud, Kubernetes, and all the goodies there. With more companies come many more use cases and different integrations that we have to evolve towards, and this is what we want to show.

But first, some short introductions. I have Chris with me.

Hello, I am Chris Marchbanks. I've been a member of the Prometheus team since 2019. I'm currently working at Grafana Labs as an engineer on some machine learning capabilities. In my free time, I try to spend as much time as I can skiing and climbing with my wife in the mountains outside our home in Colorado.

Wow, I miss skiing as well. Hello, my name is Bartek Plotka, and I am a principal software engineer at Red Hat. I have been a Prometheus maintainer for a while, and I also co-founded the Thanos project, which is a project for scaling Prometheus. I'm also the CNCF TAG Observability tech lead, and I'm writing a book with O'Reilly called Efficient Go.

So before we jump into those advanced cases, let's introduce Prometheus for those new here who just want to learn what it is and what it does. And let's make it very brief. Technically speaking, Prometheus is just a single stateful Go binary that we deploy in one or two replicas next to the workloads we want to monitor, the things that produce some data, some metrics, that we want to observe and grab.

When you start a Prometheus server with a typical configuration, it will first reach the service discovery that you configured. For example, it might be DNS, it might be the Kubernetes API, it might be EC2, which scans for the virtual machines you have deployed. Using relabeling mechanics or various other strategies, it obtains the targets that we want to scrape, that we want to essentially monitor. With those targets, we reach out and pull metrics from them, and those targets are typically just applications. We do that at every configured interval, which generally means something between one second and a couple of minutes, and we call this process a scrape. A scrape is essentially a very fast HTTP call to an endpoint that those applications expose, usually using the OpenMetrics or Prometheus text exposition format. This exposes the current values of the gauges and counters that are in the system right now. Prometheus will also watch the service discovery for any changes that might appear dynamically, to make sure we always pull the metrics from the correct place.
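As a minimal sketch of that flow, a configuration that discovers and scrapes Kubernetes pods might look like this (the job name and the opt-in annotation are common conventions used here for illustration, not requirements):

```yaml
# Minimal sketch: discover pods via the Kubernetes API, scrape every 15s.
global:
  scrape_interval: 15s    # how often to pull metrics from each target

scrape_configs:
  - job_name: "my-apps"              # hypothetical job name
    kubernetes_sd_configs:
      - role: pod                    # targets come from pods via the Kubernetes API
    relabel_configs:
      # Only keep pods that opted in with a (conventional) annotation.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
```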
The data that we pulled can then be accessed through different means. We have various Prometheus APIs, REST APIs, including the query API with the flexible PromQL language. You can then connect Grafana for a dashboarding experience. You can also configure your own recording rules, to make some fast aggregations, or alerting rules that will trigger an alert when, for example, Prometheus sees that there are too many errors in your application. Alerts are then sent to the Alertmanager for routing purposes, and from there to your email, phone, or Slack to notify you, or to some further machine processes.

So that was a general overview of Prometheus, the very basics. In this talk, however, we want to look at various, maybe more advanced, use cases. Actually, I want to highlight that those are getting more and more typical, because with the advancement of technology and of CNCF projects we can abstract so much complexity so much faster. So you might hit those cases pretty soon even if you are new to the cloud native ecosystem. We will talk about scraping features, backfilling, signal correlation, multi-cluster setups, agent mode, and multi-tenancy. Let's go for it.

Yeah, so first up, core Prometheus scraping is still adding more features to allow more and more use cases. I want to highlight a couple of those in the service discovery component. First off, in the last six months, six new service discovery implementations have been added to Prometheus. I especially want to thank all of the contributors who implemented them. These new integrations allow more and more users to easily configure and get started with Prometheus. I'm especially excited about the generic HTTP-based service discovery, which gives users the ability to use service discovery for anything they can think of. For example, in the past, I've had to use sidecars and a Kubernetes operator to fairly distribute load among a fleet of Prometheus instances. Today, that would be possible using only the HTTP service discovery.

With all of the new service discoveries and the continual growth of highly dynamic systems, Prometheus has also improved its handling of potentially untrusted or misconfigured targets. First, it is now possible to limit the size and the count of labels. This is especially useful: one time, a service writing arbitrary error messages into a label value, sometimes as large as 100 kilobytes for a single value, quickly brought my Prometheus instance to its knees. Second, you can now also limit the body size of a scrape, which can catch some high-cardinality use cases that may also cause issues in your Prometheus instance. And finally, starting in the most recent version of Prometheus, we allow the configuration of scrape intervals and timeouts via relabeling in your service discovery. This means something like a generic Kubernetes job will be able to customize the scrape timeout and scrape interval on each service or each pod, without having to create more and more complex configurations for one-off cases. Combined, all of these new service discovery integrations, and the additional control you have over scraping your targets, allow more use cases and more safety than ever before.
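A sketch of what these features look like together in a scrape configuration (the URL, annotation name, and limit values are hypothetical; check the configuration docs for your exact version):

```yaml
scrape_configs:
  # Generic HTTP-based service discovery: Prometheus periodically fetches
  # a JSON list of target groups from this (hypothetical) endpoint.
  - job_name: "http-sd-example"
    http_sd_configs:
      - url: "http://sd.example.com/targets"
    # Safety limits against untrusted or misconfigured targets: the scrape
    # fails rather than taking the whole server down.
    label_limit: 64                   # max number of labels per sample
    label_value_length_limit: 2048    # max length of a single label value
    body_size_limit: 10MB             # max uncompressed scrape body size

  # Per-target scrape interval via relabeling, e.g. from a pod annotation
  # (the annotation name here is just a convention, not a standard).
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_interval]
        regex: (.+)
        target_label: __scrape_interval__
```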
Now back to Bartek to learn about the new backfilling capabilities in Prometheus.

Thank you. For this kind of feature, it is very important to understand that for most users of any observability solution, data is precious, because once collected, you usually cannot collect it again for a situation that happened in the past. So sometimes you want to persist that data even across different systems. This is why there are more and more cases where users want to import bigger datasets of series, of metrics, into Prometheus setups. It might be due to migrating to Prometheus from other systems, or moving between Prometheus servers in different locations, or bringing data back from a backup when you lose, for example, your persistent disk. Data backfilling was always possible, well, maybe not always, but for a couple of years already, yet only recently did we, with the community, make it much more approachable. So let me explain how you can import a large number of metric series today.

The first solution I would like to introduce is promtool. Maybe you are familiar with it: promtool is another binary that ships with the Prometheus project, a CLI tool that allows you to perform certain operations on your local machine or on the server. We recently added a command called tsdb create-blocks-from, which allows you to create TSDB blocks from different inputs. A TSDB block is essentially data in the format Prometheus understands and uses to store metrics efficiently. So if we can easily create those fairly complex binary-format blocks from very simple formats like OpenMetrics, then we can generate them into an output directory and literally just copy or move the blocks into your Prometheus local storage. Prometheus will immediately load them, and even start compacting, or re-compacting vertically, for efficiency purposes. This is why we created tsdb create-blocks-from openmetrics, where you provide OpenMetrics-format files covering different time periods. You can literally use the Prometheus or OpenMetrics exposition format to pack multiple series with multiple timestamps, efficiently create a TSDB block from that on your local setup or somewhere else, and then move the blocks.

So that's the general backfilling solution, but there is one more thing we did. It's common to have recording rules evaluating a certain expression over a long time, and then some disruption happens at some point: let's say your cluster, or your machine with Prometheus, was down for three days, and now you have a three-day gap. What you can do now is rule backfilling, which means you can evaluate the same recording rule over the past data, catch up, fill that gap, and import the result directly into Prometheus again. This is amazing, because you can fix those previous gaps as long as you still have the raw data for that period. It is especially important when you have a distributed rule system, maybe with Cortex or Thanos, but it can be useful for plain Prometheus cases as well. For this, you use tsdb create-blocks-from rules.
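As a rough sketch of both workflows (file names and dates are hypothetical; see promtool's help output for the exact flags in your version):

```sh
# 1. Turn an OpenMetrics-format file into TSDB blocks under ./data,
#    then copy or move the resulting blocks into Prometheus' storage directory.
promtool tsdb create-blocks-from openmetrics my_metrics.om ./data

# 2. Re-evaluate recording rules over past data (queried from a Prometheus
#    that still has the raw series) to fill a gap, e.g. a three-day outage.
promtool tsdb create-blocks-from rules \
  --start 2021-05-01T00:00:00Z \
  --end   2021-05-04T00:00:00Z \
  --url   http://localhost:9090 \
  my_rules.yml
```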
Thanks, Bartek. Next up, there's another new feature in Prometheus: exemplars. Exemplars provide a way to associate context or other information with your metrics, and are commonly used for adding tracing information to counters or histograms. So what does an exemplar look like? Here's an example of exemplars being exported using the OpenMetrics format.
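The slide itself isn't reproduced here, so this is a reconstruction of that kind of exposition, with made-up values and trace IDs. Everything after the `#` on each sample line is the exemplar: its labels, the value of the observation it came from, and the timestamp of that observation.

```
# TYPE request_duration_seconds histogram
request_duration_seconds_bucket{le="0.1"} 42 # {trace_id="2f0ae91b"} 0.043 1620759849.355
request_duration_seconds_bucket{le="10.0"} 57 # {trace_id="8c30fa91"} 8.412 1620759853.120
```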
You can see two exemplars, one from a fast request and one from a slow request, each carrying its trace ID, but you can add any data to these that is helpful for you. In addition, you get the value of the observation the exemplar belongs to, as well as the timestamp for when that value was observed. These fields allow the Prometheus UI and Grafana to plot exemplars on time series charts.

For example, here's how exemplars are represented in the Prometheus user interface. I'm graphing the 99th percentile of request latencies for Cortex, and you can click on any of the exemplars present in the graph to find a trace ID that you can look up in your favorite tracing tool. If you filter the series to only look at certain pods or error status codes, you will also only get exemplars relevant to those series, always matching the context you're looking at. Having the ability to quickly jump from an alert or a dashboard to an example trace is very powerful, and it streamlines both incident and debugging workflows.

Client library support for exemplars is still growing. Currently, both Java and Go fully support exemplars, and Python has an open pull request to add basic support that will be merged soon. In order to enable collecting exemplar data in Prometheus, as well as sending it via remote write to other systems that might use it, you can use the exemplar-storage feature flag. Also note that exemplars are only collected when using the OpenMetrics exposition format. OpenMetrics is negotiated by default when Prometheus scrapes a target, but some libraries, such as client_golang, require a feature flag to enable OpenMetrics. I hope you find exemplars as useful as I do. Now back to Bartek to discuss multi-cluster use cases for Prometheus.

Thank you, Chris. Yeah, exemplars are something I would really love to use more and more these days. They help so much to correlate with traces and make the whole workflow much, much easier. But now let's talk about a very common thing that is starting to emerge for many of us. We used to have just one big Kubernetes cluster, but nowadays, with companies and startups building solutions that can spin up Kubernetes clusters in minutes or even seconds, it's very common to have many of them. At Red Hat, we have tens of thousands of them running around the world. So how do you even cope with that, and what are the monitoring challenges and deployment models we can use with Prometheus?

So we have one cluster, let's imagine it's in Europe and it's production. We might have different environments, you have applications, and, as we discussed, there is a Prometheus running there to monitor them. What do you do if you have more of these clusters? Now, something I want to show as an anti-pattern, never do this, is to put Prometheus outside of those clusters, say in another region or another cluster somewhere very remote in a different network, and try to use the scraping mechanism across that boundary, with a low-latency, 15-second or even faster scrape interval between those zones. The reason is that you would need a very reliable network for that monitoring itself to be reliable. You want Prometheus, or really the ingestion part, the pulling part, very close to your applications. This is why you should never put Prometheus in a different failure domain from what it monitors. But what can you do instead?

What was generally always possible, well, not always, but for the last three or four years, is hierarchical federation. Essentially, you federate: one Prometheus scrapes selected series from another, usually only a subset of metrics, maybe even at a different interval. So you can group and aggregate data at a second layer, on some other cluster, and be able to query that data from a single place. If, for example, your Prometheus or your whole cluster dies, you still have the data. That's a great approach, but it has its own trade-offs. First of all, it's a double scrape, and again a scrape across failure domains, so it's not very reliable for all cases. And you usually cannot fetch all the data; you have to fetch a very small percentage of it to make this feasible at all. So there are a lot of limits to this solution.

So what we, or rather the community, built is, for example, the remote read integration in Prometheus. This allows systems like Thanos to just attach to Prometheus: you can have a sidecar that transforms that data into a gRPC API, which is easier to use in Thanos. Then you can place a querying component, an aggregation component, at the global level to be able to query the data from multiple locations, multiple clusters, and I'm talking about hundreds, thousands. Every query will find the relevant places holding the data, fetch it, and perform the PromQL evaluation on top of that. This is thanks to the remote read protocol, which allows fetching series from Prometheus directly. Again, the trade-off here is that you are accessing data that still lives on components in those clusters. So when a cluster is down, or the network between you and the cluster is unreliable, you don't have visibility into those metrics. The advantage is that you don't move any data, well, you don't move it persistently and you don't persist it anywhere else; you only move it when you need it. And to be honest, we love collecting a lot of data, but we don't use most of it, so this might be a great solution for you.

But it's not only that. I think what is getting more and more popular is remote write, which means you switch to the opposite model: instead of pulling the data, you push it somewhere else and actually persist it there, potentially forever, sometimes on cheap storage. Prometheus uses the remote write feature to stream samples, every sample or just the portion of the data you want, from inside the cluster to some other cluster. There are many solutions that can receive this data, including Thanos and Cortex, and recently we even built a system that lets you configure that as a kind of SaaS product, Observatorium. But there are so many other solutions; you can see them listed here. And that might be the thing that works for you.
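A minimal remote write sketch, assuming a hypothetical receiver endpoint and the common convention that recording-rule results contain a colon in the metric name:

```yaml
# Ship a subset of samples off-cluster via remote write. Thanos, Cortex,
# and others expose compatible receive endpoints.
remote_write:
  - url: "https://metrics.example.com/api/v1/receive"   # hypothetical receiver
    write_relabel_configs:
      # Only forward recording-rule results to keep egress small.
      - source_labels: [__name__]
        regex: ".+:.+"
        action: keep
```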
Beyond those big clustered systems like Cortex and Thanos that you can send to, if you don't have a ton of data, or you only want to send a small subset of data, similar to what you would do with traditional federation, native Prometheus, the Prometheus binary itself, can now be enabled to receive remote write requests. This means all of your clusters can send a small amount of data, maybe just the results of recording rules, to a central Prometheus where you can evaluate global rules and global alerts. An advantage of this is that you get all of the nice remote write semantics, such as retrying requests, if there is some sort of network partition between your central Prometheus and your many clusters. This also feeds into edge clients, which we'll talk about next.

So what happens if you just can't store your metrics next to your apps? This happens in lightweight clusters running on things like Raspberry Pis or IoT machines, or in dynamic clusters that don't exist very long and don't have storage for Prometheus, or when you just don't have persistent disks wherever you're trying to collect metrics. That can make a traditional Prometheus architecture very challenging, and many times it's very hard to scrape data from an edge client. The upcoming Prometheus agent seeks to solve these use cases. The Prometheus agent is a mode of Prometheus, configured with just a flag, in which all it does is collect data and forward it via remote write to another Prometheus instance or any of the remote write receivers we just saw on the previous slide. This new mode is based on the Grafana Agent and was contributed by Grafana Labs. Running in agent mode will minimize the disk and memory consumed by Prometheus while still keeping a write-ahead log for reliable sample delivery. Feel free to follow pull request 8785 for updates as this work is completed.

Beyond just the agent mode and the remote write receiving capabilities, we're also working on improvements to the core server to better support resource-constrained environments. For example, some settings were recently added to Prometheus to lower the amount of disk space that Prometheus needs to provision for every block of data. And we welcome any more contributions that will help enable these low-resource use cases.
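As a sketch of both ends of that pipeline (flag names follow the Prometheus feature-flag convention; the agent flag in particular tracks the in-progress pull request, so treat it as illustrative):

```sh
# Central Prometheus: accept remote write requests on /api/v1/write.
prometheus --enable-feature=remote-write-receiver

# Edge/leaf node: scrape-and-forward only, no local querying or rule evaluation.
prometheus --enable-feature=agent --config.file=agent.yml
```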
Thanks, that's pretty exciting. So, last but not least, let's talk about a use case that pops up here and there again, which is multi-tenancy. Essentially: what if we could reduce the cost of running Prometheus servers by running one of them for multiple teams that don't necessarily know each other, or don't want to see each other's data? How do we enable that? This is something we are trying to evolve as well, and I want to explain how you can achieve it right now, especially on Kubernetes.

The problem, to be specific, is that in the cluster you might have instances, Prometheus, Alertmanager, Grafana, that you want to keep as single elements, maybe with a couple of replicas, and you want to make them reusable for multiple teams while isolating each team's experience. But there are lots of shared resources here. In Grafana, you specify a set of dashboards, a certain configuration. You don't want multiple teams to access the same set of dashboards, because they can impact each other; maybe someone creates a dashboard with the same name and suddenly everything breaks for another team. The same goes for the Alertmanager, and this is actually very critical: there is normally a single place for configuring routes and receivers, which tells you which alerts should be routed to which notifier, maybe PagerDuty or Slack. If we allow this to be configured by multiple people who don't know each other's goals, well, we might have a clash. And similarly for Prometheus itself, where we specify the set of scrape targets, so what should be monitored, and the alerting rules: if all of this is configured from a single place, we might have serious incidents, with people impacting each other purely from the configuration standpoint.

This is what the Prometheus Operator aims to solve in the Kubernetes ecosystem, together with the Grafana Operator, which emerged recently. These operators define certain APIs allowing each team to have their own set of rules, recording rules and alerting rules, their own ServiceMonitors, which specify what targets should be scraped by Prometheus, their own Alertmanager configuration, so routes and receivers, and their own dashboards; you only see your dashboards, you don't need to see other teams' resources. By providing these as Kubernetes APIs, we get all the RBAC access controls by default, so we can make sure a team can only access its own namespaces, that is, only the dashboards and configurations that relate to them. The Prometheus Operator and Grafana Operator then watch the Kubernetes APIs and essentially merge those things together: they merge the Alertmanager configuration into one, merge the ServiceMonitors into one Prometheus scrape configuration, and inject that into the single shared resources, Prometheus, Alertmanager, and Grafana. This is what we use in production right now at Red Hat, and we really recommend doing this if you have multiple teams that shouldn't impact each other.

There is also one more important point. Configuration is one thing, but the second thing is how you access the data while providing some data isolation. You can do that in Grafana thanks to its access control mechanisms, but Prometheus doesn't have that built in. That doesn't mean you cannot do it, though, because there are already community-driven and very well supported proxies, for example prom-label-proxy, which allows you to inject a certain label into every PromQL query, making sure your team only sees its own data, nothing else. We use that heavily with Cortex and Thanos, systems derived from Prometheus, as well. So it's a very, very flexible solution.
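Going back to the operator model for a moment, here is roughly what one team's ServiceMonitor could look like (all names hypothetical):

```yaml
# A team defines a ServiceMonitor in its own namespace; the Prometheus
# Operator merges it into the shared Prometheus scrape configuration.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: team-a-app
  namespace: team-a             # RBAC limits team A to this namespace
spec:
  selector:
    matchLabels:
      app: team-a-app           # which Services to scrape
  endpoints:
    - port: metrics             # named port exposing /metrics
      interval: 30s
```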
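And on the query-isolation side, a minimal prom-label-proxy invocation might look like this (a sketch; check the project README for the exact flags):

```sh
# Every PromQL query passing through the proxy is forced to carry a
# namespace label, so teams only see their own series.
prom-label-proxy \
  -label namespace \
  -upstream http://localhost:9090 \
  -insecure-listen-address 127.0.0.1:8080
```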
All those scraping improvements that Chris mentioned at the very beginning, making sure resource constraints are respected, the scrape limits, all of those isolation mechanics, help here as well, because they limit the possibility of one team exposing so many metrics, or such broken metric endpoints, that it kills the whole Prometheus server; the offender is stopped and rate-limited first. So those approaches are exactly for these multi-tenancy cases. You have some links here for those operators if you want to check them out, and you can absolutely help build and improve them as well.

But there are certain things we are always evolving towards, and we have two that we are super excited to announce. The first is the idea of ingestion scalability automation. What this means is that thanks to the agent mode we are starting to implement and provide, ingestion, so the scrape mechanism and the service discovery, is in a sense stateless: data is pulled and immediately forwarded to some remote server. This allows us to scale ingestion dynamically, using horizontal pod autoscalers in Kubernetes or any other solution, because suddenly we can dynamically assign scrape targets to multiple agents. If your Kubernetes cluster scales from one node to a thousand, we can dynamically scale those ingestion pipelines too. It's not like a single stateful Prometheus server, where suddenly needing more of them is much harder to achieve. I'm super excited to see these mechanisms, and I hope we can also work with OpenTelemetry to improve that part as well.

We also have high-resolution histograms, with work going on right now to support them. The new histogram format will provide both higher resolution and higher accuracy for histogram quantile calculations, as well as significant cost savings in terms of how expensive histograms are to store and how much data ends up in your Prometheus TSDB. Just a few days ago, at the day-zero PromCon event, my colleagues Ganesh and Dieter gave a talk about the work they've done in this area; I highly recommend checking it out.

And finally, thank you for listening to our talk. We've provided some links to reach out to the overall Prometheus community, as well as to each of us individually. And now we have a few minutes to answer any questions that you have. Thank you. Thank you.