How Prometheus indexes its data, and why you should care. Have you ever wondered why a simple PromQL query like this one takes longer to execute as the number of monitored instances grows, even when the instance label is not in the query? Or why performance degrades every time you query data over a longer time range? This talk will give you an intuition for query performance by explaining how Prometheus indexes its data.

A brief note about me. Hello, everyone. I'm Harkishan Singh. I live in Bhubaneswar, on the east coast of India. I work as a software engineer at Timescale and contribute to Prometheus upstream regularly, while developing and maintaining Promscale, an open-source observability backend that can store Prometheus metrics and OpenTelemetry traces. When I'm not working, I read books and play computer games. That's about me.

This talk covers three topics. We'll start with the Prometheus data model, then learn about the indexing strategies Prometheus uses to index its data, and finally walk through the query execution flow, using the indexing strategies from part two, and look at the effect of high cardinality on it.

Let's begin with the Prometheus data model. Prometheus uses a simple data model made of labels, a value, and a timestamp. In this example, we have the metric go_goroutines with two labels, instance="cadvisor:8080" and job="cadvisor", and the value 125 followed by a timestamp. Now, what is a series? A series is basically a set of labels. If you change one label, or even just its value, you get a completely new series in the Prometheus data model. For example, the instance label here has a different value in the series on the right, which makes it a completely new series for Prometheus. What the Prometheus index cares about is labels and time. These two are collected when Prometheus monitors a target.
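To make the "a series is just a label set" idea concrete, here is a minimal Python sketch, assuming we model a label set as a frozenset of (name, value) pairs. The helper label_set, the timestamps, and the second instance "cadvisor:8081" are hypothetical illustrations, not Prometheus code. It shows that changing one label value produces a new series identity, while a new value or timestamp does not.

```python
from typing import Dict, FrozenSet, NamedTuple, Tuple

# A label set; Prometheus stores the metric name in the reserved "__name__" label.
LabelSet = FrozenSet[Tuple[str, str]]


def label_set(labels: Dict[str, str]) -> LabelSet:
    """Hypothetical helper: turn a label dict into an immutable, hashable set."""
    return frozenset(labels.items())


class Sample(NamedTuple):
    series: LabelSet   # identifies the series; this is what gets indexed
    timestamp_ms: int  # used by the index indirectly, via block time ranges
    value: float       # stored in chunks only; never indexed


TS = 1_650_000_000_000  # hypothetical scrape timestamp (ms)

a = Sample(label_set({"__name__": "go_goroutines", "instance": "cadvisor:8080", "job": "cadvisor"}), TS, 125.0)
# Hypothetical second target: only the instance value differs.
b = Sample(label_set({"__name__": "go_goroutines", "instance": "cadvisor:8081", "job": "cadvisor"}), TS, 125.0)
# Same labels, a later scrape with a different value: still the same series.
c = Sample(a.series, TS + 15_000, 130.0)

print(a.series == b.series)  # False: one changed label value means a new series
print(a.series == c.series)  # True: value and timestamp do not affect identity
print(len({a.series, b.series, c.series}))  # 2 distinct series
```

Using a frozenset makes label order irrelevant, which matches the idea that a series is identified by its set of labels rather than by how they are written.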
The value part of the data model, which we saw previously, is not something the Prometheus index cares about.

Now let's look, at a high level, at the indexing strategies Prometheus uses. Let's start with what a Prometheus block is. A block is the basic unit of data storage in Prometheus, and it stores two hours' worth of data. At a high level, a block contains two things: sample data and an index. Sample data is the data Prometheus scrapes from its targets, stored inside the chunks directory. The index is what Prometheus uses to look up its data during query evaluation; this is what the next slides cover. Next is meta.json, which holds metadata about the block. Finally, we have tombstones, which record deleted series along with the time ranges of their deletion.

Let's learn what chunks are. A chunk is a collection of samples in a time range for a single series. This organization makes it easy to retrieve the samples of a particular series, provided the index can figure out which chunks belong to that series; we will cover this in the next slide.

Now let's see what the index file is. The Prometheus index file contains two types of index: the postings index and the series index. The postings index maps each label to the set of series that contain that label. For example, __name__="up" is contained in two series. Similarly, for instance="9091" there is one series in our example that contains that label. The series index maps each series to the set of chunks that belong to it. For example, the first series has two chunks and the second series has three chunks. None of these chunks are shared between the two series; each chunk belongs to exactly one series.
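The two lookups described above can be sketched as a small in-memory model, assuming plain Python dicts stand in for the index. This is a simplified illustration, not the real on-disk binary format (which has further sections, such as a symbol table). The series IDs, minute-based timestamps, and the labels job="prom", job="node", and instance="9090" are hypothetical; the two-series, two-chunk and three-chunk shape follows the example in the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple

Label = Tuple[str, str]


@dataclass
class Chunk:
    """Samples of ONE series within [min_t, max_t]."""
    min_t: int
    max_t: int
    samples: List[Tuple[int, float]]  # (timestamp, value)


@dataclass
class Block:
    """Simplified in-memory model of a block's index (not the on-disk format)."""
    min_t: int
    max_t: int
    # Postings index: label pair -> sorted IDs of the series carrying that label.
    postings: Dict[Label, List[int]] = field(default_factory=dict)
    # Series index: series ID -> (its label set, the chunks owned by that series only).
    series: Dict[int, Tuple[FrozenSet[Label], List[Chunk]]] = field(default_factory=dict)

    def add_series(self, ref: int, labels: Dict[str, str], chunks: List[Chunk]) -> None:
        lset = frozenset(labels.items())
        self.series[ref] = (lset, chunks)
        for pair in lset:
            refs = self.postings.setdefault(pair, [])
            refs.append(ref)
            refs.sort()  # keep postings sorted so they can be intersected cheaply


# A two-hour block; timestamps are minutes for readability (an assumption).
block = Block(min_t=0, max_t=119)
block.add_series(1, {"__name__": "up", "job": "prom", "instance": "9090"},
                 [Chunk(0, 59, [(0, 1.0), (59, 1.0)]), Chunk(60, 119, [(60, 1.0), (119, 1.0)])])
block.add_series(2, {"__name__": "up", "job": "node", "instance": "9091"},
                 [Chunk(0, 39, [(0, 1.0)]), Chunk(40, 79, [(40, 1.0)]), Chunk(80, 119, [(80, 0.0)])])

print(block.postings[("__name__", "up")])         # [1, 2]
print(block.postings[("instance", "9091")])       # [2]
print([len(block.series[r][1]) for r in (1, 2)])  # [2, 3]
```

The postings index answers "which series have this label?", and the series index answers "which chunks hold this series' samples?"; neither one stores sample values.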
Let's see how blocks are laid out inside the Prometheus TSDB. Each block in the Prometheus TSDB is a collection of an index and sample data, as we discussed previously. Now imagine Prometheus receives the query up{job="prom"} with a time range that starts at 2 hours 30 minutes and ends at 5 hours. Block two and block three overlap the query's time range, so we consider only block two and block three when evaluating the query.

Let's look at the query execution process at a very high level. But first, what is a query? A query is formed of two things: labels and a time range. Here, the labels are __name__="up", job="prom", and instance="9090".

Let's go through a basic query execution strategy. As we learned, a query has two parts, labels and time. Step one uses the time component: we take the query's time range, figure out which blocks overlap it, and consider only those blocks for query evaluation. In step two, we use the query's labels to find which series contain them. This is done using the postings index, which, if you recall, maps each label to the set of series that contain it. After step two, we have a list of series that satisfy the query's labels. In step three, we loop through each series in that list and use the series index, which, if you recall, maps each series to the set of chunks that belong only to it, to find that series' chunks. In step four, we retrieve the samples from those chunks. To understand the querying process, it is important to understand that these steps happen for each block, then for each series, and finally for each chunk.
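The four steps above can be sketched as nested loops over blocks, series, and chunks. This is a minimal Python sketch under several assumptions: only equality label matchers are supported, time ranges are closed intervals in minutes, and series IDs are shared across blocks so results can be merged by ID. The make_block helper and its data are hypothetical. The example reproduces the talk's scenario of three two-hour blocks and a query from 2h30m to 5h, which only touches blocks two and three.

```python
from typing import Dict, List, Optional, Set, Tuple

Label = Tuple[str, str]
Chunk = Tuple[int, int, List[Tuple[int, float]]]  # (min_t, max_t, samples)


class Block:
    """Simplified block: a time range plus postings and series indexes."""

    def __init__(self, min_t: int, max_t: int) -> None:
        self.min_t, self.max_t = min_t, max_t
        self.postings: Dict[Label, List[int]] = {}
        self.series: Dict[int, List[Chunk]] = {}

    def add(self, ref: int, labels: Dict[str, str], chunks: List[Chunk]) -> None:
        self.series[ref] = chunks
        for pair in labels.items():
            self.postings.setdefault(pair, []).append(ref)


def overlaps(a_min: int, a_max: int, b_min: int, b_max: int) -> bool:
    """True if closed intervals [a_min, a_max] and [b_min, b_max] intersect."""
    return a_min <= b_max and b_min <= a_max


def query(blocks: List[Block], matchers: Dict[str, str],
          start: int, end: int) -> Dict[int, List[Tuple[int, float]]]:
    result: Dict[int, List[Tuple[int, float]]] = {}
    # Step 1: only blocks whose time range overlaps [start, end].
    for block in (b for b in blocks if overlaps(b.min_t, b.max_t, start, end)):
        # Step 2: postings index -> series that carry every queried label.
        refs: Optional[Set[int]] = None
        for pair in matchers.items():
            found = set(block.postings.get(pair, []))
            refs = found if refs is None else refs & found
        # Step 3: series index -> chunks of each matching series.
        for ref in sorted(refs or ()):
            for min_t, max_t, samples in block.series[ref]:
                # Step 4: read only chunks (and samples) inside the query range.
                if overlaps(min_t, max_t, start, end):
                    result.setdefault(ref, []).extend(
                        (t, v) for t, v in samples if start <= t <= end)
    return result


def make_block(min_t: int, max_t: int) -> Block:
    """Hypothetical data: two `up` series, one chunk each, a sample every 30 min."""
    block = Block(min_t, max_t)
    for ref, job in ((1, "prom"), (2, "node")):
        samples = [(t, 1.0) for t in range(min_t, max_t + 1, 30)]
        block.add(ref, {"__name__": "up", "job": job}, [(min_t, max_t, samples)])
    return block


# Three two-hour blocks; timestamps are minutes since the start of block one.
blocks = [make_block(0, 119), make_block(120, 239), make_block(240, 359)]
# up{job="prom"} from 2h30m (150) to 5h (300): only blocks two and three are read.
print(query(blocks, {"__name__": "up", "job": "prom"}, 150, 300))
# {1: [(150, 1.0), (180, 1.0), (210, 1.0), (240, 1.0), (270, 1.0), (300, 1.0)]}
```

The nesting is the point: every extra block repeats steps two to four, and every extra matching series repeats steps three and four, which is why the talk describes the cost as multiplicative.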
This means any delay in step one has a multiplicative effect on the following steps, and the same holds for step two, and so on. For example, if many blocks must be considered for a query, the following steps slow down, because we have to go through the index file of each of those blocks to read its postings index and its series index.

Let's look at some query examples to understand query execution better. Imagine that Prometheus receives the query up with some time range. The first step is to find the blocks that overlap the query's time range; say blocks one and two are those blocks. Next, we use the postings index to find which series satisfy the query's labels. Here the query is up, which is shorthand for __name__="up". So we look in the postings table for the series that contain this label; in our example there are two. In the next step, we loop through each series from step two and use the series index to find the set of chunks that belong to it. For example, chunk one and chunk two belong to series one, and the same happens for the second series. Finally, we retrieve samples only from those chunks that overlap the query's time range.

Now let's focus on the postings index to understand a more advanced query. Let's add the label job="prom" to our previous query, up. The query now has two labels: __name__="up" and job="prom". When this query is given to the index reader, it goes into the index file, looks at the postings table, and tries to match the query's labels against the labels in the postings table. Here we have two labels.
The first match is the first entry, __name__="up", which has three series, and the second match is the second entry, job="prom", which has two series. The PromQL expression asks for series that contain both __name__="up" and job="prom", so there is an AND condition. Because of that AND condition, we intersect the series lists obtained from the postings table. Intersecting the series of the first two entries gives series two and series three, so the result of the PromQL query is the samples of series two and series three.

Let's look at another example, with an additional label, instance="9090". This query has three labels: __name__="up", job="prom", and instance="9090". The index reader goes into the postings table and looks for a match for each of these labels, finding matches in the first three entries. The query wants series that contain __name__="up" and job="prom" and instance="9090", so we intersect the three entries and get series three. The result of the PromQL expression is the samples of series three that lie in the expression's time range.

Now let's learn how high cardinality affects this. But first, what is high cardinality? If you recall the slides at the start, every new value of a label creates a whole new series in the Prometheus data model. When a label name has many values, the system ends up with many series; that is what high label cardinality means.
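The AND intersections above, and the way cardinality inflates postings lists, can be sketched in Python. This assumes postings lists are sorted lists of series IDs, so two lists can be intersected with a linear merge; it uses plain lists rather than Prometheus's own code. The contents of the instance="9090" list and the helper up_postings_length, with its job-N and host-N:9090 label values, are hypothetical. The second part relates to the opening question: the __name__="up" postings list that a plain `up` query must walk grows with every instance, even though the query never mentions the instance label.

```python
from itertools import product
from typing import Dict, List, Tuple


def intersect(a: List[int], b: List[int]) -> List[int]:
    """Merge-style intersection of two sorted postings lists in O(len(a) + len(b))."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def intersect_all(lists: List[List[int]]) -> List[int]:
    """AND together several postings lists, one pairwise intersection at a time."""
    result = lists[0]
    for other in lists[1:]:
        result = intersect(result, other)
    return result


postings: Dict[Tuple[str, str], List[int]] = {
    ("__name__", "up"): [1, 2, 3],
    ("job", "prom"): [2, 3],
    ("instance", "9090"): [1, 3],  # hypothetical contents
}
two = intersect_all([postings[("__name__", "up")], postings[("job", "prom")]])
three = intersect_all([postings[p] for p in
                       (("__name__", "up"), ("job", "prom"), ("instance", "9090"))])
print(two, three)  # [2, 3] [3]


def up_postings_length(jobs: int, instances_per_job: int) -> int:
    """Length of the __name__="up" postings list when every (job, instance) pair is a series."""
    index: Dict[Tuple[str, str], List[int]] = {}
    for ref, (job, inst) in enumerate(product(range(jobs), range(instances_per_job))):
        for pair in (("__name__", "up"), ("job", f"job-{job}"), ("instance", f"host-{inst}:9090")):
            index.setdefault(pair, []).append(ref)
    return len(index[("__name__", "up")])


print(up_postings_length(2, 10), up_postings_length(2, 1000))  # 20 2000
```

Each extra label value multiplies the number of possible series, and every one of those series ends up in the __name__ postings list that step two has to read.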
Under high cardinality, for each block in step two the index reader has to go through a long list of entries in the postings table to find which series satisfy the query's labels. High label cardinality means many series in the system, so step two returns a long list of series that satisfy the given labels. In step three we then have to loop through every series in that long list and, for each series, find the chunks that belong to it. Then, in step four, we retrieve the samples from each chunk. Retrieving samples from a chunk requires a seek operation on disk, and whether or not the block is memory-mapped, this can be expensive.

Similarly, a query with a long time range involves many blocks that overlap the query's time range, which means reading the index file of each of those blocks. This slows down the following steps, since step two depends on step one. A long time range also means looping through many chunks from step three that overlap the query's time range, which can take a long time. This is why a simple PromQL query with few labels slows down when it runs against a high-cardinality system or over a long time range.

And with this, we complete the talk. We are hiring engineers who are excited about building high-performance systems for observability and time-series use cases. We work on Promscale, a backend for observability data built on TimescaleDB, a time-series database built on top of PostgreSQL, which is operationally mature and SQL compatible. We are also an open-source, fully remote company. If you are interested, please apply at timescale.com/careers.
Please post your questions in the event Slack channel. Thank you very much.