Okay. Hi everyone. We still have one more minute, but maybe I can ask a question: who's here for Cortex, just to make sure everybody's in the right building? Okay, good. Who's here to see AI-generated pictures of puppies? Nobody? Oh, right, cool. So I guess we have to do both: Cortex and AI-generated pictures of puppies. Yeah. Okay. We're going to talk about Cortex and how to run rock-solid multi-tenant Prometheus, and how we do that with Cortex. Right. On the agenda: we're going to do a very quick introduction to Cortex for those who are not familiar. Then we're going to talk about what's latest in Cortex; we just released something. Then we're going to go into reliability in Cortex, with users in mind; you're going to see what that is about. Then we're going to look at something specific to Cortex: what is the secret sauce of Cortex? We're going to get to that. Then we're going to talk about what's coming up in the Cortex pipeline, and then we're going to do some questions. Right. So Cortex is a horizontally scalable, highly available, long-term storage for Prometheus. It's a CNCF project, an incubating project. It's not a new project; it started in 2016. It has seen a lot of contributors over the years, lots of maintainers. It even changed the back-end storage at some point. But we're going to talk mostly about the last year. So my name is Friedrich Gonzalez. I'm a software engineer at Adobe. Here with me is... I'm Alan, I'm also a software engineer, at AWS. Yeah. And then we also have some other members of the team that we want to show: Alvin, a software development manager who's also a maintainer, and we also have Ben, from Amazon as well. We also have maintainers on the Helm chart; we want to mention they do great work on the Helm chart. These are the faces. Sorry, Nicholas, I didn't get your picture for the talk.
But yeah, you do good work. And these are the people who have been working a lot on Cortex lately. Some of the key contributors that we have on Cortex are on this slide. It might be that you made a commit to Cortex and you're not on this slide; but if you made more than one commit, you're probably on here for sure. So, sorry. But we want to thank them all, because these are the people behind all the features that we're going to talk about. So, why does Cortex exist, and how did Cortex come to be? Cortex exists because of Prometheus. So let's start with Prometheus. Prometheus, in its simplest form, just scrapes an application. That application might grow in cardinality and churn, or you might increase your data retention on Prometheus. Typically there's a happy puppy that is an owner, and this is the owner of this Prometheus; he's a really happy puppy, as you can see. Whenever there are high-cardinality problems, the owner of this Prometheus decides to scale vertically, adding whatever resources Prometheus needs. But what happens if we have multiple applications? Prometheus is ready to give you more: you can scrape multiple apps with one Prometheus. If you have the same problems we talked about before, you still have a happy puppy, you're still a happy owner, because you can still increase your resources vertically as you need. Simple use case in all those scenarios. But what happens in this next use case? Here we have two groups of applications, and we have the same problems. But the high churn belongs to some applications, not all of them, and they belong to different teams. So in this case the users are not happy with each other. Why? Because there's no tenant isolation in Prometheus. Prometheus was never designed to do tenant isolation, and it never will be; that's part of the design. It's a simple project made for that purpose.
And in this case, there's no way around that. So this is where Cortex fits in. Cortex comes in and allows you to have multi-tenancy in a separate environment. So now you have the two Prometheus servers that we had before, scraping multiple applications, and they send to Cortex as two different tenants. Because they send as different tenants, Cortex provides the multi-tenancy. In the scenarios we talked about before, Cortex is prepared to limit one tenant and not the other. So if one tenant is having problems, you can still have the other tenant working, and this is the key factor. So now you have the two puppies really happy, the two owners really happy with their service, because they can scale separately. They don't have to deal with the problems the other owner has, so they can achieve higher availability in everything. We're going to stop right here and go quickly over the architecture. Alan, please. Hey, just before we start talking about Cortex architecture, just out of curiosity: how many of you use Cortex in one way or another? Cool, ten people, nice. Well, we would love to chat with you: if you go to the CNCF Slack channel, we would love to talk about your experience. So, a little bit about Cortex architecture. Cortex is basically multiple microservices that we deploy. If we're talking in the Kubernetes world, we deploy those microservices as a Deployment or a StatefulSet, and each one of them has a very key role in the Cortex cluster. Today we will not be diving deep into what each of those components is and what it does. But I think the main thing here is that those components are horizontally scalable. So if you increase your number of time series, or your number of samples, or your query load, or the number of rules or dashboards that you have in your organization, we can just do what we always do on Kubernetes: have an HPA look at some metrics, and if you need room for more time series, just add more pods.
Cortex will reallocate the time series to the new pods. Everything is done under the hood, and we can scale microservices down and up just like any other microservices on Kubernetes. The second thing Cortex has is that it can retain your data for a long period of time. Cortex only keeps recent data on the ingesters; after some time that you can configure, the data is shipped to object storage and can remain there for as long as you want. That data stays queryable, so it works just like a normal Prometheus, but it's stored in S3, Azure Blob Storage, or your cloud provider's object store. The third thing is that even though we have multiple microservices and multiple things going on under the hood, Cortex is fully Prometheus-compatible. So if you have your dashboards, Cortex will work just like a single-process Prometheus server. So, what's the latest in Cortex? In the last six months we had two releases. We released 1.14 in December last year, with some key features. We removed the deprecated chunk storage; that's not adding a feature, it's removing one, but it had been deprecated for a long time, and it was very good to have a big cleanup in the Cortex code. We also added vertical sharding. Sharding on the query path is basically a technique that Cortex uses to split a big query into smaller ones. Traditionally Cortex splits by time: if your query spans seven days, Cortex splits it by day, runs those queries in parallel, receives the results back, merges everything and returns. So it improves latency. But in 1.14 we also introduced vertical sharding: for the same query over the same time range, some types of queries can be split into smaller queries that you can also run in parallel on multiple pods, then just merge the results back and return to the client.
We also added OpenTelemetry (OTel) for tracing; before, we were only using OpenTracing, so it's nice to see that. And actually yesterday, we released Cortex 1.15. In this release, we are bringing some of the latest features from Prometheus, like support for out-of-order samples, if you want to enable it. Support for Redis caching: we have several cache layers in Cortex, and now you can use Redis for them. An ARM image, which was long requested by the community, and we finally did it. And support for the new Thanos PromQL engine; I'll talk a little more about that. We also want to welcome Ben as a maintainer for Cortex. He's doing an amazing job; he's also a maintainer for Thanos, and his contributions are already making a lot of difference to the project. So thank you, Ben, for being part of the team. So I just said that in 1.15 we introduced experimental support for the new Thanos PromQL engine. What is this? Thanos and Cortex are creating a new PromQL engine, doing everything from scratch. And why is that? The Prometheus engine is designed to run on a single core, in a single process, and here we have a distributed system. So this is an initiative to create an engine that's fully compatible and can work better in a distributed system. When we shard a query, instead of doing tricks here and there, like splitting by day or splitting by something else, we can have full support for distributed, multi-core queries in the engine itself. This is, as I said, experimental: you can enable it in Cortex behind a feature flag. The engine is new, it's in development, it's constantly changing, but you can try it out as an experimental feature on Cortex right now. We also have a new sub-project on Cortex. It's called promqlsmith, and it's basically based on SQLsmith. This project is trying to help us do fuzz testing on Cortex.
We just said that we are doing vertical sharding and horizontal sharding on queries, and also trying out a new PromQL engine, but we need to make sure we are not breaking the semantics of Prometheus. So this project runs against the new engine and also against Cortex and does fuzz testing: we generate a bunch of data and create a bunch of different queries. We run Cortex with the old engine, the native Prometheus engine, with no optimization whatsoever, and get the results; then we run with vertical and horizontal sharding and with the new engine, and make sure everything is the same, so we are not breaking any experience. Now back to you. Yeah, okay. So this comes back to the core question of Cortex: how can we add all these new features and still keep reliability, right? The question that you see on the slide is: what is the single thing people want from monitoring? And it's already a given: it needs to be reliable. Monitoring needs to be more reliable than production. How can you know your production is down if your monitoring environment isn't up and available? But how can we do that? Well, we have to measure it. We have to measure the reliability. On this slide you see a very complicated PromQL expression. We're not going to go through it, but the bottom line is that you have to divide the number of errors you have in your application by the number of requests you have. When you do that, you get a nice percentage that tells you the reliability. What is important about this PromQL expression is that it's not only the errors you care about; you also care about the latency, so you measure that too. This might not apply only to Cortex; you can apply it to other things. But for Cortex, since the beginning, it has been the case that you have to do this to make it reliable.
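As a hedged illustration of what such an error-ratio expression can look like (the metric and label names below are assumptions based on the metrics Cortex commonly exposes, not a copy of the slide):

```promql
# Availability SLI: fraction of requests that did not fail
# over the last 5 minutes.
1 - (
  sum(rate(cortex_request_duration_seconds_count{status_code=~"5.."}[5m]))
  /
  sum(rate(cortex_request_duration_seconds_count[5m]))
)
```

The latency side of the SLI can be measured the same way, for example with `histogram_quantile` over the corresponding `_bucket` series, so that both error rate and slowness count against the SLO.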
But once you measure that availability, what do you do? And this is me telling you things that are common to other applications, but you do have to do them for Cortex. You have to set up an SLO for your environment, a realistic SLO. Once you refine that SLO, you're going to have to page when you have problems against that SLO. There's too much to say about this, and there's already a talk that explains exactly this; it's on this slide. If you're not familiar with it, go watch that KubeCon presentation. It's very important for keeping your environment reliable. If you set up other alerts and you don't do this, you're not going the right way. But think of the way Cortex is designed: even if all your monitoring SLIs are green, how can you really say everything is good if you don't talk to your users? You have to know, from your users' perspective, what they say about it. It's not enough to have everything green; you have to know what they think, and see if they are able to do what they want to do with the application. What's recommended here has been done in the Cortex scene from the beginning. As you saw, we use PromQL to measure latency. We do dogfooding: you're expected to monitor Cortex using Prometheus, and doing that, you do a lot of dogfooding. You're also expected to feel whatever pains your users have in their environments: if you provide Cortex, you're the first user of Cortex. There's too much to say about this slide, but I have to move on; there are more specific things we need to talk about for Cortex. What is the secret sauce of Cortex, since the beginning? How is it still reliable today? It was made for Kubernetes from the beginning, a long time ago. Imagine being in 2016 and being ready for Kubernetes. Not only that: Cortex is not a separate project from Prometheus. It vendors Prometheus. We import all of Prometheus into Cortex. That's how we achieve it.
I'll mention that: there's binary compatibility. At the same time, we also use Thanos. Cortex uses the latest version of Prometheus and the latest version of Thanos. So when you're using Cortex, you're using Prometheus; there's no way not to. That's the secret sauce. How can we keep up with all the new features always coming to Prometheus? That's how. It also makes it possible to always go fast on features and keep the reliability. On top of Thanos, what does Cortex do? On top of Thanos, Cortex does an amazing thing for reliability: it provides limits. The limits in Cortex are there by design. For each tenant, you can set different limits: maximum active series, for example, or maximum ingestion rate, or maximum ingestion burst. There are many options. Actually, what we recommend is that you create tiers, like what you see on this slide. On one side you see a bigger user; you give them higher limits. On the other side you see a medium user. This is how you refine your promise. Remember when we talked about SLOs? Well, you do refine your promise: you say, I'm going to support your active series up to this level. And beyond that level, what happens is that Cortex starts replying with 4xx responses, which tells you that Cortex is not wrong; you're just sending too much data. Doing that, Cortex remains reliable, because it keeps the promise that was made to the users. And there are many limits in Cortex. On this slide you can see all the limits that have been added over the years and are still supported, and you can still use them to refine what you need for your users. There are limits for ingestion, limits for querying, limits for retention. We don't have time to go over all of them, but I do want to mention that the ones you see in bold were added in the last year, specifically in this release.
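Tiers like the ones on the slide are typically expressed as per-tenant overrides in Cortex's runtime configuration; a sketch along these lines (the field names come from Cortex's limits configuration, but the tenant names and values here are invented for illustration):

```yaml
# Runtime config: per-tenant overrides on top of the default limits.
overrides:
  big-tenant:
    max_global_series_per_user: 5000000
    ingestion_rate: 500000          # samples per second
    ingestion_burst_size: 1000000
  medium-tenant:
    max_global_series_per_user: 1000000
    ingestion_rate: 100000
    ingestion_burst_size: 200000
```

Cortex reloads this file periodically, so a tenant can be moved to a bigger tier without restarting anything.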
Actually, the release we did yesterday. There are other limits in Cortex, and remember, all these limits exist to guarantee reliability; the reliability of Cortex is the most important thing. You can also set per-instance limits on ingesters, which are a component of Cortex: how many active series do you want per ingester? So when your ingesters receive too many active series, because autoscaling didn't trigger, your ingesters still won't run into problems. Of course, things can still happen, as we'll see in the next slides, but this is another way to keep things under control and keep things reliable. And the same thing exists for distributors; for the ingestion path it's very important to keep these limits close at hand. I'm not going to go over those exact details. We're going to move to another specific feature of Cortex that we want to explain to you: the replication factor and the quorum. Thank you, Friedrich. Yes, having per-instance limits and per-tenant limits is very important: it protects the service and protects against tenant abuse. So we do that, but on top of it we also do some things to improve the reliability even further. We'll go through three techniques that Cortex implements, but there are many more. The first one is a simple quorum. Usually you configure Cortex with a replication factor of three. This means that on every write, your data is replicated across three instances. You try to replicate across three instances, but if you have two out of three successes, you return success to the customer. This happens on the read and on the write path. Here you already have some reliability in the case of one instance failing. The second thing is a technique called shuffle sharding. So what is shuffle sharding? Shuffle sharding is a way to allocate part of your fleet to each of your different tenants, and you allocate the fleet in a way that tries to avoid too many instances or nodes overlapping between two tenants.
So tenant A can have nodes A, B, C, and tenant B can have C, D, E; they don't have to be the same. You try to minimize the overlap of instances between tenants. We have a graph here: in this case, for instance, if you have 50 instances and a shard size of four, you have only a 3% chance that two tenants share more than one instance. This is very important because, even with all the limits and all the protections we put on the fleet to avoid abuse, one tenant can still cause part of your fleet to go out of memory or use a lot of CPU. And as we said before, if you have one bad actor causing problems on the four instances of its shard, you only have a 3% chance of impacting other tenants, since you only have a 3% chance of sharing more than one instance; and with the replication factor, if I have one instance down, I'm okay. So that's shuffle sharding. We also have a thing called the ring. The ring is a service discovery slash consistent hashing implementation that Cortex has. So every time we scale your fleet up or down, the new instance joins the ring, and some of the load is shifted to that new instance. The service discovery part is that each instance registers itself; the consistent hashing part means that when a new instance comes in, we don't reshard all the time series, just the time series, or the load, that migrates to the new instance. For the ring we have multiple backends: traditionally Consul, etcd and memberlist. We just added DynamoDB as a serverless backend; let's say that, together with memberlist, you don't have to run anything else in your cluster if you want to use this implementation. Also, Cortex implements zone-aware replication. This can be enabled on Cortex, and it basically tells Cortex that besides replicating my data across three different instances, I need to make sure my data is replicated across AZs (availability zones).
This is very important because your cloud provider can have an AZ failure, and when that happens, it's exactly when you need more metrics and telemetry. Cortex will just keep working; even in the case of a total AZ outage, Cortex should continue to work just fine. We also have other projects coming up soon. We are implementing an auth gateway: Cortex traditionally doesn't do authorization and authentication itself, and we have a Linux Foundation mentee working on that. We also have another mentee project to import blocks into Cortex. This has been asked for by the community a lot: I have my Prometheus running, and then I have scale problems, or I want to move to Cortex; how can I import my historical data into Cortex? This is being implemented by a mentee as well. We have downsampling, and downsampling can help a lot if you have queries over a very long period of time, say a query over the last six months: I downsample the data, so the query ends up reading fewer samples, speeding up queries quite a bit. Federated rules are another feature asked for by the community, allowing rules and alerts that query data across tenants. And native histograms: they were just added in Prometheus. We already vendor a Prometheus version with native histogram support, but we still have work to do on our side, which is being done as we speak. Right. So we typically hang out in the Cortex channel on the CNCF Slack. If you haven't joined it, you should; that's where we are. If you have any questions, hit us up. An answer might take some time, but not too long, probably; we're there pretty much every day. If you have some issues, you can file issues, you can search through issues; that's important. And of course, PRs are welcome. We have mentees, and if you want to join, if you want to help out, you are more than welcome. And this is my actual puppy. Yeah. Okay. That's it, and we're done. Thank you. Yeah.
So we're going to do maybe five minutes of whatever questions you have. Anybody have any questions? No questions? Yeah, there is one. Okay. Hey guys, thank you for the good talk. A question regarding cardinality: is there any way to improve observability of cardinality at the Cortex layer? Or are there plans for that? Yes. Yes, I'd like to answer that, because I'm very interested in it. We have been talking a lot with the community this week, and that's been asked a lot, I might say. How can we do it? I have to say there have been internal approaches that I know people have been working on; many people are working on reducing cardinality. So yes, there's something we can do there, and we are thinking about it. Currently, though, you can still aggregate metrics in Prometheus with recording rules and send only those aggregated series to Cortex, not the raw metrics. Doing that, you reduce cardinality in Cortex; not in Prometheus, but at least in Cortex. Yeah. Go ahead. Are you talking about improving visibility, or improving control, of cardinality? Yeah, we have some ideas, but if you also have some recommendation, some "oh, it would be great if we did that," please create an issue; we will track it and figure it out. It's something everybody's asking for. Yeah, that was a question on high cardinality. Anybody else? I hope that was good. Yeah, okay. Anybody? Any other questions? Come on, more questions? I guess one. Do you want to see more puppy pictures? I do have more. Yeah, we did mention a bit of this, but I'll just summarize it again. Thanos and Cortex are based on Prometheus. Prometheus, Thanos and Cortex are all CNCF projects, so we all belong to the CNCF foundation, open source, so we share a lot; many of the bugs in Cortex are actually in the other projects too. So we don't fork Thanos, we don't use another version of Thanos, we just reuse Thanos.
So Thanos has an implementation, something they do that works in a certain way; Cortex reuses that and exposes it in a different way, but on the back end it's very similar. Do you have something to add to that? Yeah, so we use Thanos, and Thanos uses Cortex. In my view, the main difference right now is that Thanos has receive mode, which is basically very similar to the feature Cortex provides: you can remote write, and you can read, with Thanos receive mode. But I think the main difference is that with Thanos you need an operator: to scale your fleet up and down, you change some configuration, like you add one node, you change your configuration in some way, and then it starts running. Cortex has the hash ring, so scaling an instance up or down on Cortex is basic: you can just use an HPA, the new instance comes up, some time series are shifted to the new instance. So it's two different ways of achieving the same thing, sometimes. Does that answer the question regarding Thanos and Cortex? Yes? Okay, cool. Right, one minute for more questions. And... one more. The question is: there are a lot of Cortexes around, other projects, other products. Well, Cortex here is an open source project. I don't know about the others; it's a very popular name, I guess. But Cortex has existed for six years, so it's a pretty well-established brand, I would say, from the CNCF. Yeah, the question is whether implementing sharding was hard. Well, I was there, but I was not a maintainer at that time, and it took a long time. You were part of that? Come to the mic. I was already using Cortex; I was not participating in the implementation. But for sharding and shuffle sharding: sharding was there since the beginning.
Cortex added shuffle sharding later, to try to improve tenant isolation, and it was something built on top of the sharding that was there before. So it was more like implementing an idea; I don't think it was a huge change. Good question. Yeah, I hope that answers it. Cool. And we're done. Thanks, everyone.