Hey everyone, thanks for joining us. Today I'd like to talk about how we scale Prometheus using Cortex, a CNCF project designed around building a clustered version of Prometheus. This should take about 20 minutes of your time, which leaves lots of time at the end for Q&A. I'm joined today by Ken. Ken was one of the first people, other than myself, to start running Cortex, over at EA three years ago. Ken's now at Microsoft, where he's a principal software engineer, and he was also one of the first Cortex maintainers. Myself, I'm the VP of Product at Grafana Labs. I'm a Prometheus maintainer and one of the original authors of Cortex, and I also started a project called Loki, which is our horizontally scalable, Prometheus-inspired log aggregation system. Today we're going to keep this super simple: Why Cortex? Why should you care? What does it do? How does it help you? When is it appropriate? We're going to have a bit of a demo and then leave plenty of time for Q&A. So this is my mental model for how people get started with Prometheus; it's how I started with Prometheus. I really love the Prometheus system, and this talk is not going to be about bashing Prometheus. One of the things I love about Prometheus is that it's a pull-based monitoring system that uses dynamic service discovery to find your jobs. I really love this because it just works for me, especially if you're running your jobs in a Kubernetes cluster: it just finds them, it scrapes them, and it gives you that monitoring experience.
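As a sketch of how that dynamic service discovery works in practice, a minimal Prometheus scrape configuration for Kubernetes might look like the following (the annotation-based filtering is a common convention rather than the only approach, and the job name is hypothetical):

```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod            # discover every pod in the cluster
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
```

With something like this in place, new pods are discovered and scraped automatically as they come and go, which is the "it just finds them" experience described above.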
It's got this incredibly powerful query language and multidimensional data model, which make it easy to do ad hoc analysis and compare application-level metrics and system-level metrics together in one query. It also allows you to do some really powerful things in alerting, like error-budget and SLO-style alerting. And finally, it's open source, it's incredibly resource efficient, and it's really easy to operate. So in general, Prometheus is more than enough for most people. In this model, if I'm running all my jobs in a single cluster, I can deploy a Prometheus and a Grafana and start collecting metrics and building dashboards within the hour. And this works really well. Talking about dashboards: Grafana is another great piece of open-source software, and it makes it really easy to build dashboards on top of Prometheus; this is one of the dashboards we have internally for one of our demos. The challenges with Prometheus and this model really come when you start deploying your software in multiple disparate clusters, especially if those clusters are disconnected. The pull-based model that Prometheus has really encourages you to deploy the Prometheus server co-located with your jobs, in the same cluster. So what we typically see is that when people start to serve customers in Europe and in Asia, they'll deploy their application in a region there and deploy a Prometheus with it. And at this point you've got your Grafana pointing at one of them. How do I start monitoring my other regions? A lot of people will just deploy multiple Grafanas; that's one of the things you can do.
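As an illustration of the SLO-style alerting mentioned above, a rule along these lines could page when the error budget is burning too fast. The metric names, the 99.9% availability target, and the 14.4x burn-rate threshold here are all illustrative, borrowed from common SRE practice rather than taken from the talk:

```yaml
groups:
  - name: slo-alerts
    rules:
      - alert: HighErrorBudgetBurn
        # Error ratio over the last hour vs. a 99.9% availability target.
        # Burning at 14.4x the allowed rate exhausts a 30-day budget in ~2 days.
        expr: |
          (
            sum(rate(http_requests_total{job="api", code=~"5.."}[1h]))
              /
            sum(rate(http_requests_total{job="api"}[1h]))
          ) > 14.4 * 0.001
        labels:
          severity: page
```

The key point is that because the data model is multidimensional, the error ratio and the budget threshold live in a single expression rather than in external tooling.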
I'm sure most people are familiar: you can use Grafana with multiple data sources. What's even cooler is that you can template out the data source in your dashboard, which allows you to switch between different regions. So this is already pretty mature and pretty usable: you can easily drill down into metrics in a single region and figure out what's going on, and in a recent Grafana release you can even start combining metrics from multiple regions. The challenge comes when I want to look at my global metrics in a single query. Let's say I want to look at my global CPU usage; maybe I'm doing some capacity planning and deciding where I should deploy more resources. That's hard to do if I have to connect to three different Prometheus servers. But all is not lost, and you can actually achieve a solution to this without Cortex, so I'd be remiss if I didn't mention federation. Federation is Prometheus's answer to how to build global observability: you effectively deploy another Prometheus server on top of your existing ones, and that Prometheus server goes out and scrapes the edge Prometheus servers, giving you a central place with all that data. There are a few challenges here, and this is really where we set out to help with Cortex. The first challenge is that your global Prometheus server has to be able to connect to and scrape metrics from your edge Prometheus servers. This means that if you don't have a fully connected set of VPN tunnels, you probably need some way of exposing these Prometheus servers to the internet, some way of securing them and controlling who can access them, opening up firewall ports, things like that. The other challenge is that you can very quickly overwhelm that single global Prometheus server as you scale up the number of metrics in it. Prometheus is a
very vertically scalable piece of software: you can just run it on a bigger box, and recently we saw someone running Prometheus with a terabyte of RAM, which was pretty impressive. But that's not the kind of model we're used to in the cloud native world; we're really looking for something that's horizontally scalable. So the solution we recommend in the Prometheus space is that you don't propagate the raw samples to the central, global federation server; you only propagate pre-aggregated data. You can imagine that at each edge node you might have metrics per pod, but you only propagate metrics per service, per deployment, or per stateful set to the federated server. This is one way of controlling the cardinality in that federated server. It works really well, and there are a lot of best practices around how to manage the recording rules you need and how to make sure that only the right metrics get propagated. But at the end of the day, it's overhead that you need to manage. You'll also find that not having the raw data available in a central location means you have to be careful about how you construct your queries, and it's not necessarily the case that the same query will work locally at an edge as it would globally in the central federation server. All in, this is quite a powerful technique, and it's very simple and reliable, but it can be quite tricky to master. This is really where Cortex comes in. Instead of a global federation server, you can deploy a central Cortex cluster, and because Cortex is horizontally scalable, you can scale that cluster up to take all your raw metrics, giving you basically the union of all the metrics from all the edge Prometheuses in a single place. Cortex itself is also highly available through replication, and we've gone to a lot of effort to accelerate queries and make this system functional for this use case. The horizontal scalability of the Cortex cluster
also allows you to grow and shrink the cluster as you add or remove edge locations. Generally, it's really designed for that kind of global visibility into your metrics. Once you've got something like this, in Grafana you can start doing queries like sum by (cluster) and see which cluster is using the most CPU; this is actually a query over the 15 or so clusters we've got at Grafana Labs. So, a bit more about Cortex, the horizontally scalable Prometheus implementation. Cortex is a time series database; in fact, it uses the same time series database as Prometheus, and we add a lot of distributed-systems glue to make that database horizontally scalable. The other big difference with Cortex is that it's push based. You don't have to open up firewall ports and worry about securing every single one of your edge locations; you just have to worry about securing the Cortex cluster, and Prometheus can natively push its metrics, using its remote write system, directly to Cortex. Alongside the glue we've put in for horizontal scalability (we use something called a distributed hash table), we've also added replication. This allows Cortex to tolerate failures of nodes without ending up with gaps in your graphs, and it also means it's super easy to do a rolling upgrade with zero downtime in the Cortex cluster. Then, once you've got all your data in one location, it's pretty obvious you're going to want to store it, and store it for a long time. Cortex offloads a lot of the long-term storage aspects to object stores: you connect it to an S3 bucket, and it will put blocks of data in there for long-term storage. Prometheus itself is totally usable for long-term storage; as long as you've got a big enough disk and you take regular backups, you can store data for years in Prometheus. But in Cortex we care a lot about durability, we care a lot about replication, and we care a lot about making sure that if a machine fails
there are backups and replicas of that data for you. The final thing that really sets Cortex apart is its support for multi-tenancy. Inside a single Cortex cluster, multiple different users can be isolated from each other, with their own data sets that only they have access to. And we go to a lot of lengths to make sure the data is isolated not only logically but also from a performance perspective: one user can't run big queries and start sucking up all the resources of the cluster. There's a lot of sophisticated quality of service, a lot of sophisticated limit and quota management, all baked into Cortex, so that if you're a central observability team within a large organization, you can have different teams within your company share the same Cortex cluster. And as we said, this is no longer a sandbox project; in fact I have to update the slide, because Cortex is now a CNCF incubation project. It's Apache licensed, it's open source, it's got a vibrant maintainer community, and I would encourage you at this stage to go and get involved and try it out. A bit of history: I started the Cortex project with a chap called Julius, who started the Prometheus project. We started it almost four years ago... no, over four years ago now, wow. We initially used DynamoDB for a lot of the storage requirements. After that I added support for Google Bigtable, and that was about when Ken started using Cortex at EA, although, I understand, still on DynamoDB. Shortly after that we added support for Cassandra, which gave us the ability to start running Cortex on premise and outside of the clouds. One of the things I'm very proud of with Cortex is that I feel we got the write path, the scalability of the write path, pretty good pretty quickly, and this allowed us to move on and start focusing on query performance. So we put a lot of time and effort into parallelizing, sharding, and generally finding
ways to horizontally scale queries. I've given talks at KubeCon before about the techniques we've used, and I'd really encourage you to go and look at those. I feel that one of the things that really sets Cortex apart is the focus we've had for the past two or three years on query performance: on caching, on parallelization, and on sharding. We joined the CNCF Sandbox in 2018. We've got more maintainers now: a lot of maintainers at Grafana Labs (Goutham, Marco, Peter, Jacob), but also Ken at Microsoft, Chris at Splunk, and Bryan at Weaveworks. For about a year or so now we've been focused on ease of use and on community. We've put a lot of effort into making it easy for people to get started with Cortex: a website, getting-started documentation, documenting our configuration file, this kind of thing. Hopefully now, if you follow the instructions on the website, you'll be able to use Cortex in half an hour. We did our first 1.0 release (well, our only, hopefully, 1.0 release) earlier this year, and for me that really marks the point where non-maintainers of Cortex can start to lean on Cortex in anger. We've been seeing a lot of adoption of Cortex over the past year in some quite large companies. And very recently, a month ago, we launched what's called the blocks storage engine in Cortex. This is the same storage engine that Thanos uses, the same code; Marco, who did this in Cortex, is also a Thanos maintainer, and it really marks a lot of collaboration between the Cortex and Thanos projects. We've worked together on accelerating the performance of this storage engine, on things like caching and the scalability of the query path. This really helps Cortex because it reduces the number of dependencies down to just an object store, which makes it much easier to get started with Cortex and also significantly more cost effective to run a very large Cortex cluster. The design doc on the right
is the Project Frankenstein design doc that Julius and I wrote a long time ago now, four years ago. Originally Cortex was called Project Frankenstein, but we've renamed it since then. So thank you for listening to me. At this point I'm going to hand over to Ken, who's going to show you how easy it is to use Cortex and how it can scale over multiple clusters. Over to you, Ken.

Alright, let's jump into a demo. What we're going to show here for the next few minutes is collecting metrics from various Prometheus instances and sending them to a central Cortex installation, which we can then use to do global aggregates and reports via Grafana. So, what I've set up on my local Docker is a three-node Cortex installation and an instance of Grafana, running on a Docker network that's separate from this Prometheus and node exporter instance. They're running on their own to simulate running in a particular region, while your Cortex installation is centrally located somewhere else; it doesn't really matter where, just that the Prometheus can send data up to it. Our Cortex ring is fully up and running; let's just double-check and hit refresh a couple of times so we get recent data. So we've got three nodes in the memberlist ring, and they are all active and have their tokens registered. This is also fully redundant for fault tolerance, so any one of these nodes can go down and we can keep ingesting and querying data. Back to the console: Prometheus is sending data, and we're going to add two more regions to the mix. We're going to create region two and add its Prometheus and node exporter, and there's number three going here as well. While those are starting up: these Prometheus configurations are not complex by any means. This is pretty much the basic sample Prometheus configuration, updated to scrape from the node exporter as well. The main piece that's important for the demo is that they're all pointing to this URL to remote-write data to, so
everything they collect, they'll push up to that URL, which is where Cortex is listening to write the data into its storage. To differentiate each region, we're adding a region tag to each one: this one's from region one, there's one that says region two, and one for region three, and we'll see that on the Grafana dashboard momentarily. If we flip over to Grafana, to the global overview: I created this demo board to showcase some sample metrics coming in from the node exporters. We've been collecting from region one for quite a while, and we just extended out to two and three, so those lines are just starting to show up now. This is one query on one data source, highlighting that you can now see multiple time series coming from the different regions, all in one view and in one query. This also means you can do aggregates on this data, which is what the total graph shows. If we zoom in on that, it's just summing all that data together, and now it's being brought in from all the nodes as well. Beyond this global aggregate, we can take something like the node exporter dashboard, and we see all the nodes here with data. If we click each one of these, you'll see node one has a lot of data, because it was recording before I started the demo, but nodes two and three are just being added now, so they're just starting to send in data. While this was all on my machine, deployed locally using Docker networks to simulate separate regions, it should have been a good illustration of how straightforward it is to take any one of your Prometheus installations, from any of your environments, and point it at a central Cortex installation, so you can store the data, query the data, and aggregate it with other installations from a single dashboard. Thank you.

Again, that was great.
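The per-region Prometheus configuration Ken walks through can be sketched roughly like this. The hostnames and ports are placeholders for the demo setup, and the exact push path can vary by Cortex version (/api/prom/push is the classic one):

```yaml
global:
  external_labels:
    region: region-1          # region-2 / region-3 on the other instances

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']

remote_write:
  # Everything this Prometheus scrapes is pushed to the central Cortex cluster
  - url: http://cortex:9009/api/prom/push
```

With all regions writing to the same cluster, a single query against the Cortex data source, such as sum by (region) (rate(node_cpu_seconds_total[5m])), aggregates CPU usage across every region at once.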
Now we're going to try and take some Q&A. Do bear with us, but hang around and ask any questions. Thank you so much for listening. Bye.