Hello everybody, it's really nice to be here. This is the first time for both Nikolas and me speaking at KubeCon or any co-located event, so thanks for having us. So, without further ado: scaling tactics for Prometheus metrics collection. But first, a quick introduction.

Cool. Hello everyone, pleasure to meet you. My name is Nikolas. I'm an observability tech lead at Coralogix, I'm currently a Prometheus Operator and Perses maintainer, and I'm also mentoring some colleagues on GSoC 2024 for the Prometheus Operator.

My name is Arthur. I'm also a Prometheus Operator maintainer, and besides that I help with Prometheus client_golang. I was a mentee for Prometheus in Google Summer of Code last year, I was also a mentee back in 2020, and this year I'm finally mentoring for both programs, LFX and GSoC, for the Prometheus Operator and Prometheus.

Just to get a feel for the room: who is using Prometheus in production here today? Wow, okay. And who is using the Prometheus Operator to manage Prometheus on Kubernetes? Yeah, thanks folks, nice. That's more than I expected. But even though you already know the Prometheus Operator, let's do a quick TL;DR, since the presentation focuses a lot on it, so it makes sense to explain it.

The Prometheus Operator is used to manage a set of tools on top of Kubernetes: Prometheus, Prometheus in agent mode, Alertmanager, and Thanos Ruler. The scrape configuration for both Prometheus and Prometheus agents is done through more CRDs. The PodMonitors and ServiceMonitors are used as an abstraction on top of Kubernetes service discovery. We also have Probes to configure the blackbox exporter, and we have a brand new CRD called ScrapeConfig. It's still alpha, but the goal is to mimic the whole Prometheus configuration in one CRD: it carries the configuration for the other service discoveries, like Consul, GCP, Azure, all the service discoveries that are not Kubernetes but that we still want to support in the Prometheus Operator. Finally, we also have the PrometheusRule CRD, which is used to configure alerting and recording rules for both Prometheus and Thanos Ruler.

Cool, but today we are here to tell you a story. This is Eva, and we are telling you her story of automating some of the manual work done by her family on their farm. Eva decided to unite her passion for technology with the family business. She noticed that her parents were doing a lot of manual work, like measuring soil humidity and soil pH, running irrigation systems, and so on. And thank you, Midjourney, for letting us generate these cool images.

Well, she started simply, implementing IoT devices to monitor soil humidity, and these devices are of course exposing Prometheus metrics. These simple devices expose around 100k unique time series, which is a pretty low amount of time series. She is using the Prometheus Operator because, of course, why use virtual machines when you can use Kubernetes? That's the only way to go. So she defined a Prometheus CRD with one gigabyte of memory, which is more than enough to handle all these time series. Now she's alerting, she's dashboarding, and she's very happy. She enjoys this setup of IoT devices plus Prometheus.
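To make that starting point concrete, here is a minimal sketch of what Eva's manifests could look like. The names, labels, and port are hypothetical choices of ours, not from the talk:

    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      name: farm
    spec:
      replicas: 1
      resources:
        requests:
          memory: 1Gi          # plenty for ~100k head series here
      serviceMonitorSelector:  # pick up every ServiceMonitor labeled farm: eva
        matchLabels:
          farm: eva
    ---
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: soil-humidity
      labels:
        farm: eva
    spec:
      selector:
        matchLabels:
          app: soil-humidity   # hypothetical label on the devices' Service
      endpoints:
        - port: metrics        # named port exposing Prometheus metrics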
So she wants to do more, and now she's replacing the manual work she was doing to measure pH with new devices for pH measurements. These new pH devices are a little bit more verbose, and the number of time series that this single Prometheus is scraping went from 100k to 1.5 million. With 1.5 million, she notices that one gigabyte is not enough anymore: Prometheus keeps getting OOM-killed. Okay, what do we do now?

So she came to us (note to self: Midjourney doesn't do well with tattoos, but anyway). She asked us how to scale this Prometheus, and initially we suggested that she could use a simple query: take the amount of memory that Prometheus is using and divide it by the number of time series in the head block. The number is not a hundred percent correct, but it gives you a very rough estimation of how many bytes per series you're using. So she takes the number she got at 100k time series, multiplies it by 1.5 million, and she knows that around 10 gigabytes is more than enough to handle this new load.
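That rule of thumb is just a PromQL division. As a sketch, it could even live as a recording rule in the PrometheusRule CRD mentioned earlier; the rule name and the job label are our own hypothetical choices. With illustrative numbers: if Prometheus reported roughly 700MiB of resident memory at 100k head series, that's about 7KiB per series, and 1.5 million series times 7KiB lands right around the 10GiB figure.

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: capacity-estimate
    spec:
      groups:
        - name: capacity
          rules:
            # rough bytes-per-series: memory in use divided by the
            # number of series currently in the head block
            - record: prometheus:memory_bytes_per_head_series
              expr: |
                process_resident_memory_bytes{job="prometheus"}
                  / prometheus_tsdb_head_series{job="prometheus"}

Multiply the recorded value by the series count you expect, and you have a ballpark memory request.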
Cool, so 10 gigabytes. Probably everybody here did the same first thing when we started getting OOM kills: just add more memory. But the success story keeps going, and she learns that her parents bought two new farms, and these new farms bring the total to 10.5 million unique time series. Well, 10 gigabytes will not be enough again. But she's fine, because she knows what to do: she just runs the expression Arthur shared, gets an estimation of how much memory she needs, and so on. She ends up with a Prometheus using 70 gigabytes of memory and 20 cores. That's a pretty medium-to-big size, maybe, but she starts to face a few drawbacks that appear once you have a Prometheus with 70+ gigabytes of memory.

Does anyone know any common problems we might see? Remember, her setup is just one Prometheus instance running with 70 gigabytes. Is anyone here using a Prometheus with 70 gigabytes or more and not happy with it? I guess Grafana, maybe.

Some drawbacks of this scenario: it's a single point of failure, because she's using only one Prometheus replica, so when Prometheus gets OOM-killed she's not able to keep ingesting any metrics while the replacement takes a little while to come back, even if she's taking memory snapshots during shutdown. And depending on your hosting provider, you might spend more money, or have more difficulty allocating bigger machines than smaller ones.

So again she came to us, and we started brainstorming some ideas on how to scale Prometheus. We mentioned that when you get a very big Prometheus, it might be a good idea to keep the instances small and start to scale horizontally instead, and in Prometheus this is what we call sharding. Sharding is the strategy of splitting targets between different Prometheus instances, and it's different from replicas, where you have identical Prometheus instances scraping the same metrics for high availability. Some common sharding strategies: what we like to call functional sharding, hashmod sharding, a mixture of functional and hashmod, and running Prometheus as node agents. So there are several strategies, and what we explained to Eva we are going to explain to you now.

The first sharding strategy is functional sharding. You might see it referenced as vertical sharding as well, but we prefer the functional name. This is the ability to group related targets under one Prometheus instance. In this context, Eva can group all the soil pH devices to be monitored by one Prometheus instance, while the soil humidity ones are monitored by another.

What's cool about the Prometheus Operator is that you can achieve this in two different ways. The first one, which we're seeing right now, is using namespace selectors: Eva can just run each kind of device in a different namespace and then select only the monitoring resources, the PodMonitors and ServiceMonitors, in those namespaces. The other strategy, if you cannot use different namespaces (because why not run everything in the default namespace, right?), is to select across all namespaces and use the labels available on the monitoring resources, the PodMonitors, ScrapeConfigs, and so on, and have each Prometheus do this label selection. That's how we achieve functional sharding with the Prometheus Operator.

The next one is hashmod. Prometheus has a very cool relabeling action called hashmod, which is used for exactly this use case: sharding targets between different Prometheuses. Now, Eva read the Prometheus documentation, she read the ServiceMonitor documentation, and she noticed that she could use some cool relabelings to achieve that. So she has one ServiceMonitor with all the hashmod configuration and one Prometheus matching its labels, so it scrapes the metrics for that specific ServiceMonitor; and she applies another Prometheus and another ServiceMonitor, just changing the labels and the hashmod value. But honestly, this is quite hard. I've worked with Prometheus for quite a few years and I still have a hard time writing relabeling configurations. So the Prometheus Operator has another option: you configure one Prometheus resource, just tell it in the manifest how many shards you want, and all the relabeling configurations are written for you. Everything just works.

Cool. And if we have two different strategies, why not mix both, right? Because that's always a good idea. Jokes apart: if a group of targets is big enough, and you need to contain the amount of resources your Prometheus instances are using, you can mix functional sharding with hashmod sharding. You just need to play with the label selectors, the namespace selectors, and the shards spec. This is commonly used when you know that one group is a lot bigger than the other, and you might want hashmod sharding only for that group.

The other strategy we mentioned is the node agent. Unfortunately, the Prometheus Operator does not support running Prometheus as a DaemonSet yet. I know that Bartek and Max from Google have another Prometheus operator, and they do implement this; they have a talk later on, so we suggest you watch it. With that said, Simon, who is sitting right here, myself, and Kemal from Polar Signals are mentoring a student this year to implement the node agent strategy in the Prometheus Operator, so you can expect support for this by the end of the year.
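To make those two hashmod options concrete, here is a hedged sketch of both, again with hypothetical names and labels. First, the hard way Eva tried: a ServiceMonitor that hashes each target address into two buckets and keeps only bucket 0 (its twin for the second Prometheus would keep bucket 1 instead):

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: soil-ph-shard-0
    spec:
      selector:
        matchLabels:
          app: soil-ph               # hypothetical device Service label
      endpoints:
        - port: metrics
          relabelings:
            - action: hashmod        # hash each target address...
              sourceLabels: [__address__]
              modulus: 2             # ...into one of 2 buckets
              targetLabel: __tmp_hash
            - action: keep           # keep only bucket 0 on this shard
              sourceLabels: [__tmp_hash]
              regex: "0"

And the easy way, letting the operator generate all of that relabeling from a single field:

    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      name: farm
    spec:
      shards: 2                # one Prometheus StatefulSet per shard; the
                               # hashmod relabeling is generated for you
      serviceMonitorSelector:
        matchLabels:
          farm: eva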
Okay. Before we had one Prometheus, and now we have hundreds; we don't even know the number, because it's just too many, I guess. So we have new challenges with this. The two most common problems are having a single place for querying, and how to do rule evaluations when some metrics are in one Prometheus and other metrics are in another, with the data all spread around. So how do we do that?

The Prometheus Operator has nice support here: you can deploy Prometheus with Thanos sidecars. We don't have a CRD for Thanos Querier, but you can deploy your own Thanos Querier, point it to discover all the sidecars with DNS discovery, and you now have Thanos Querier as your single place for queries. Another strategy is deploying yet another Prometheus and using Prometheus federation.

The strategy for alerting is quite similar. We do have a CRD for Thanos Ruler: you can just use Thanos Ruler, point it to the Thanos sidecars, and you have a single place to evaluate all your alerts. If you need metrics deduplication because you're using Prometheus replicas, you can use Thanos Querier: point your Thanos Ruler to the Querier, let the Querier do all the downsampling and deduplication, and that's another way to get a single place for rule evaluations.

Cool, so now Eva has options for sharding her Prometheus data and making it more scalable. She lets us know that she's going to use the shards option, because it's easier: she just needs to increase or decrease the number of shards. Easy to go. But now she has an automated farm: a lot of IoT devices, brand new farms, and now a brand new irrigation system, where from time to time trucks are crossing the farm, irrigating it. And this is causing some seasonality in the time series: in the morning you need to keep your crops fresh, at noon you don't, but in the evening you need it again, so the difference between the morning and the end of the day is very high. To summarize, sometimes we have spikes in the exposed metrics, sometimes the amount of exposed metrics goes very low, and at the spike moments some Prometheuses are getting OOM-killed. She's not happy, because this is not so good, right?

But she's in luck, because the Prometheus Operator now allows you to have autoscalable shards. Arthur raised a PR a few months ago implementing this cool feature in the Prometheus Operator, where you can use some HPA flavor, either the native HPA in Kubernetes or a project like KEDA, and make scaling decisions based on memory usage, for example. If you're using the plain HPA Kubernetes object, as you can see in the image on your left, you set the thresholds for memory usage and the min and max shards, and the shards field on the Prometheus CRD is scaled accordingly. On the other side, we can also use KEDA to make smarter scaling decisions: you can scale by the amount of time series you are ingesting, for example, and you can do this in many different ways.
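A minimal sketch of the KEDA flavor, assuming an operator version recent enough that the Prometheus CRD exposes the scale subresource for shards; the server address, query, and the one-million-series-per-shard threshold are placeholder choices of ours:

    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: prometheus-shards
    spec:
      scaleTargetRef:                  # scale the Prometheus resource itself
        apiVersion: monitoring.coreos.com/v1
        kind: Prometheus
        name: farm
      minReplicaCount: 2
      maxReplicaCount: 10
      triggers:
        - type: prometheus
          metadata:
            serverAddress: http://thanos-querier.monitoring:9090  # placeholder
            query: sum(prometheus_tsdb_head_series)   # total ingested series
            threshold: "1000000"       # aim for roughly 1M head series per shard

KEDA then raises or lowers the shard count as the total series count crosses multiples of the threshold.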
Okay, Eva is now a Prometheus Operator expert. She is ready to go to Mars and start her brand new farms on Mars.

A quick recap. We suggest you start simple and small. Vertical scaling is super easy, so start with vertical scaling. Once you are annoyed with WAL replay taking too long, you can start thinking about horizontal scaling. Remember that we have several ways to do it: functional, hashmod, a mixture of both, node agents. Once you go that way, you will have to think about a global query view and ways to do centralized alerting. And if your metrics have spikes and you have to upscale and downscale, remember we have integrations with HPAs.

Yeah, that's it. Cool, folks. If you would like to see everything that we showed Eva, but in practice, join us on Friday: we are going to run the ContribFest, me, Arthur, Bartek, Max, and Jesus, where we'll show you all of this live. Hope to see you there, and thank you for being here.