Today, I'm going to try being an emcee; this is my first time, so bear with me. Thank you. We have David and Jesús with us from Sysdig, and they're going to talk about scaling your Kubernetes cluster with custom metrics using KEDA and Prometheus.

Hello, everyone. How many of you have nightmares when you have to deploy an HPA in a Kubernetes cluster? Any of you? Come on. Really? We're the only ones? I can't believe it. We've had several nightmares using the HPA, especially in live environments, in live Kubernetes clusters. And how many of you are afraid to deploy an HPA in a live Kubernetes cluster? Because that's scary, too. So, my name is David. My name is Jesús. We work at Sysdig. We help our customers understand what their clusters are doing.

Again, what's an HPA? An HPA is a Horizontal Pod Autoscaler that lets us scale our workloads based on memory and CPU, and it's great in many scenarios. But the HPA is only for workloads, so if you want to scale your cluster, you have to use something else: the Kubernetes Cluster Autoscaler. The H is for horizontal, so if you want to scale your workload vertically, you have to use the Vertical Pod Autoscaler that comes with Kubernetes. So, talking about our application, there are a few tools we can use, and depending on the scenario we'll use one or another. Just to summarize, so everyone is on the same page: we have the HPA, the VPA, and the Cluster Autoscaler. We use the HPA and the VPA for scaling workloads, and the Cluster Autoscaler for scaling the cluster itself, the machines. The HPA is for pods; the VPA is for limits and requests. We'd use the HPA if our application can be easily distributed, for example a microservice. On the other hand, we'd use the VPA if our application needs a single thread with higher resources, for example a monolith.
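For reference, a plain CPU-based HPA of the kind the speakers describe might look like the following (a minimal sketch; the deployment name `my-app` and the 80% target are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app               # hypothetical deployment name
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80   # scale out when average CPU crosses 80%
```

Out of the box, only `Resource` metrics like CPU and memory are available here, which is exactly the limitation the talk goes on to address.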
And we'd use the Cluster Autoscaler in both of the previous scenarios, when we run out of physical resources. So here's the thing: the HPA has some limitations. We all use the HPA every day, but it has limitations. The HPA doesn't allow combining metrics, and it exposes just a reduced number of metrics by default, which are basically CPU and memory. These metrics come from the Kubernetes metrics server, and if you want more metrics there, you have to provide a custom implementation of that server. So sometimes we need more. For example, imagine you want to monitor your application using the golden signals: you might need to watch your traffic, your saturation, your errors, or your latency. Or imagine you want to combine those metrics in formulas, do mathematical operations with them (we were talking earlier about how Prometheus lets us do this, right?), or make aggregations. So what if we could use Prometheus and PromQL to feed our HPA? That would be great.

Let's look at a couple of scenarios to show what we mean. Imagine we have an NGINX server, and we want to scale it up to five pods based on saturation. What's the problem here? NGINX doesn't increase memory or CPU usage when it's saturated, so the out-of-the-box metrics the HPA can use aren't enough. The solution would be to use the fantastic NGINX Prometheus exporter, which has a metric for the waiting connections. We could calculate the average of the waiting connections and set up a threshold: if the average waiting connections for NGINX go above 20, for example, we'd like to create another pod.

Let's see a more complex example. Now let's say we have three Apache Tomcat pods, and we want to scale up to five pods based on the memory usage versus the memory limits. The problem here is that we cannot get the limits with the HPA that Kubernetes gives us, because that metric is not there.
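The saturation query the speakers describe could be written in PromQL roughly like this (a sketch; `nginx_connections_waiting` is the metric name exposed by the official NGINX Prometheus exporter, but the `deployment` label is an assumption that depends on your relabeling setup):

```promql
# Average waiting connections across the NGINX pods
avg(nginx_connections_waiting{deployment="nginx-server"})
```

An HPA fed with this value and a threshold of 20 would then add a pod whenever the average crosses that line.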
And also, we could fix a threshold for a particular workload, but if we have to change the spec, the requests, or the limits, we'd have to change the HPA too. So the solution here is to use the kube_pod_container_resource_limits metric that kube-state-metrics gives us, together with the container memory usage that cAdvisor, which is already running inside the kubelet, gives us. By dividing the container memory usage by the container limit, we get the percentage of memory used by the workload. So we can use a threshold of, say, 85%, and scale with that.

One of the solutions here is the Prometheus Adapter. The Prometheus Adapter is a custom implementation of the Kubernetes metrics server. This adapter uses Prometheus as its metric storage, so we can use all the metrics that the exporters give us, but also all the math that PromQL gives us: we can create aggregations and use all the power that Prometheus provides. The Prometheus Adapter is great, but you cannot use it in every scenario. Imagine you have a Prometheus server outside your cluster, or a Prometheus that requires authentication.

So, let's meet KEDA. KEDA is a CNCF incubating project, an open-source project that you can deploy on your Kubernetes cluster. Basically, what KEDA does is more or less the same as the Prometheus Adapter: it provides a custom implementation of the Kubernetes metrics server, so you can feed it with metrics from different scalers. One of the scalers KEDA brings is Prometheus, and if you take a look at the website, there are tons of them. KEDA gives us a custom resource that we can use to create an HPA easily. That's the main benefit of KEDA: it's just a couple of steps. It's easy, really.
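The memory-versus-limits calculation described above is a ratio of a cAdvisor metric to a kube-state-metrics metric, roughly like this (a sketch; both metric names are the standard ones, but the `container="tomcat"` selector and the joining labels are assumptions that depend on your setup):

```promql
# Percentage of the memory limit each container is using, averaged
avg(
  container_memory_usage_bytes{container="tomcat"}
  / on(pod, container)
  kube_pod_container_resource_limits{resource="memory", container="tomcat"}
) * 100
```

Because the query references the limit directly, changing the limit in the pod spec updates the scaling behavior without touching the autoscaler definition.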
And also, you can use KEDA to autoscale your applications with metrics from a Prometheus server outside your cluster, or a Prometheus server behind authentication. So here's a summary of the cases: if you just need the CPU and memory that come out of the box, use the default Kubernetes HPA. If you need more powerful, more sophisticated metrics, you can use the Prometheus Adapter, provided your Prometheus is inside your cluster and doesn't require authentication. And you can use KEDA in all the other cases; in fact, you can use KEDA in all of the cases.

Installing KEDA is pretty straightforward: you just create a namespace and install the Helm chart there. And then this is where the magic starts. This is the custom resource I was talking about earlier: a ScaledObject. In a ScaledObject you have to set a few things. First, the target, the workload you want to autoscale; in this case it's the NGINX server, which is a Deployment. Then the min replica count, 1 in this case, and the max replica count, here 5. And finally, the type of scaler you're using, in this case Prometheus (we love Prometheus), and the query, which here is the one we explained in the first scenario: the average waiting connections of NGINX. We also set a threshold, in this case 20. So basically this is a solution for the first scenario: if the average waiting connections go above 20, the HPA will create a new pod for our NGINX. When we deploy this ScaledObject in a cluster with KEDA installed, it automatically generates the HPA for us. And that's it: you have your HPA working based on the data you want. So let's see a demo. Yes? It's a video. Yeah. No, no, no. Yeah.
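A ScaledObject matching the speakers' description might look like this (a sketch; the deployment name `nginx-server` and the Prometheus address are illustrative, and the trigger fields follow the KEDA Prometheus scaler's documented metadata):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: nginx-scaledobject
spec:
  scaleTargetRef:
    name: nginx-server          # the Deployment to autoscale (illustrative name)
  minReplicaCount: 1
  maxReplicaCount: 5
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc:9090  # assumed in-cluster address
      query: avg(nginx_connections_waiting)                 # average waiting connections
      threshold: "20"                                       # add a pod above 20
```

Applying this with `kubectl apply` is all it takes: KEDA's operator watches for ScaledObjects and creates the underlying HPA itself.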
Can you start it again? Okay. So here we have an NGINX server with the NGINX exporter running, and we are going to apply the KEDA ScaledObject (sorry, not the HPA object) that we saw in the last slides. And this is the object, the same one we saw. This ScaledObject has created an HPA automatically, so let's see what it looks like with kubectl. That is the HPA: it has automatically scaled to two replicas. And now we are going to scale the generator. We have a traffic generator to create more connections, so we can see how the scaler works. Now we have five traffic generators. And now let's look at the NGINX pods: we still have two, as we saw in the HPA. And here are the traffic generator pods, the five we scaled up before. Now we have another NGINX pod. If we wait a few minutes, because the scaler works against a threshold, we'll get all the pods we're expecting. So right now the HPA is at the max: you can see five replicas out of a max of five, and the pods have been scaled up, so we have five. And the metric is there: we have nine waiting connections.

So KEDA is running on our Kubernetes cluster, but we need to monitor it. I was wondering, do you guys know any cool monitoring tool for this? No? Maybe Nagios? Fortunately, KEDA exposes Prometheus metrics. Not by default; you need to enable it when you install it, with that option over there, the Prometheus metrics server enabled. Then KEDA will start exposing four Prometheus metrics. The first three are for errors, which are great for creating alerts and getting a picture of what is happening with your HPA, or with KEDA itself. But the last one is very interesting.
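Installing KEDA and enabling its metrics endpoint, as described above, would look roughly like this (a sketch; `prometheus.metricServer.enabled` is the value name used by the official KEDA Helm chart, but check the values file of your chart version):

```shell
# Add the KEDA chart repository and install into its own namespace
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
kubectl create namespace keda
helm install keda kedacore/keda --namespace keda \
  --set prometheus.metricServer.enabled=true   # expose KEDA's own Prometheus metrics
```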
This metric gives us the values of the metrics that KEDA automatically registers in the Kubernetes metrics server, so you can see exactly what is happening and why your HPA is behaving the way it is. With these Prometheus metrics you could create a cool Grafana dashboard like this one, showing the KEDA errors aggregated by ScaledObject, or just all the errors. You could chart all the automatically generated metrics, and also use some HPA metrics to see the health of your HPAs in Kubernetes, not just KEDA.

So that's all you need to know to start using KEDA in your Kubernetes cluster. Let's summarize a little bit. The main goal of this talk was to show how easy it is to deploy an HPA in Kubernetes with KEDA. It's really easy: it just exposes a custom Kubernetes metrics server fed with Prometheus metrics. These metrics can come from Prometheus, but there are a lot of scalers, so depending on your use case you may find another scaler that fits your business. You can use KEDA in almost any real-life scenario because it's highly configurable, and it also exposes Prometheus metrics for its internals, so monitoring it is straightforward too. So that's all we wanted to show you today. Thank you.
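As an example of the alerting the speakers mention, a Prometheus rule on KEDA's error metrics could look like this (a sketch; metric names such as `keda_scaler_errors_total` follow KEDA's documentation, but verify them against your KEDA version, since they have been renamed across releases):

```yaml
groups:
- name: keda-alerts
  rules:
  - alert: KedaScalerErrors
    expr: rate(keda_scaler_errors_total[5m]) > 0   # any scaler errors in the last 5 minutes
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "A KEDA scaler is reporting errors"
```

Pairing an alert like this with the metric-values metric gives you both sides of the picture: when scaling breaks, and what value the scaler was acting on.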