Hello, everybody. I'm glad to see you here in this room. Today we're going to spend the next half an hour talking about KEDA, and we should probably get to know each other a little bit first, so let me ask: are there any KEDA users here who are running KEDA in production? I can see a bunch of people, cool. And is there anybody who doesn't know what KEDA is, or maybe just knows the name and that it's about autoscaling, but doesn't know the details? Okay, some newcomers as well. This is great.

So now it's my turn. My name is Zbyněk Roubalík. I know the name is pretty hard to pronounce, so don't feel bad about it. I'm based in the Czech Republic, in Europe, and I've been around Kubernetes and open source for many years. I'm a Microsoft MVP, and the reason I'm here today is that I'm a longtime KEDA maintainer; I've basically been with the project since the beginning. And finally, just a few weeks ago we started a company around KEDA called Kedify, where I'm the CTO, and we try to support the project and help our customers. But today I'm not going to talk about the company. I want to talk about the project, so today I'm wearing my maintainer's hat, which means I should probably take the other one off, right? So let's get started.

First I'll maybe wait a little bit, because more people are still coming in. Basically, I will try to explain what KEDA is, what we are trying to achieve, what the vision and the idea are, and talk about some advanced and interesting features in KEDA. Then I will cover some tips and best practices so you can use KEDA in the best possible way and it works well in your environment, then a short demo, and I will also quickly talk about the future of the project and what we are planning. I can see people are still coming in. Okay, this is great.

So actually, let me start with a short story. We are in this room, and imagine there is a bar in here, probably somewhere around there, and I'm the bartender. I like beer, so I'm serving good beer. I have a few taps at the bar, I'm serving one tap, and I'm pouring you good beer. Maybe this guy in the front row would like to have a beer, so he comes to the bar, I give him the beer, and this is great. After some time, all of you learn that this beer is really great, so you would all like to have one too, and you all start coming to the bar. Now I'm fully utilized, fully occupied serving beer, and in the end I'm not able to handle the load, the number of people coming to the bar. So maybe I ask a bunch of people to help me: I have more taps at the bar, they come over, and they basically help me autoscale the solution, autoscale the taps, and serve beer from the extra taps. So this is the first solution: I autoscale the taps based on utilization, based on my utilization, maybe based on the beer flow. But if I'm a little bit smarter, what I can actually do is observe the space, observe the room, and maybe I can see that over there people are forming a queue, so they are probably coming to the bar. Then I can be more proactive about what's going to happen and more proactive in autoscaling the taps: I can ask those people to help me a little bit in advance.
So I suppose you get the picture. If we convert this beer problem into a technical problem, it is the very same thing. We have some consumer application, and it's consuming some stuff. This application is basically the taps at the bar, and it's consuming data from some external system: it could be RabbitMQ, it could be Kafka, or it could just do some work and be observed through Prometheus. The first solution, the naive one, is to autoscale based on utilization, based on the flow of the beer. This is the Kubernetes HPA: it just monitors utilization and scales the application out. But if we are smarter, we can actually observe the queue of people coming to our bar, or in other words, observe the queue in RabbitMQ, Kafka, whatever. Then we can predict that something is going to happen to our workload, driven by external events or custom metrics.

This kind of autoscaling is useful for certain scenarios, because sometimes the actual utilization, the CPU or memory consumption of the application, doesn't reflect its actual need. Sometimes the application just consumes messages and isn't doing much work, but we need it to consume those messages much faster. With the traditional HPA approach you cannot do that; you need some external metrics. So this is what KEDA is doing. And at our bar, maybe in the future we'll start serving shots, maybe some cocktails and drinks, and these are like the additions, the extra features that KEDA provides on top of HPA. So this is the idea behind KEDA, and it brings elasticity to your platform.

You may ask: why do you need to autoscale the taps at all? You could put people at the bar right from the beginning, so they would be serving beer from ten taps from the start. That's doable, but it is not efficient, because if the beer is not good, people will not come and the people at the bar will be useless. So we really need to think about the platform, about our solution, about how to bring in elasticity, because it solves two big problems. The first is the ability to handle high load: more people are coming, so we can scale the application out. The second is that we can save costs, because if there is no need for the application to be running, we can scale it down to zero. This is very useful, especially these days when everybody is trying to save costs, and it can really help with cloud costs, because application autoscaling puts pressure on cluster autoscaling. Imagine you have an application deployed, for example, on AWS, you have Kubernetes, and the application is running on that cluster. If you enable application autoscaling and the application is scaled down to zero, it puts pressure on the cluster autoscaler, which can then also scale in the number of nodes. So you are saving resources and saving costs.

Okay, so what is KEDA? KEDA is trying to make event-driven autoscaling as simple as possible. As I just told you, you can scale any deployment, or you can scale a job, based on some external events or custom metrics.
It doesn't rely just on CPU or memory. We have 65-plus different event sources, connectors to external services: AWS services, Azure services, Prometheus, RabbitMQ, Kafka, you name it. If there is a connector missing, you can implement it and contribute it, or, if it's some kind of in-house, proprietary solution in your company, we have a concept called an external scaler. It's an interface that you implement, a very simple one, and this way you can feed the metric to KEDA and KEDA will handle the rest for you: it will autoscale the workload based on your custom scaler.

Actually, the greatest news of today is that KEDA has recently graduated, which is great news. I would like to thank our community and the CNCF; it's a great achievement. And yeah, thank you. I've been with the project since the beginning. The project started around 2019 as an internal cooperation between Microsoft and Red Hat, actually as a POC. Then we saw that the project was quite useful, so we open sourced it and later donated it to the CNCF, and we went through all the stages, sandbox, incubation, until graduation, which is happening today, so it is a really great achievement. Our community is, I would say, quite huge. We have some users listed on the web page and contributors from many companies, and I would like to highlight this QR code for KEDA users: we are collecting a survey about how you are using KEDA and what we can do to improve it. It's a very short form, so please take one minute to fill it out. I will also share the QR code at the end of the presentation.

So let's talk a little bit about the architecture: how are we actually doing this kind of event-driven autoscaling? KEDA is built on top of Kubernetes. We are not trying to reinvent the wheel; we are trying to use the existing, stable primitives in Kubernetes. KEDA is built on top of HPA, so we still use HPA for the autoscaling under the hood, and we use the external metrics API interface provided by Kubernetes to feed the metrics to HPA. There are two main components in KEDA. The first is the KEDA operator, or controller. It does the majority of the work: when you deploy a resource, it monitors that resource, connects to the external services, and provides the metrics. The second main component is the metrics adapter, or metrics server. This is the connection point to the Kubernetes API, and it's basically a proxy: when the HPA controller is trying to decide whether it should scale the application, it asks, through this metrics adapter, for the metrics that KEDA provides. We also have an admission controller and some additional pieces, but those are not so important here.

I would like to highlight one more thing. At the moment, HPA cannot scale to zero, so we need to handle that ourselves in KEDA, and we do it through the operator. So there are actually two phases: what we call activation is the phase when you scale from zero to one or from one to zero, and then there is the scaling from one to N. The one-to-N scaling is handled by HPA; the activation phase is handled by the operator.

KEDA has a few resources. The main object is called ScaledObject.
This is the place where you define your scaling metadata. Basically we are saying: I would like to scale this deployment based on this kind of data. You can autoscale a Deployment, a StatefulSet, or any custom resource that exposes the scale subresource, so basically anything that's running on Kubernetes can be autoscaled. In the ScaledObject, you can see in the middle there is a scaleTargetRef; this is the reference to our deployment. Then we set some replica bounds, the minimum and maximum replicas, and at the bottom there is a triggers section, where we say which metrics should be used to autoscale the workload. There are also additional advanced settings, but this is the very basic stuff.

We can specify multiple triggers in one ScaledObject, and I would really like to highlight this point: if you want to scale your application, even with plain HPA, based on multiple different metrics, always use just one resource. If you use, for example, two HPAs, or a ScaledObject and an HPA, they will end up in conflict, because each of them acts independently and they will fight each other. So if you need to specify multiple metrics, please use a single ScaledObject and put all the triggers there. This is very important. Through the ScaledObject we also expose all the capabilities that HPA has, so you can specify all the HPA-related settings. It is very easy.

The second main resource is called ScaledJob, and this one is useful for processing long-running executions. Imagine you have a workload that is, say, pulling some data and then doing some calculation, which could take hours or days. For this specific use case, a Kubernetes Deployment scaled through HPA might not be the right choice, because in the middle of the execution HPA might decide to kill that particular pod and scale back to the minimum replicas. For this kind of use case you can use Kubernetes Jobs, so we are building ScaledJob on top of Kubernetes Jobs; again, we are reusing the existing concepts, and it's very similar. Instead of a reference to a deployment, you put the whole Kubernetes Job template in there, then specify the replicas and the same triggers section. In this case we can, for example, monitor RabbitMQ, consume messages from a queue, and spin up individual Kubernetes Jobs that will process them and do some useful work.

So these are the two main resources. We also have a resource for handling secrets, because usually, when you want to connect to an external service like Kafka or RabbitMQ, you need some kind of authentication and authorization. For this specific use case we have a resource called TriggerAuthentication. It's a single place where you can store all this information and then just reference it from the ScaledObject or ScaledJob, and it will pull everything in. It supports Kubernetes Secrets, it supports Azure Key Vault, managed identities on AWS and Azure, all this kind of stuff. So I really, really recommend using it to store your credentials and secrets, instead of putting them into the main resources.
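To make this a bit more concrete, here is a minimal sketch of how a ScaledObject and a TriggerAuthentication fit together. This is not taken from the slides; the deployment name, queue name, and Secret are made-up examples, assuming a RabbitMQ-based worker.

```yaml
# Hypothetical example: scale an "order-processor" Deployment on RabbitMQ queue length.
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: rabbitmq-auth
spec:
  secretTargetRef:
    - parameter: host                  # scaler parameter to populate
      name: rabbitmq-credentials       # assumed Kubernetes Secret
      key: connectionString
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor-scaler
spec:
  scaleTargetRef:
    name: order-processor              # the Deployment to scale
  minReplicaCount: 0                   # allow scale to zero
  maxReplicaCount: 20
  triggers:
    - type: rabbitmq
      metadata:
        queueName: orders
        mode: QueueLength
        value: "10"                    # target messages per replica
      authenticationRef:
        name: rabbitmq-auth
```

A ScaledJob would use the same triggers section, but with a Kubernetes Job template instead of the scaleTargetRef.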
So that was a brief introduction to the architecture and the main resources. I would like to quickly go back and spend a bit more time on the architecture, because recently we did a, let's say, architectural change. As I mentioned, there are two main components, the operator and the metrics server: the metrics server provides the metrics to HPA and the operator does the rest. Originally the metrics server was doing more work for us; it was opening the connections to the external services and a few more things. Recently we decided to move the majority of that logic into the operator, to have a single source of truth. This results in fewer connections to the external services. Originally, if we were scaling an application based on Prometheus metrics, you deployed your deployment, you deployed your ScaledObject referencing it, and KEDA opened one connection from the operator to query the metrics and then another one from the metrics server. That is not needed, so we decided to move the majority of the logic to the controller, to the operator, and now we open just one connection to the external service, which is good. It also helps us with implementing some features that I will explain later on. This was a huge internal change that wasn't really visible to users, but it helped us a lot with the project.

Now some features. The first thing is certificates. We have certificate management for all internal communication, so the communication between all the components is encrypted. We have support for cert-manager and support for plugging your own CAs into KEDA, so if you need to connect to a service that uses your own certificates, you can do that. At the moment this is a quite nice addition.

Also, Prometheus metrics. Scaling one application is nice, but if you scale at scale, it might be harder to manage the solution. So we are constantly adding Prometheus metrics about what KEDA is doing: the number of resources, the number of errors and failures for the individual scalers, metrics about the activity of each trigger, what is happening. And recently we added support for OpenTelemetry as well: we expose the very same metrics through OpenTelemetry, so you can connect your OpenTelemetry collector and we will send the metrics that way. It really helps with managing the deployment on your cluster.

Another quite new feature is pausing of autoscaling. Imagine that you are autoscaling based on some metrics in Prometheus and that Prometheus instance is going into maintenance or something like that. You would like to pause the autoscaling, but for some reason you don't want to remove the ScaledObject, which would also stop the scaling; you just want to annotate the ScaledObject. So we introduced an annotation where you can say: stop autoscaling and stay at the current number of replicas, or you can specify the number of replicas you would like to pin your deployment at. When autoscaling is paused, we also close the connections to the external services, so you don't see any errors or problems while the external service is out of operation. At the moment we are working on adding the same capability for ScaledJob as well.
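A rough sketch of what this looks like in practice is below. The annotation names are the ones KEDA documents for pausing; the resource names and the Prometheus trigger are just illustrative assumptions.

```yaml
# Hypothetical sketch: pausing autoscaling on an existing ScaledObject via annotations.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: checkout-scaler
  annotations:
    # Stop evaluating triggers and stay at the current replica count:
    autoscaling.keda.sh/paused: "true"
    # ...or pin the workload to an explicit replica count while paused:
    # autoscaling.keda.sh/paused-replicas: "5"
spec:
  scaleTargetRef:
    name: checkout
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        query: sum(rate(http_requests_total[2m]))
        threshold: "100"
```

Removing the annotations resumes normal autoscaling.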
And now the main thing I would like to highlight today: it's called scaling modifiers, and it is a useful feature that we were able to implement after we did that big architectural change. So what is the problem? Imagine you have an application and you are scaling it based on, for example, two or three different metrics. What does HPA do under the hood? It takes the individual metrics and, to make the final decision on the number of replicas, it chooses the maximum, the greatest value across all the metrics. There is no way to modify this behavior; it's baked into HPA. But maybe you would like to have an average across all the metrics, or a minimum. So we said: okay, let's, let's say, fool the HPA, collect the metrics on our own, do the calculation ourselves, and then send the final metric to HPA. And we decided to support a more complex evaluation, so you can specify a formula that accepts mathematical and conditional statements. You can say, for example, trigger one minus trigger two, divided by two, multiplied by five, and all kinds of crazy stuff like that. This formula is applied during each request for the metrics, we compare the result with the target, and then we send that single value to HPA. This way we can really fool the HPA into doing the job for us and express very specific conditions for autoscaling when we have multiple metrics.

But it is also useful if you have just one trigger. This is a quote we got from one user: they had a requirement to always be over-provisioned by three pods, and each of their pods can process three messages at a time; I suppose it was RabbitMQ or something like that. Before this feature we weren't able to do that, because for each scaling operation you would need to add those three extra pods. With the formula it's quite easy: we just take the value from trigger one and add nine. Why nine? Because they need three extra pods and each pod can process three messages. Quite simple. I think this kind of thing is very useful, and we are still figuring out what else can be achieved with it, but it's really cool stuff I wanted to highlight.

Okay, these were the features I wanted to highlight, and now some best practices and tips. The first one: use fallback. Fallback is an older feature of ours and it's quite simple. You have an application and you are autoscaling it, but what happens if the service you are getting the metrics from, the Prometheus or the Kafka, is down? What happens then? So we implemented this fallback feature, where you can specify: if there are, for example, four connection errors in a row to this external service, scale to 10 replicas. This is really useful and there is no downside, so please use it, just to be sure you have the right number of replicas running if there is a problem with the external service.
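As a rough sketch of both ideas, here is what the fallback section and a scaling-modifiers formula might look like together in one ScaledObject. The trigger name, queue, and numbers are made up to match the over-provisioning example above.

```yaml
# Hypothetical sketch: fallback plus a scaling-modifiers formula in one ScaledObject.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: worker
  minReplicaCount: 1
  maxReplicaCount: 50
  fallback:
    failureThreshold: 4             # after 4 consecutive failures to read the metric...
    replicas: 10                    # ...fall back to 10 replicas
  advanced:
    scalingModifiers:
      formula: "queue_length + 9"   # over-provision by 3 pods x 3 messages each
      target: "3"                   # each pod handles 3 messages
      metricType: AverageValue
  triggers:
    - type: rabbitmq
      name: queue_length            # named so the formula can reference it
      metadata:
        queueName: work
        mode: QueueLength
        value: "3"
      authenticationRef:
        name: rabbitmq-auth
```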
The next one is maybe not a tip, but rather an explanation of the actual behavior we have. We have a setting called pollingInterval. With this setting you specify how often the operator checks the metrics and updates things. But the polling interval is only relevant for the activation, the zero-to-one scaling, because, if you recall from the architecture, the zero-to-one scaling is handled by the operator, so there we can set a specific interval. The one-to-N scaling is handled by HPA, and HPA doesn't have a setting for how often it queries the metric values per object. It's a cluster-wide setting, usually 15 seconds on Kubernetes clusters, and you cannot change it as a user; you need admin rights. So every 15 seconds HPA asks us for the metrics. It is not that KEDA pushes metrics to HPA; HPA asks for them. So we have no way to reduce that querying interval on our side.

But thanks to the architectural change, because the operator is now the single source of truth, if you enable the metrics caching feature, KEDA only queries the external service for the metrics on the polling interval. So if, for example, the polling interval is 15 seconds, then every 15 seconds it asks for the metrics, and if HPA asks in between, KEDA does not make an additional request to the external service; it uses the last value from the cache. So basically it caches the values. Why is this useful? Imagine you have hundreds of ScaledObjects in the cluster and all of them are querying the same Prometheus instance, and HPA is asking every 15 seconds; that could be a substantial load on that Prometheus instance. This way we can do something about it: if you are okay with getting the metrics maybe just once every 60 seconds or so, you can enable this feature and the operator will ask Prometheus for the metrics every 60 seconds, and any request in between will just hit the cache in the operator. So if you are deploying KEDA at scale, this is something you should consider, and you should really be careful about setting the right values for the polling interval and metrics caching.

Another thing I would like to highlight: this is not a KEDA feature, it's a feature of HPA, but we expose these capabilities through the ScaledObject as well. It's the stabilization window and scaling policies. The stabilization window is useful when you want to prevent replica-count flapping. Imagine that, again, we are querying the metrics from Prometheus, and one time Prometheus tells us to scale to 10 replicas, the next query says scale to one replica, and then 10, 1, 10, 1, so you see this constant flapping in the cluster. You should think about how to handle this, and the stabilization window is the right tool: you can specify a window during which HPA considers the previous desired replica counts before it scales. Scaling policies are settings with which we can control the rate of change of replicas: for scaling out and scaling in, we can say how fast or how slow these operations should happen.
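These HPA settings are passed through the advanced section of the ScaledObject. Here is a hedged sketch of what tuning them might look like, with made-up numbers, and assuming the per-trigger useCachedMetrics flag for the caching discussed above.

```yaml
# Hypothetical sketch: HPA behavior tuning and metric caching from a ScaledObject.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-scaler
spec:
  scaleTargetRef:
    name: api
  pollingInterval: 60                         # operator checks the source every 60s
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300     # look back 5 minutes before scaling in
          policies:
            - type: Percent
              value: 50                       # remove at most 50% of the pods...
              periodSeconds: 60               # ...per minute
        scaleUp:
          policies:
            - type: Pods
              value: 4                        # add at most 4 pods per 30 seconds
              periodSeconds: 30
  triggers:
    - type: prometheus
      useCachedMetrics: true                  # serve HPA queries from the operator's cache
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        query: sum(rate(http_requests_total[2m]))
        threshold: "100"
```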
These are useful settings, and if you really want to have a good solution, it's always worth thinking about them and nailing them correctly.

Okay, and this is another, let's say, Kubernetes topic. The first question you can see on the slide: why does HPA report metrics like this? This is a very common question we get from users, because users deploy their applications, they deploy their ScaledObjects and KEDA, and everything works correctly, scaling happens as usual. But then the user starts checking things, so they query the HPA to see the specific metrics it is getting, and HPA reports this really strange number. The reason is very simple: this is just how Kubernetes represents float values. The first part is really just 4.8, written in the milli notation Kubernetes uses for fractional values, shown against the target of 5, so it gives you the result of the operation as a float. It is really just a representation; you don't have to be afraid about what is happening with your system or what these strange metrics are.

The other thing I would like to highlight is metric types. When you are creating an HPA or a ScaledObject, you can define what type the metric is, and this setting basically tells HPA which algorithm to use to compute the number of replicas from the metric. AverageValue compares the metric value averaged across all the pods with the target; Value works a little bit differently and compares the whole metric value directly. So again, please read the recommendations on this and think about what is the proper or the best way for your solution.

Okay, so talk is cheap. We don't have much time, so I will just do a very quick demo, maybe a little bit shorter than I wanted. I hope it's visible. I have zero pods running in my cluster, and what I'm going to show you today is just the very basic stuff: I have an application that is consuming messages from a Kafka topic, and I would like to autoscale it with KEDA, exactly the stuff I described. First I will deploy the application, the Kafka consumer application. It is a standard Deployment, nothing fancy; it just connects to Kafka. So let me create this Deployment, and as we can see, there is one pod running with our application. Now I will generate some load. I'm using this Job to generate load, and it will create 15 messages. So let me create the load, and if we check the logs... yeah, if we check the logs... okay, it's not very visible, sorry for that, but basically you can see in this index that it received 15 messages. So the application is able to get the messages from the Kafka topic.

And now what I'm going to do is apply a ScaledObject. So let me create the ScaledObject. You can see that right now the pod is being terminated, because there are no messages in the topic, so KEDA decided to kill the pod. I can quickly show you the ScaledObject; it is a very simple one. I have the minimum and maximum replicas, I have the polling interval set here, and it just tells KEDA to scale this consumer application based on this Kafka topic.
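The exact manifest isn't readable in the transcript, but a ScaledObject like the one in the demo might look roughly like this; the deployment name, bootstrap server, consumer group, topic, and lag threshold are all assumptions for illustration.

```yaml
# Hypothetical reconstruction of the demo ScaledObject for the Kafka consumer.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler
spec:
  scaleTargetRef:
    name: kafka-consumer                                   # the consumer Deployment
  pollingInterval: 5                                       # check the topic every 5s
  minReplicaCount: 0                                       # scale to zero on an empty topic
  maxReplicaCount: 10
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: my-cluster-kafka-bootstrap:9092  # assumed Strimzi bootstrap service
        consumerGroup: my-group
        topic: my-topic
        lagThreshold: "50"                                 # target consumer lag per replica
```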
Perfect. So let me generate a bigger load, so there will be more messages coming in and we should see the autoscaling. This time we are creating, I don't know exactly, some 500 messages, so hopefully in a few seconds we should see more pods running. Okay, so the first pod has been started by KEDA, and KEDA will find out that we are not able to handle the load; more and more messages will be coming to our application and we will see more pods. But we don't have much time, so let me quickly jump to the next part.

I just want to show you the fallback. Basically, I will kill Kafka: I will delete the Kafka cluster. And how can I do that? Let me show you this. The only thing I have added here is the fallback, so let me create this ScaledObject with the fallback; it's just an update. Right now we have the fallback in place. The application has already been scaled down to zero, because all the messages have been consumed. And now I will delete the Kafka cluster. I'm using Strimzi, so I will just quickly delete my Kafka instance. The Kafka cluster has been deleted, and in a few seconds we should see the application scaled to two replicas; this is the fallback. Okay, I hope it will happen. Yeah, okay, right now we can see that, because we have no connection to the Kafka topic, KEDA scaled the application to two replicas.

The last thing I want to show you is the example with multiple triggers, but we don't have much time, so I will just show you the ScaledObject. Here I have two triggers. The only thing I need to add to each trigger is a name, so I name the trigger and then I can use it in the formula. Here I'm saying: use the sum of the values of both triggers. Very simple stuff. And for the pausing, I have just this annotation, so if I create this ScaledObject now, it will be scaled to six replicas and the connection to Kafka will be dropped.

Okay, let me go back to the presentation to really quickly discuss the future. We are working on a CloudEvents integration: the metrics that you can see in Prometheus and OpenTelemetry we also want to expose as CloudEvents, so you can subscribe to them and receive this kind of information. We would like to work on multi-tenant installation, because this is a huge topic for us: at the moment, due to the limitations of the Kubernetes API, you can create only one KEDA instance in a cluster, so we are working with upstream to provide some kind of proxy to be able to install multiple instances. AI also has its place here: we would like to have some open interface for predictive autoscaling, because we have all these metrics coming from different sources about our application, so maybe based on the history we can predict the actual scaling and say, okay, start scaling a little bit sooner. We would also like to work on global configuration and plug-in management for external scalers. And the last thing I want to mention is carbon-aware autoscaling: reducing the number of replicas, or driving the autoscaling, based on carbon consumption. There is a talk about this on Thursday; unfortunately, there are two KEDA talks on Thursday and they are both at the same time, but I definitely recommend checking out this one and the other one as well.

Okay, so thank you very much. This QR code is the feedback for the session and the blue one is for the user survey, and we have maybe just a few seconds for a couple of questions. Does anybody have any questions? A quick one: will you be supporting JobSets with your scaled jobs? JobSets? Yeah, it's doable, yeah. Okay, thank you. Any other questions? Yes, hi.
For the advanced trigger, there's an expression hidden in a string in a YAML document. Are there any plans to validate that expression through a dry run, or is it just apply and hope that the user got it right? Yeah, there is validation: when you apply the ScaledObject, the expression is validated. By the way, I have some t-shirts over here and stickers, so if you want some, come by after the presentation, okay?

Hi, and do you have support for something time-based? I know that at nine o'clock there is going to be a spike; can I set a time and then do something with it? We have a cron scaler, so basically you can specify a cron schedule where you say: during this period of time, scale out to, for example, five replicas (see the sketch below). Thank you.

A question about the HTTP scaler. Is it possible, while we're waiting for the application to scale up, to show a message saying we're warming up the application or something like that, just to give a better UX? We are thinking about it. There is an add-on for HTTP scaling, but it is still, I would say, beta or alpha, so it is not what I would call production ready. HTTP-based autoscaling is much, much harder to achieve, because you need to hold the incoming requests in case you are scaled down to zero. So you would have to come up with some creative solution to show a message if you want to use that add-on as it exists today. Okay, thank you.

Okay, so we are probably out of time; we can continue offline. The talk is probably done, right? Thank you.
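For reference, the cron trigger mentioned in the Q&A might look something like this minimal sketch; the workload name, time window, and replica count are made-up assumptions.

```yaml
# Hypothetical sketch of the cron trigger mentioned in the Q&A.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: morning-spike-scaler
spec:
  scaleTargetRef:
    name: web-frontend
  triggers:
    - type: cron
      metadata:
        timezone: Europe/Prague
        start: "0 9 * * *"           # every day at 09:00...
        end: "0 11 * * *"            # ...until 11:00
        desiredReplicas: "5"         # keep 5 replicas during that window
```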