Good morning, good evening, wherever you are. Welcome to this talk about Kubernetes event-driven autoscaling, where I'll show you how KEDA makes application autoscaling super simple on Kubernetes. First, before we dive into it: my name is Tom Kerkhove. I'm an Azure architect at a company called Codit, and I am one of the co-maintainers of KEDA. If you want to reach out to me, you can always find me on GitHub, or you can ping me on Twitter. We are always open to feedback, so don't hesitate to reach out.

Before we get started, let's have a look at how you can autoscale applications on Kubernetes without KEDA. Plain old Kubernetes, fresh cluster: how would that work? Imagine we have four deployments, which represent applications. What you would typically do is use a Horizontal Pod Autoscaler (HPA), which allows you to scale on CPU and memory; a sketch of that baseline follows below. This works fine in vanilla Kubernetes. Now imagine you want to scale on one or more of the external dependencies at the top of the slide. What would that look like? On the Kubernetes side, you would use what's called an external metric, which represents a metric from outside of the cluster. But before you can use external metrics, you need a metric adapter. A metric adapter pulls the metric from one of those systems and makes it available for you to autoscale on. There is a caveat, however: you can only run one metric adapter. So if you want to use multiple of these systems, you'll have to choose one adapter and make sure all the metrics are available through it. In this case, imagine you want to autoscale on Prometheus, Kafka, and Azure Monitor. What you could do is send all metrics from Kafka to Prometheus, and send all metrics from Azure Monitor to Prometheus by using a tool like Promitor.
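To make that baseline concrete, here is a minimal sketch of such a plain HPA, scaling a hypothetical order-processor deployment on CPU only. The names and thresholds are purely illustrative, and the autoscaling/v2beta2 API shown was the current one at the time of this talk:

```yaml
# A vanilla Kubernetes HPA: it can only scale on built-in resource
# metrics such as CPU and memory, and it cannot scale to zero.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: order-processor-hpa        # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-processor          # hypothetical deployment
  minReplicas: 1                   # a plain HPA bottoms out at one replica
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70     # scale out above 70% average CPU
```

To react to anything beyond CPU and memory, this HPA would need external metrics, and that is where the single-metric-adapter limitation bites.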
Now, all of this is a bit much, and with KEDA we figured we could make it a lot simpler, so that you don't have to worry about all of this autoscaling infrastructure. With KEDA, we have a variety of scalers and secret sources so that you can automatically scale deployments, jobs, or practically anything inside your cluster that has a /scale subresource; it could also be a resource from a CRD, from another tool that you're using. You just choose one of the 30 or more built-in scalers that we have, or you build your own, or you use an external scaler from the community, and you scale your application with that. One of the beauties is that you can also optimize for cost and scale all the way down to zero, so that your cluster resources are freed up for the workloads that actually need them. Then, when KEDA sees new work coming in, it will scale you from zero to n instances to keep up. Security is important, so we strive for authentication that is simple, reusable, and manageable; that's why we also allow you to manage it separately from the scaling definition, scoped to a namespace or to the whole cluster, as you will see later on. Today we're very happy to be a CNCF sandbox project, and we're actually in the process of proposing to become an incubation project. So if you are already using KEDA, it would be my pleasure to talk to you, so that we can convince the CNCF to promote us. One of our main mantras is that KEDA allows you to focus on the application and not on the scaling internals. We really want to do as much as possible for you so that you don't have to worry about anything.

Now, if we remove everything again and look at what it looks like with KEDA: instead of adding HPAs, you install KEDA in your cluster. It has all the scalers out of the box, so you don't have to worry about anything anymore. Then you deploy ScaledObjects, or, if you're using jobs, ScaledJobs. A ScaledObject or ScaledJob defines what you want to scale, when it has to scale, and how far it has to scale. For the rest, KEDA interprets these with an operator and handles everything for you. It has a scale controller that checks the external dependencies and manages the autoscaling for you, and under the hood it will actually use an HPA as well. So it's fairly simple: you install KEDA, you create a ScaledObject or a ScaledJob, and all the rest is managed for you. No need to worry anymore about external metrics or one or more metric adapters; we manage it for you.

So how does it work under the hood? For us it is very important that we do not reinvent the wheel, which is why we extend Kubernetes. KEDA only handles the 0-to-1 and 1-to-0 scaling; for all the rest, we rely on the HPA and serve external metrics to it. So we are actually using a metric adapter ourselves, but we make sure that we support all the systems you need, across all the cloud platforms. Like I mentioned, we have a lot of built-in scalers, and we also allow you to use add-ons, which are not maintained by us but maybe by another vendor, another product, or maybe yourself. And we make it super simple to get KEDA installed, either by using Helm or the Operator Framework, so you can also find it on OperatorHub. Now, which scalers do we support? These are just a couple of them, but we basically support all the major cloud vendors and all the major products in the ecosystem. If we are missing some, don't hesitate to open an issue, or maybe even contribute the scaler, and then you can autoscale your applications with it too.

Now, I've mentioned security a bit before. What we try to do is keep as few secrets in the cluster as possible, and also reduce the duplication of secrets. Your deployments will have to authenticate to these systems anyway, but we want you to have the controls to apply separation of concerns: give one identity to the workload and another identity to KEDA for autoscaling, so you can manage the permissions differently. You can rotate them, but you don't have to; we don't force you to do it, although it is a best practice, because sometimes you can't, or it is not the best fit at the time. What we give you are the TriggerAuthentication and ClusterTriggerAuthentication CRDs, which allow you to reuse authentication by using environment variables on the scale target or Kubernetes secrets, but we also support other secret stores like HashiCorp Vault, and, if you're on a cloud platform, some of their secret-less authentication offerings like Azure Managed Identity and AWS pod identity. By using those, you have secret-less authentication, nothing to worry about; you simply rely on your cloud provider to manage it.
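To illustrate that secret-less option, here is a minimal sketch of a TriggerAuthentication that delegates to Azure pod identity instead of storing a secret, assuming the KEDA v2 API (the resource name is hypothetical):

```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: azure-identity-auth        # hypothetical name
spec:
  # No secret material lives in the cluster: KEDA authenticates to
  # Azure using the identity assigned to its pod.
  podIdentity:
    provider: azure
```

A ScaledObject trigger can then reference this resource through its authenticationRef, exactly like the secret-based variant you'll see in the demo.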
Now, let's have a closer look and show this in action. What I'll show you here is one of our samples, where we use a .NET Core worker that processes an Azure Service Bus queue, and then we'll add autoscaling with KEDA so you can see how easy it is to get started. Before we dive into it, I have a Service Bus namespace here in the Azure portal. You can see that I have one queue, called orders, at the bottom. If we have a look at the access policies, the different identities in this case (because we will use connection strings), you can see that we have an identity for the autoscaler, which is KEDA and requires Manage permissions. We have one for the portal and the order generator, which we just use for the sake of the demo. And then we also have one for the application, the workload that we run, which only has Listen permissions. Because we will use a TriggerAuthentication, our application can be scoped to just Listen permissions while the autoscaler gets Manage. If we reused the same connection string, our application would need more permissions than it actually requires. We will not do that here, but it shows that you can separate those concerns. As you can see in the portal, we have no messages on the queue; we have a clean sheet, so let's get started.

In here, I have one deployment running. As you can see, it is the order processor; it is super simple and purely demoware. If we have a look at the deployment, you will see that we just run the image from GitHub's container registry for kedacore. I mentioned that it is using connection string authentication; if you want to run the same demo with managed identity for Azure, you can go to the GitHub repo and follow along. For the sake of the demo, we pass the connection string of the queue and we give it the name of the queue. We also have the secret here with the connection string. So if I go here and get the pods, we have one instance, and I will watch the logs for that one to show that it is up and running. On the bottom right, I have an order generator to queue some example work. If I now queue one message, you will see on the left side that it gets processed straight away. Now what we'll do is queue a lot more work; let's say 250 orders. The generator starts generating the traffic, and on the left you see that our order processor is slowly working through those messages, but it is not that fast. In the portal, you see the queue depth going way up and almost never coming down, because processing is that slow. So this is a perfect case for KEDA.

What we'll do is break in here, clear the screen, and just watch the deployments. Now, if you want to get started with KEDA, it's very simple: you can do a helm install of KEDA, just as simple as that. In this case, we installed it into the keda-system namespace. I already did it, so if we check, you see that it is up and running, and if I do a k get all on the keda-system namespace, we can see that it basically installs the KEDA operator and the KEDA metrics API server. If we do a k get crd, you'll see that we also add some CRDs, including ones for authentication. Note that the version I have installed doesn't have ClusterTriggerAuthentication yet; if you use the latest version, you get that one too.
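If you want to follow along, a minimal sketch of that installation, based on the official KEDA Helm chart (the keda-system namespace mirrors the demo; the KEDA documentation defaults to a namespace simply called keda):

```sh
# Add the official KEDA Helm chart repository and install KEDA
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
kubectl create namespace keda-system
helm install keda kedacore/keda --namespace keda-system

# Verify the install: the KEDA operator and the metrics API server
kubectl get all --namespace keda-system
kubectl get crd | grep -i keda
```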
Now, what does it look like to get things autoscaled? I have another file here with a ScaledObject and a TriggerAuthentication; a consolidated sketch of it follows at the end of this walkthrough. Let's start with the TriggerAuthentication. With a TriggerAuthentication, we have various options to authenticate. In this case, I will refer to a Kubernetes secret, which you can see below. It has the base64-encoded connection string, which is not super secure, but for the sake of the demo we will use it. We just refer to the secret, we say which key it should use, and then the parameter name. The parameter name is specific to the trigger that we want to use, which brings me to the documentation. If you go to keda.sh and want to see which scalers we support, you can see the full list there, every single scaler that we have. If I go to Azure Service Bus, which is the trigger we will use, it gives you all the information you need; in our case, we are using the connection authentication parameter for our TriggerAuthentication. In the meantime, you can see that our message processor still hasn't caught up, so we still have time.

Now, if you look at the ScaledObject, this is where we define everything related to the scaling itself. First, we define what the scale target is. In this case, we use just the name, which implies that we will scale a deployment; it is an exact match of the deployment that is already up and running, picking up those messages. We also define the maximum replica count; basically, we're telling KEDA to go all the way to 25. You can also define a minimum replica count: if you always want five instances up and running, this is where you configure that. I left it out because we will use zero, which is the default, meaning that if the queue is empty, we scale down to zero instances. And then it is super simple: you can have one or more triggers. In this case, we have one trigger of type azure-servicebus. We give the name of the queue and say, hey, we want to start scaling if the queue has five messages or more. In terms of authentication, we just reference the TriggerAuthentication resource that we have in the cluster. It is just as simple as that: we have an existing deployment running, we point to it, we say how many instances we want and when we want to scale. If you add multiple triggers, KEDA will start scaling as soon as one of them meets the criteria, which is important to know.

So what I will do now is apply that file, and as you can see, it creates the ScaledObject, the TriggerAuthentication, and our secret. If we get the scaled objects, you can see that it is targeting the deployment by name, with a maximum of 25, that it is ready to scale, and that it is actually active. On the right, you now already see it scaling from one to four, and it will keep spinning up pods to process the messages as we go. If we have a look, you'll see it slowly starts picking up the work, and in the meantime it has already gone to eight. For the TriggerAuthentication, we have the same experience: you can list all the TriggerAuthentication resources in your cluster, and you can also see that in this case we are pointing to a secret and not, for example, pod identity. So operators also have the controls to get more information when they want it.
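Putting the pieces together, here is a minimal sketch of what such a file could look like, assuming the KEDA v2 API (the demo itself ran an older version, and the resource names and secret value are illustrative):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: order-secrets                  # hypothetical name
data:
  servicebus-connectionstring: <base64-encoded-connection-string>
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: order-processor-auth           # hypothetical name
spec:
  secretTargetRef:
  - parameter: connection              # the parameter the azure-servicebus trigger expects
    name: order-secrets                # which secret to read...
    key: servicebus-connectionstring   # ...and which key inside it
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor-scaler         # hypothetical name
spec:
  scaleTargetRef:
    name: order-processor              # what to scale: the existing deployment
  maxReplicaCount: 25                  # how far to scale
  # minReplicaCount is omitted: the default of 0 scales to zero on an empty queue
  triggers:
  - type: azure-servicebus             # when to scale
    metadata:
      queueName: orders
      messageCount: "5"                # activate at five messages or more
    authenticationRef:
      name: order-processor-auth
```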
And of course, if you want to have a closer look, you can always use describe to understand what's going on and get more information. In the meantime, if we check, our queue is already empty; we have basically processed the whole queue. And if we go back, you see that KEDA has already scaled all the way down to zero. If we do a get pods, there's nothing to show, because nothing is up and running. If we now queue up another ten messages, you will see that in a second KEDA notices those messages and starts scheduling the work again. So this is just how simple autoscaling with KEDA is: you have an existing deployment, you just say how to scale it and how far, and we handle everything for you. You don't have to worry about HPAs, metric adapters, et cetera. If you want to, you can see the HPA that we use under the hood, and you can also influence how that HPA works by tweaking it; you can see in our documentation what that looks like. So you also have control there if you have to, but generally you don't. Even if you're not a Kubernetes expert, you can do autoscaling fairly easily.

Now, if we have a look at the community around KEDA, it's growing fast. We have more than 3K stars on GitHub and more than 100 contributors from various companies: the maintainers, of course, but also people from IBM, Shutterstock, and other companies. And we do bi-weekly community stand-ups on Tuesdays, so if you are interested or have questions or doubts, please join one of those and we're happy to discuss them. You can find more information on our website. Now, most importantly: who is using KEDA? Is it already mature enough? Do people already trust it? I'm happy to say yes. We have various products that rely on KEDA to do their autoscaling. For example, Apache Airflow and Astronomer are autoscaling workflows with KEDA. We have serverless tools like Azure Functions and Fission that rely on KEDA to do the application autoscaling, and then we have others like Dapr and Knative that use it as well. In terms of end users, we have a growing catalog, including major companies like Alibaba Cloud and Microsoft, along with several others, and we're constantly adding more of them. If you're interested in why Alibaba Cloud decided to use KEDA for all their application autoscaling, you can go to the CNCF blog, where we published an article with the kind people from Alibaba Cloud about why they chose KEDA and how it helps them do autoscaling very easily.

Another example is Azure Functions. I will not dive into a demo here, but it is just as simple as using the Azure Functions Core Tools: you can deploy Azure Functions to Kubernetes, and behind the scenes they automatically install KEDA for you and create the ScaledObject for you. So they are using KEDA to manage the autoscaling for their serverless workloads without you even having to worry about it. If you want to give this a go, it's just as simple as running these three lines, and you are up and running on your Kubernetes cluster, automatically scaling those Azure Functions.
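As a sketch of what those lines can look like with the Azure Functions Core Tools (the app name and registry are placeholders, and the exact commands and flags may differ by tools version):

```sh
# Scaffold a function project with a Dockerfile, if you don't have one yet
func init --docker

# Install KEDA into the current cluster
func kubernetes install

# Build, push, and deploy the function; this creates the ScaledObject for you
func kubernetes deploy --name order-functions --registry <your-registry>
```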
Now, in terms of the roadmap, we are constantly evolving, but one of the major aspects that is missing today is HTTP workloads. HTTP is a little bit harder to scale, because for all the things we mentioned before, for example the queue, you know exactly how much work is on the queue, while HTTP is synchronous communication and you don't know how many calls there will be in five minutes, for example. So it is a bit harder. We now have an alpha version of this up and running on GitHub: an add-on scaler, which you can deploy separately, that aims to deliver scale-to-zero for HTTP without a dependency on Prometheus. This is fairly important for us, because if you are using a cloud vendor, you could also be using their telemetry and observability stack, and it's a bit ridiculous to spin up Prometheus just to do autoscaling. So we want you to be able to choose. If you already have Prometheus, you can use KEDA today with the Prometheus scaler. If you don't have Prometheus but still want to scale HTTP, you can deploy this add-on and use the new CRD, called HTTPScaledObject, where you basically define what you want to scale and the number of replicas, and then we do all the hard work; a sketch follows below. Today we support ingress, but we're also planning on supporting the Gateway API, service meshes by relying on the Service Mesh Interface, and service-to-service communication. So if this is important for you, please go and give it a try, share your feedback, and let us know what you think. We are super focused on this, because it is very important.

Okay, now how does it work in the current version? You bring your own workload and your own service, because you know how to expose your service. We use core KEDA and extend it: we have a KEDA HTTP operator to manage the new CRD, and we put an interceptor between your ingress and your service so that all the traffic goes through the interceptor, both to measure the traffic and to hold requests. Because if you support scale-to-zero with HTTP calls, which are synchronous, you have to hold the request until the service has spun up, and only then can you forward that request through. But as I mentioned, this is alpha, and we are very eager to hear what you think and what the scenarios are that you're looking to solve, so we can work with you to make it better.
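To make that concrete, a minimal HTTPScaledObject could look something like this, assuming the alpha API of the HTTP add-on at the time of recording (names and ports are placeholders, and the fields may well change while it is alpha):

```yaml
apiVersion: http.keda.sh/v1alpha1
kind: HTTPScaledObject
metadata:
  name: order-web                      # hypothetical name
spec:
  scaleTargetRef:
    deployment: order-web              # your existing workload...
    service: order-web                 # ...and the service that exposes it
    port: 8080
  replicas:
    min: 0                             # scale to zero when there is no traffic
    max: 10
```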
Now, in terms of what else we are planning: we are constantly adding new scalers and new secret sources to authenticate with. For example, at the time of recording, we're adding Azure Pipelines, so you can automatically scale build agents: if you have a lot of pipelines or builds pending, you scale out on Kubernetes to serve that build queue, and scale back down afterwards. If you have any other needs, we are happy to add them. We have recently added Kubernetes Events to KEDA, and we are also going to add CloudEvents, in case you want to expose events outside of the cluster or integrate with existing tools to gain more insight. We're also working on building a community around external scalers, so that you can use the great scalers from other people, and on making them easier to discover; that's why, for example, we're working with Artifact Hub to expose KEDA scalers there as well. So if you have an external scaler today, you can go to Artifact Hub, add a new source, and it will automatically be listed on Artifact Hub. And lastly, we're working on the CNCF move from sandbox to incubation, which you can follow on GitHub as well. But we are fairly agile: we have a public roadmap, and if people have certain needs that are crucial, we are happy to work with you and see what we can do. So don't hesitate, and open those feature requests.

Now, going back to our Kubernetes cluster to summarize: you can place KEDA in the bigger autoscaling picture of Kubernetes, where KEDA does the application autoscaling so we can scale our app across nodes. Of course, this is not always enough. You can still use the cluster autoscaler to resize the whole cluster so that your application can scale even further. But at some point your cluster will be full and the cluster autoscaler will not be able to keep up, and then, depending on the cloud provider you use, you can also scale beyond the cluster and overflow the workload. In this case, if you use an Azure Kubernetes Service cluster, you can use virtual nodes to overflow the capacity to Azure Container Instances for that serverless Kubernetes experience. The beauty of KEDA is also that you can scale down to zero, which is important from a cost perspective: if our orders app goes all the way down to zero, you can implicitly rebalance the whole cluster, get rid of the Azure Container Instances, remove a node, and save a lot of resources in your cluster. And with that, I would like to thank you for your time and open up for Q&A if you have any questions. Thank you for joining.