much. I'm thrilled to be here. Thank you all as well for dialing in, whether it's morning, afternoon, or evening, or watching this recording. As was mentioned in the introduction, I'm going to be talking today about KEDA, Kubernetes Event-Driven Autoscaling, and I'm excited to do that.

Before we jump into it, though, let me give a quick introduction about myself. Fantastic to meet you all. I am Jeff Hollan. I've been at Microsoft for a little over seven years now, and I'm currently the lead product manager for Azure Serverless. That includes the Azure Functions serverless service, and it includes Azure Static Web Apps for serverless web hosting. It's been great. I thought I would add something a little more fun into the mix as well, so I decided to call out that I have actually been a guest on the Ellen DeGeneres show. My wife and I got called up to play a game during one of her shows. I was not a guest to talk about KEDA or serverless or anything fun like that, but it is something lighter. You can find me on Twitter at @jeffhollan if you ever want to reach out, have questions, or just want to say hi.

And speaking of which, I do have the questions here live. So as we go through this presentation, if you want to drop me a question about KEDA, about what we're talking about, I'll probably pause in 20 minutes or so and grab a few of them, depending on what's there, and make some time at the end as well. So feel free to keep those coming in and I'll make time for them.

As for our session today, there are a few topics I want to spend some time covering. The first is a bit of an overview of what we mean by serverless and event-driven. These are both incredibly powerful and increasingly popular trends. There are a lot of buzzwords involved, but there's some meat behind them — some benefits, some use cases that I want to go over. We're going to weave that into how it works with KEDA and what KEDA's role is in this event-driven world, including some KEDA concepts like scaled objects and scaled jobs that we'll get to. Then we're going to talk about how serverless runtimes and KEDA can work together, and make some time for Q&A.

All right, so let's start with those fun words, and let's do event-driven first. Event-driven architecture is not new; it's been around for a very long time. The idea is that you want to run your workload in reaction to events. This is very natural in some ways, right? You don't answer your door unless somebody knocks on it. There is some event that lets you know: I need to get up and answer the door. You usually don't do things until something happens. So event-driven is saying: can you build your application so that it responds to the business events, the technology events, the platforms that actually have work that needs to be done?

What that means is you'll have something like a new-employee-hired event that gets emitted into your system. Well, maybe when an employee is hired you issue a company email address, and so one of your event-driven pieces of code might say: oh, I see the employee-hired event, I'm going to go issue them an email address. Maybe somebody else says: I need to go make sure they're registered in our HR systems. You get the idea. When you're building off of these events, one of the benefits is that your architecture becomes very loosely coupled and very highly composable, right?
To add additional handling for when an employee is hired, I don't necessarily have to go change the email-issuing piece or the HR-registration piece. I just write another microservice, another component, that says: oh, I also care about the event when an employee gets hired. So you can move these pieces around, you can reuse them. It has some real benefits.

Now, event-driven has taken off a bit more in the cloud recently because of serverless. Often you'll hear about things like AWS Lambda, Azure Functions, Google Cloud Functions, OpenFaaS, and a myriad of other serverless options. The intent of serverless — while the word itself is a misnomer; yes, there are servers in serverless, and it's a more or less meaningless term in many ways — the gist of it, what we often try to get across when we talk about serverless, is enabling developers, enabling users, to focus as much as possible on writing their code, while the other aspects of shipping and deploying software are taken care of by your platform, by your cloud vendor, by something else. So you're focusing on the layers that matter — your individual IP, your differentiated tech — and the other elements, like patching the operating system, scaling, managing capacity, managing resources, aren't at the front of the developer's mind.

Now, part of that, in something like the Azure Functions service where I mentioned I have a lot of experience, means people are deploying these functions to the Azure cloud and saying: hey, when something happens, I want you to run my little bit of code. Well, in order to run their bit of code when something happens, it needs to be event-driven. They need to associate it with some event: when a message gets dropped in a queue, when an image shows up in a storage account, when a record is changed in this database, when an HTTP request is made, when it's Friday at five PM. An event could be anything, but that's one of the core elements of serverless.

You also have this aspect of on-demand compute. Because it's event-driven, you can build your applications — and serverless is built this way — so that you only pay for a serverless function when it's actually running, when that event happens. If no employees are hired in the month of January, your new-employee serverless functions never run and you're never charged. They never consume any CPU or memory. So part of it is the billing aspect too.

I'm introducing these concepts now because they weave into what KEDA is and how it brings these capabilities pretty seamlessly, and in a very user-friendly way, to any Kubernetes cluster running anywhere: on-premises, in the cloud, with Red Hat, on Azure, it doesn't matter.

Just as an example of where we see these event-driven and serverless patterns used, here are four big ones I see, and that you might have seen yourself. First, accomplishing automation tasks: hey, when a pull request is submitted, I need to run some checks; when it's Friday at five PM, I need to re-index something. Anything you're doing manually today that you would love to automate is often a great fit for serverless or event-driven. Second, you'll often hear about serverless being used as the glue between different services. I have my Postgres database here and my file store over there, and I need them to be able to talk to each other somehow. If there's not a direct bridge, I can stick in a serverless function and say: hey, whenever data gets changed here, transform it and write a record over there.
Third, building very rapid APIs, especially APIs that you want to scale in that event-driven way. And then finally, event streams — whether it's a Kafka stream, an Event Hubs stream, or a Kinesis stream — streams of data from something like an IoT device or from system telemetry. That's another great candidate for event-driven and serverless, because you have these IoT events — the temperature is this, someone has entered the room — pop, run your event-driven code.

A real-world use case we've seen of this is in retail. I pulled a slide from a presentation one of our retail partners gave at the JavaOne conference, where they showed the architecture of how they're using event-driven and serverless, and this will give you an idea of the types of scenarios folks are building. This is a massive retailer. They need to be able to scale to billions of transactions on an event like Black Friday or the holiday season that just happened. And so they have these individual event-driven functions, in their case to say: hey, someone places an order, I need to run this process function, then this notify function, then this persist function.

A few capabilities were important for their serverless platform. They needed it to be resilient: if something like an error occurs, that Kafka stream needs to be able to replay those messages. Losing a transaction — not getting the item that was ordered actually shipped — is a really big deal, so you need some resilience here. You need to be able to use things like Kafka checkpointing, if you're familiar with event-streaming terminology. They really cared about cost optimization. You can see here they're using things like Kubernetes and OpenShift in addition to some Azure cloud functions, and in fact this partner tracks things like the cost per order. You don't want to be spending CPU cycles, running events, or scaling when you don't need to, right? You don't want to be running at Black Friday capacity under a Black Friday load all of the time. You want to be very smart about when and how you're using those precious compute resources. And then finally, they had some more advanced patterns too — we'll get to this a little further into the presentation — like the order those events get processed in. Order made, order canceled: it's very important that the functions, the serverless pieces of code running here, grab those in the right order. You can't just throw them out and hope that the order-canceled event is processed after the order-was-made event. So those are a few patterns we see from time to time, and an example use case. This one is retail, but you could see similar architectures in finance, healthcare, IT, you name it.

So let's round this out with a bit of the history of KEDA. We've now had an intro into event-driven and serverless, and we've seen an example architecture. What about KEDA? Well, one of the ways it came about is from our learnings in Azure running serverless services like the architecture I've shown. More and more, we saw that people were interested in running these event-driven workloads without necessarily using the Azure serverless service, or any serverless service. Maybe they wanted to run on-premises. Maybe they wanted a little more control. Maybe they were just unifying their strategy around things like Kubernetes.
And what we saw was that the default Kubernetes scaling was not optimized for event-driven applications. Here's how it usually works in Kubernetes, for those of you who are familiar. I'll have my application — let's say an order-processing application — and I'll deploy it to Kubernetes. What Kubernetes watches by default are what's called resource metrics: things like how much CPU is this consuming, how much memory is this consuming. And as it watches those, it will say: well, the app you published seems to be using a lot of its CPU, it's at like 80%. I don't know why it's using CPU, but it seems to be using a lot, so I'm going to scale it, and then I'm going to wait and watch and see if it's still using a lot of its CPU. It's very reactive. It's looking at the symptoms of what's happening. It sees CPU rising; it doesn't know how, it doesn't know why, it doesn't know how long the CPU will keep rising, and it's making the best decision it can based on that information. Now, that's fine, but it's not optimal.

In running a service like Azure Functions, a serverless service, we use what I call proactive scaling, event-driven scaling, where rather than looking at the CPU for a function, we look at the event source — the thing that is triggering your function. So if this is an order-processing function and you have a Kafka stream, we're going to look at that Kafka stream. We're going to say: hey, there are a million messages here, someone just dropped a million messages in Kafka, we need to scale right now. It doesn't even necessarily matter what the CPU and memory are; there are a lot of events coming in, so let's scale you out to process those events. That's the event-driven scaling we're talking about. It's much more rapid. You're scaling on the cause — the actual events that are happening — and not just on the symptom. It also enables you to do things like scaling all the way to zero, because if there are no messages on that stream of data, you don't need to be running that application at all. There's no work to be done. So event-driven scale also lets you scale all the way down to zero.

So let me show you a little overview of KEDA. KEDA was born out of a partnership, initially between Microsoft Azure and Red Hat. We huddled together, talked about some of these problems, and worked together to build the initial release of KEDA, which shipped a little over a year and a half ago. It's currently a CNCF sandbox project, so this is full open governance. I mentioned I work at Microsoft, but KEDA is not a Microsoft project; it's a CNCF sandbox project. It monitors the rate of events happening in your system and enables you to proactively scale any of your containers, any of your apps. You can say: KEDA, go scale this thing using event-driven autoscaling. It feeds that data into Kubernetes APIs — you'll see in a second when we walk through how KEDA works — and it's very non-intrusive. It's a single-purpose component. It does one thing, event-driven autoscaling, but it does it really well, and it integrates with what Kubernetes already provides. We didn't want to reinvent Kubernetes; we just want to say: use this with Kubernetes, and you're going to find a lot of value from it. It lets you scale to zero and back out of zero, so you can use your CPUs and cores really efficiently. You can add it to any cluster — a new cluster, or an existing cluster that you've already been running for months or years.
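For reference, the default resource-based scaling described above is configured with a HorizontalPodAutoscaler along these lines (a minimal sketch; the names are placeholders):

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: order-processor
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-processor
  minReplicas: 1              # resource metrics alone can never justify zero replicas
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80   # react once average CPU crosses 80%
```

This machinery only ever sees the symptom (CPU at 80%), never the million messages waiting upstream — and, as described below, KEDA extends it rather than replacing it.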
You can pop KEDA in there. It's not going to get in the way; it won't do anything until you tell it to. One of the great things about KEDA is that it knows about all of these different event sources. KEDA knows how to monitor events from Kafka, from RabbitMQ, from Prometheus, from a number of cloud services like Azure Queues, AWS, GCP. I think at this point there are something like 33 event sources. It's extensible as well, so it can really monitor anything, but it's super easy and efficient to use.

And a quick answer to one question that came in — what's the CNCF? Thank you, it's a good one. That's the Cloud Native Computing Foundation, an open-governance foundation sponsored by a lot of large companies — Microsoft, AWS, Red Hat, you name it — that is a safe spot for open-governance projects. It's actually where Kubernetes itself is hosted. So it's a great foundation to be a part of, and we were thrilled when we became a sandbox project.

Okay, so there are a few questions around how KEDA works. I figure at this point I'm going to show a quick demo and then we'll walk through what happens behind the scenes. I think this is actually going to answer a lot of the questions I see coming in. I have here a quick scenario that I want to show you all. And I see this little bar up here — I don't think you can see it, but I'm going to move it out of the way just in case.

I'm showing here on my desktop console that I have a Kubernetes cluster up and running. If you're not familiar with Kubernetes, that's okay; I'm going to walk you through this, and you don't have to know Kubernetes to follow the story. What I have in this cluster is a RabbitMQ queue. It's just a message queue, and it's totally empty right now; there are no messages on it. And what I want to publish is an application that will pull messages from that queue. Maybe this is my order-processing function: whenever something drops in this queue, I want to process that message. Okay. And I don't think I actually have that application deployed yet — I think I need to deploy it, but I just want to make sure. Okay, yeah, there's nothing here; I haven't deployed any apps.

So what I have over in this window — I think it's right here — is a simple piece of Go code that I want to run. This is just a simple container that's going to talk to RabbitMQ and pull the messages. You don't need to worry about knowing exactly what this code does; it's just the hello world of pulling a message from a queue. And I'm going to now deploy that to my cluster.

So I'm going to do a Kubernetes deployment. This is how I've defined it: hey, I have this thing that's going to pull RabbitMQ messages. Nothing here is unusual. Yes, I do have my username and password in plain text, but that's okay — it's a local instance, you can't do much with it. We'll get to this in a little bit, but as part of this deployment, I'm giving it some metadata to tell KEDA: hey, by the way, this thing I'm about to deploy cares about messages coming from RabbitMQ, and I want you to scale it using event-driven scaling based on RabbitMQ. So I'm telling it: here's the deployment, it consumes RabbitMQ, and I want you to monitor and scale it. There's even this other metadata I can provide to handle secrets more securely. Now, I'm only half using that, as you can see here, but you get the idea.
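That metadata is KEDA's ScaledObject custom resource. What was deployed in the demo looks roughly like this (a sketch with placeholder names; the "more securely handle secrets" piece is the TriggerAuthentication reference, and the exact fields are documented on keda.sh):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rabbitmq-consumer-scaler
spec:
  scaleTargetRef:
    name: rabbitmq-consumer            # the Deployment KEDA should scale
  triggers:
    - type: rabbitmq
      metadata:
        queueName: hello               # watch this queue's depth
        queueLength: "5"               # target messages per replica
      authenticationRef:
        name: rabbitmq-consumer-auth   # TriggerAuthentication holding the connection details
```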
So the short answer is: I'm going to deploy something that listens to and is triggered by RabbitMQ messages, and I'm telling Kubernetes to let KEDA handle the scaling. Okay, let's run that. I'm going to add that to my cluster right here. So now I'm telling Kubernetes: hey, here's my container that pulls from RabbitMQ.

Here's the cool part. If I look at Kubernetes now and say "tell me about your deployments," I have my consumer there — but this is what I mean when I say scale to zero: it's not actually running. Kubernetes knows about my container, but it's not running. It's not consuming any CPU, it's not consuming any memory, it's not taking up any container slots, because KEDA knows that there's nothing on that queue. So why should it run this container right now?

Well, let's change that. In this other tab, we're going to watch this happen in real time — let's watch the containers that are running. I only have one container in my whole cluster right now, and that's the queue itself. Now we're going to deploy something that's going to drop, I think, a thousand messages onto this queue. So I'm just going to publish a thousand messages and let's see what happens. I'll run that, and if I come over here, we should see the container that spins up and publishes all those messages. But notice — look at all these consumers that are spinning up. As soon as I start publishing those thousand queue messages, KEDA realizes: oh, there's actually work to be done now. And very quickly — hopefully you saw it; if you blinked, you might have missed it — I went from zero instances to four instances, and now I'm spinning up to eight. This is much faster and more reactive than Kubernetes' default scaling. CPU hadn't even been hit before it realized: oh, there are a lot of messages to be processed here. I might have cranked this up to 10,000 messages to make the demo more apparent, but you can see what KEDA is doing: it's scaling out very rapidly. I've configured it to scale out rapidly — you can actually tune it down, tell it not to go too far too fast — but I wanted to blow your minds here, so I've told it: scale to the moon. This is scaling a lot like a serverless function does in the cloud, in this case in my Kubernetes cluster with RabbitMQ.

If I wait here a little longer — and we actually might wait long enough — what will end up happening is that once the queue is drained, once I've emptied that queue and processed all the messages, KEDA is going to scale it back. It's going to say: okay, we did all the work we needed to do. Oh, wow, it's happening as if it's listening to me in real time. The queue is empty, KEDA realizes there's no more work to be done, and it scales me back down to zero. I don't need those CPUs anymore. So I've had this very serverless-like scaling experience, in this case on Kubernetes with RabbitMQ.

Okay, so let's walk through what happened behind the scenes, and then I'll pause and answer a few questions. What was happening back there? A few things. First, before I ever did the demo, I had deployed what's called the KEDA operator. This is a one-time install step. I'll actually show you what it looks like: hey, get me the pods — I have a special namespace for it. What I've deployed here — and this is a really small bit, written entirely in Go — is a very tiny operator that just knows how to orchestrate what you just saw happen. Okay.
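For reference, that one-time install is typically just the Helm chart (or a kubectl apply of the release YAML; both routes are on keda.sh):

```sh
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace
```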
So I have the KEDA operator, and it has all of these what we call scalers. Scalers are all of the different things KEDA knows how to monitor and scale on. I've just brought over the KEDA homepage — you can see the list as I scroll through it. These are all scalers KEDA knows about out of the box, all the things it can drive scaling from. Okay, so that's sitting there in my cluster.

Now, these darker blue boxes, or gray boxes, are actually just pieces of Kubernetes itself. This isn't part of the KEDA project; it's part of Kubernetes. Kubernetes has this thing called the metrics API, which enables you to emit and publish specific metrics about your application or your system. And it has this other thing I've annotated here, the HPA — the horizontal pod autoscaler. This is the default Kubernetes autoscaler, the one that scales based on CPU and memory. It's not a bad thing; it just doesn't know a whole lot about what your code is doing.

And what KEDA does here is — oh, let me switch to the next part of the animation. I've got my event source. In these slides I'm going to switch from RabbitMQ to Kafka; they all work the same, and the dance that's going to happen here is the same. I deploy my app along with the scaled object — you remember we saw that briefly. This is the little bit of metadata that tells KEDA: hey, the thing I just deployed cares about Kafka, or it cares about RabbitMQ, or it cares about Azure Queues. You need to go listen and scale. I have a value here called the lag threshold. This is how aggressively it scales. So 50 means: if there are 50 messages that haven't been processed, I only need one instance of my container. If I set the lag threshold to one, that would mean if I have 50 messages, I want 50 instances of my container — one message per container is my target. It's just a target, and it won't always be honored, but if I make the number higher, I'll scale slower; if I make it lower, I'll scale more aggressively.

I go and pop that Kubernetes deployment and the scaled object description into my cluster. KEDA now knows: okay, here's your deployment, and you care about Kafka. Now KEDA starts asking your event source: how many events are being generated? Hey Kafka, how many messages are sitting unprocessed that this deployment cares about? When I started the demo, the event source was totally empty, so KEDA told the metrics API, which told the HPA — actually, in this case, KEDA did it all itself — KEDA scaled that deployment down to zero. It said: we don't need to run it at all. Now, as soon as a message comes in, the message hits your event source. KEDA asks its question — how many events are being generated? — and sees there are messages to be processed. KEDA tells the HPA: hey, this deployment has messages that need to be processed. Kubernetes now goes and does its cool scaling thing, because KEDA has just made it a lot smarter: it's told it how many events are being generated. Now your deployment spins up, and it pulls the messages.

One note I want to call out here, because I know there's a lot of info: KEDA doesn't pull the messages and send them to your deployment. KEDA just makes sure your deployment gets scaled, gets woken up. It's up to your deployment — your app itself — to actually go and fetch those messages.
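As a sketch, the Kafka version of that trigger — with the lag threshold described above — looks something like this (broker address, consumer group, and topic are placeholders):

```yaml
triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka-broker.default.svc:9092
      consumerGroup: order-processor   # KEDA measures this consumer group's lag
      topic: orders
      lagThreshold: "50"               # target unprocessed messages per replica
```

At 50, a backlog of 50 unprocessed messages justifies one replica; drop it to "1" and the same backlog justifies 50.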
And there's a reason KEDA leaves the message-fetching to your app: it means you can still do things like checkpointing and ordered delivery, and all of the semantics that whatever event source you're using provides. Now, this goes on and on, and as you saw, if I'm actually getting a lot of messages — maybe it's not one message, it's thousands — KEDA keeps telling the Kubernetes autoscaler: hey, I need you to scale this a bit more, there's a lot of stuff coming. That's what you saw happen in that demo.

Okay, let me go over the core principles and then I'll do a quick pass through the questions. When we built KEDA — and as a maintainer now — we have some core principles. One is that we didn't want to rebuild anything Kubernetes offers itself. We didn't want to provide our own autoscaler; Kubernetes has an autoscaler. We just wanted to figure out how to extend that autoscaler, how to make it a little smarter, a little faster. So we only built the pieces that make it smarter and faster; we didn't rebuild the whole thing. It's very single-purpose. It's not doing a lot, and we want it that way: if you want to do more, pull in other stuff that does other things. It's not a service mesh, it's not an event broker. It just scales based on the number of events that are happening. KEDA works with any container, with any workload. It can scale StatefulSets, it can scale Deployments; it can scale Go code, Java code, Azure Functions code, Python Flask code — anything works with it. We want to preserve what makes messaging brokers powerful: things like Kafka and RabbitMQ give you rich semantics for processing messages, and we didn't want to bypass any of that. And finally, we want KEDA to be open, built by and with the community. We have biweekly community standups that you're all invited to join anytime, and a KEDA channel on the Kubernetes Slack. We wanted to make this general-purpose — not just a Microsoft tool or a Red Hat tool or a Codit tool, you name it. A few folks are using KEDA today; these are just a few of the logos. Alibaba Cloud has it integrated with some of their offerings, Apache Airflow uses it, we're using it at Microsoft for a few projects, and a number of folks have been running KEDA in production as well.

All right, let me take a pause here and look at a few of these questions. There's one question about incoming traffic — I'm going to park that one and answer it a little later, for time. And I think there's one question on how you configure the burstiness of the scaling. I briefly went over this, but let me show you quickly. If I come over here to the KEDA site again, keda.sh — if you have questions, I'd recommend you come poke around in these docs after the webinar, but hang out with me a bit longer, I've got more I want to share. When I deploy that scaled object to tell KEDA to scale a deployment, I've got a few knobs that I didn't go over. I can configure the polling interval: how frequently does KEDA ask Kafka if there are messages? For my demos I made this really low, because I wanted it to scale quickly, but that means I'm adding a bit more traffic there. The cooldown period is how quickly it can scale back down to zero. I can set a minimum — maybe I never want to scale to zero, maybe I always want one instance available so I don't incur additional latency.
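Those knobs all live on the ScaledObject spec — roughly like this (defaults and exact fields vary by KEDA version, so check keda.sh):

```yaml
spec:
  pollingInterval: 5      # seconds between checks of the event source
  cooldownPeriod: 30      # seconds of no events before scaling back to zero
  minReplicaCount: 0      # set to 1+ to always keep an instance warm
  maxReplicaCount: 20     # cap the scale-out
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:           # the fancier HPA controls over scaling velocity
        scaleDown:
          stabilizationWindowSeconds: 300
```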
I can set whatever minimum I want: one, 10, 50. I can set a maximum too — it's cool if you want to scale to the moon, but maybe only scale up to 20, because past 20 you're going to start hogging too many resources. And I can control these fancier HPA behaviors, which I won't go into now, that govern how many instances it adds and how quickly it adds them. So there are a few knobs here and more — plus the lag threshold I mentioned — that give me control over KEDA's behavior, so it's not just running wild.

I'm trying to move this window out of the way, but okay, let's look here. I'm going to count this one as answered — hopefully I described it simply enough. I know a few of you here are more beginner, but at the very least I want you to understand that KEDA is helping your container scale kind of like you'd imagine a serverless function scaling. We did mention KEDA is written in Go. We talked about what CNCF stands for. I talked a little about how the scaling works. And the one comparing KEDA to some other open source projects like Knative — I have that planned for a bit later, along with the HTTP question. Let's move on; keep the questions coming, we'll take another break, and if there are a few I didn't answer, I'll get to them.

So there's one more KEDA pattern I want to go over before we dig into some of those other questions, and that's a question we get sometimes: what about long-running executions? You saw in that demo how quickly I scaled out to 50 instances and how quickly I scaled back down. Well, what if each of those messages wasn't something small, a quick task? What if it was actually the link to a three-hour video that I need to transcode or encode? That's a lot of work; it might take a few hours. How do I make sure KEDA scales me out and back in, but doesn't disrupt a long-running process?

Let me show you why that matters, because Kubernetes has an interesting behavior here. Imagine I'm scaled out to four or five containers of my app, and for each of them I've animated these little progress bars. Imagine I'm encoding a video: instance one has barely started, instance two is almost done — maybe it finishes in the next minute — and instances three and four are somewhere in the middle. (And sorry, I'm getting a notice that I might have a bit of a hiccup in my video feed, so I'm going to turn off my camera to hopefully save some bandwidth.)

Let's imagine KEDA tells Kubernetes: hey, we actually don't have a lot of messages in the queue anymore, we can probably start to scale you down; we might not need four instances. Well, when Kubernetes scales something down, I often say it's kind of like Thanos: it snaps its fingers. (Trouble hearing me? I said Thanos.) It snaps its fingers and Kubernetes scales down — but you don't decide what goes. It's going to essentially randomly decide what it scales down. And you can see in this case, it's going to scale down one of the instances that was almost done. You're like: no! Now that work has to restart on one of the other instances. I have something long-running — please don't scale me down while I'm in the middle of processing. So if your work is short-lived, that's fine; if it's long-running, you've got a couple of options.
One is that there are some APIs in Kubernetes to tell Kubernetes: hey, hold on, give me a second — I know you're trying to scale me down. It's probably intended to give you a few seconds to clean things up, but we've tried this out, and you can keep stalling for hours: Kubernetes comes back a second later, you say "just give me a second" again, and you can do that for hours. It's fine.

The other option, though, is to use a capability of KEDA called scaled jobs. What scaled jobs does is, instead of scheduling containers that are constantly pulling messages, it creates what's called a Kubernetes Job for each message. Jobs are different from deployments: the intent of a job is that it wakes up, runs to completion, and then terminates, right? I don't want to terminate my web server — my web server should keep serving traffic constantly — but a job is something that wakes up, processes its data, and then terminates itself. So you can tell Kubernetes: hey, for every queue message, I actually want you to create a Kubernetes Job. The job wakes up, pulls one queue message, encodes the video, terminates, and then it's done. KEDA orchestrates this process for you. It periodically removes completed and failed jobs, so it keeps things clean. You can control how many jobs you want in parallel — hey, I'm cool transcoding 10 videos at a time, so have at most 10 parallel jobs running at once — and you can control how aggressively it creates those jobs. There's a lot there. So I just want to note: if the event-driven thing you want to do is a bit more long-running, you might be interested in this scaled-jobs pattern rather than the scaled-deployment one we talked about, where the app just wakes up and keeps pulling as many messages as it can. A very useful pattern that I wanted to share.

The other thing I want to cover quickly: KEDA is currently at 2.0. We rolled out the 2.0 version in November at KubeCon. A few things I want to highlight for those of you who might have looked at KEDA before. We broke out scaled jobs into its own custom resource — we always supported scaling jobs, but it used to be kind of a subset of functionality, and now it's its own first-class thing. We let you scale lots of stuff: any type of resource, a StatefulSet, or custom resources like Argo Rollouts if you're using other frameworks. We allow you to define multiple triggers on a scaled object, and we'll scale based on the noisiest of the triggers. We provided a bunch of new scalers. We enabled better extensibility so that you can write external scalers — for example, at Microsoft, one of the ways we use KEDA is with some internal tooling called Geneva that we use to monitor our services. You can build a Geneva scaler that knows how to scale based on our own proprietary thing and attach it to KEDA just by writing a bit of extensibility code. We support liveness and readiness probes so that KEDA stays reliable, and we expose Prometheus metrics for every scaler so you can have first-class monitoring. More on the 2.0 release is in this blog post, if you're curious to check that out.
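To make the scaled-jobs pattern concrete: a ScaledJob definition looks roughly like this (a sketch with placeholder names; the full spec is on keda.sh):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: video-encoder
spec:
  jobTargetRef:
    template:                          # an ordinary Kubernetes Job pod template
      spec:
        containers:
          - name: encoder
            image: myregistry/video-encoder:latest   # pulls ONE message, encodes, exits
        restartPolicy: Never
  pollingInterval: 30
  maxReplicaCount: 10                  # at most 10 encode jobs in parallel
  successfulJobsHistoryLimit: 5        # KEDA cleans up finished jobs
  failedJobsHistoryLimit: 5
  triggers:
    - type: rabbitmq
      metadata:
        queueName: videos
        queueLength: "1"               # roughly one job per queued message
      authenticationRef:
        name: rabbitmq-consumer-auth
```

Because each job runs to completion on its own, the Thanos-style random scale-down of a half-finished worker never comes into play.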
This will actually start to answer one of the questions here — and I'm seeing a few others — about how KEDA compares to some of these serverless runtimes, including Knative, which is a very popular and very useful tool that we get a lot of questions about. And the way I want to dive into this is by talking a bit about using a serverless runtime in addition to KEDA.

Here's how I want to explain it. What I'm showing on my screen right now is the code required — you actually saw this a minute ago — to pull a message from a RabbitMQ queue. This is the hello-world version: I've got to create a connection, I've got to connect to an exchange, I need to pull in messages from RabbitMQ. It's not a whole lot — this isn't terrible, and it will only take you a few minutes to write — but it is code I need to write to connect to that queue, pull in the message, make sure I'm retrying and keeping it resilient, and do all those things I want to do when I'm using a queue.

Now, the thing is, that's still a chunk of code, and often when you think about serverless, you don't want to have to write it. If I'm writing an Azure Function or an AWS Lambda that pulls messages from a queue, I don't have to write any of that. What a serverless runtime provides instead is code that looks more like this. This is the Azure Functions runtime — a very similar sample could be pulled from AWS Lambda, Google Cloud Functions, you name it. The left and the right side of the screen are doing the same thing, but the serverless runtime provides all of the code on the left as part of the runtime. You don't have to write it as a developer. You just write: hey, I want a queue message, and when I get that queue message, this is what I want you to do.

So pairing a serverless runtime with KEDA can be very useful. There are a few serverless runtimes that are open source, and Azure Functions is certainly one of them. You can create an Azure Functions project and just say: hey, I have this Azure Function, package it up in a container, and I'll go deploy it to Kubernetes. So instead of running it on a cloud serverless provider, you can package the runtime with the code using some of these commands and run it in Kubernetes, using KEDA to do the scaling and the runtime to minimize the amount of code you have to write.

There's another demo I could do here to make that concrete. I'll run through it quickly, because I don't want to spend too much time, just to give you a sense of what using a serverless runtime with KEDA looks like. Let's create a quick project. I'm going to create a folder called live-demo — it looks like I've used something like that before. If I'm using a serverless runtime — let's say I want to use Python — I can choose from a bunch of different event sources that I care about; I think there are even more than I'm showing here, like Kafka and RabbitMQ. The idea is that I can use a serverless runtime, in this case inside Visual Studio Code, to pull messages from my queue, which I have lovingly named "queue".
I can build this, debug it, and run it the same way I would a serverless app that's going to be published to the public cloud — to Azure, in this case. But the last step, and I'll show this part quickly: here's all the code I have to have. This pulls from a queue; all the additional plumbing is handled by the runtime. I want to pull from a queue, and I could come in here and do whatever work I want to do. Once I'm ready — once I've debugged this thing and it's doing what I want — the piece I want to highlight is that I can simply run this command. In fact, I have it saved here because I knew I was going to need it... maybe I don't. I can run this command: func kubernetes deploy. We'll name this python-function, and I'll give it a registry to stick the container into. What this is going to do is build a Docker container for my function — in fact, a Docker container for the serverless runtime plus my function — and stick it up in my cluster so that KEDA can scale it.

Apologies, I know I ran through that really quickly — it's because I want to get to the other stuff. But the thing I want to emphasize is that you're now using a serverless runtime to pull in those events, to help broker those events, and to simplify the code you have to write; KEDA, in this case, is just doing the scaling. So once this finishes — and I'm not going to wait for it — I'd be able to come into my cluster here and you'd see my serverless function starting up... actually, it's there. Wow, that happened quickly. There's my function, scaled to zero. If I drop something in the queue, you'd see this code spin up and run.

So again, hopefully that makes sense — the main gist is that serverless runtimes can simplify the code. Yep: it brings an event-driven programming model, paired with KEDA for scale. It gives you that same serverless developer experience: package it in a container, scale it with KEDA. There are more docs on that if you're interested, and samples too. In fact, for every demo I've shown here, there's a sample on keda.sh where you can run through the RabbitMQ demo or the serverless-runtime one. Both are ready for you to walk through if you're curious to learn more.

So what are some of the reasons folks might want to use a serverless runtime and host it on Kubernetes with KEDA, instead of hosting it with a serverless provider like Azure Functions or AWS Lambda? Well, you might want to run your serverless workloads on-premises, in your own data center, or you might be more hybrid. Maybe you have existing Kubernetes investments. Maybe you have more custom compute, security, or networking requirements — that's one I face a lot when I'm working with financial companies. They say: hey, I love the serverless service you have, but I have more constraints around what I can and can't allow through the network, or I need special nodes with watchdogs running on them, or whatever it is. And finally, there's no vendor lock-in. If I'm running in Kubernetes with my own containers, I can run that anywhere. I didn't even specify, for this demo I've been running, where it's running. I could do this exact demo on the Azure cloud, on DigitalOcean, on Azure — I said Azure twice — on AWS, on Google. I can run it wherever makes the most sense for me.
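For reference, the deploy step run in the demo is roughly this (the app name is from the demo; the registry is whatever you push to):

```sh
func kubernetes deploy --name python-function --registry <your-registry>
```

It builds a container holding the Functions runtime plus your code, pushes it, and — per the demo — puts it in the cluster with the scaling metadata so KEDA can scale it, including down to zero.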
Okay, so that was one question — now let's start answering some of these others. How can you use KEDA with incoming traffic? When I think about incoming traffic, I think about a non-traditional event source — that's a bad way to say it — not a message broker, but something like HTTP traffic. So can I use KEDA with HTTP traffic? The answer is: you can, it just requires one additional piece in the mix. I had a slide for this because I knew it would come up, but in the FAQ on keda.sh there's a question: hey, can I use KEDA with HTTP? And the answer is yes, you just need to add in one additional piece — in fact, there's a diagram right there. KEDA needs to ask something how many messages are coming in, right? That's how KEDA works. So KEDA needs to talk to something, and it can't talk to the ingress controller directly today — that's something we've actually considered doing. The pattern we recommend — and you saw how I found this doc, so whoever asked that question, I'd recommend you come to it — is to set up something like a Prometheus server, have Prometheus expose how many HTTP requests are coming in, and then KEDA can scale the app based on that.

The other thing I'll mention quickly, since we're in the Q&A section: we're actually spinning up an additional project as part of the KEDA org called KEDA HTTP — the HTTP add-on. Let me see if I can find it; if I can't, I won't worry about it. We have some folks who are looking at building capabilities — yeah, it's still in the early stages — that let you do HTTP scaling without manually setting up what I've just described. So it is possible. You have to use Prometheus today, or at least that's what I recommend, but there is work happening in that space. All right, good question.

The other question was: how does this compare to Knative? I knew this one was coming, so I prepared some slides. With KEDA, as I mentioned, my container — the actual user code — is responsible for pulling messages from the event source. It's up to my container (maybe a serverless runtime inside that container, but something) to connect to RabbitMQ or Kafka or whatever it is and pull the messages in. What usually happens with solutions like Knative and OpenFaaS is that you write a container that knows how to respond to HTTP traffic — something that receives HTTP requests and processes the messages. If you want that container to process Kafka messages, you deploy what's called a Kafka adapter, provided by the project — OpenFaaS and Knative both have one — and that adapter pulls the messages from Kafka and then sends them over HTTP to your container.

So there's a difference in how the events are distributed, and there are pros and cons to both approaches. The pro of the alternate, non-KEDA way of doing things is that the developer only has to know how to talk HTTP. They don't have to know how to talk to Kafka. Everything becomes HTTP, which is great, because pretty much everyone can write an HTTP server.
Now, some of the downsides: with more specific messaging systems — say I'm using Kafka, whose SDK gives me ways to preserve ordering and do checkpointing — that becomes a lot harder when you're translating everything to HTTP. In fact, none of the providers I'm aware of enable you to do something like checkpointing or ordered messaging at scale using this adapter pattern. So you get some pros, but there are cons: you can't talk directly to the event source anymore. Things like stream processing, where you're windowing and pulling in windows of data, become really difficult when you're trying to do all of that over plain HTTP messaging. And the last one: the event adapter is now responsible for talking to the message broker, so your developer no longer controls that code, for better or for worse.

Now again, the alternate approach — the KEDA approach — is not perfect either; it has pros and cons too. (It looks like I messed this slide up; I think it's almost exactly the mirror image.) The downside is that the developer has to have that SDK code in there somewhere. I mentioned that a serverless runtime is a useful library that can abstract it away, but it has to exist. The pros, though: you can leverage things like dead-lettering, sessions, partitions, checkpointing, and windowing. Those all become possible, because this could be over HTTP, over gRPC, over AMQP, over any protocol — you're talking directly to the event source, and KEDA is just scaling it up.

The only other thing I'll mention in terms of differences between KEDA and Knative — and I'll leave this up as we answer the last few questions — is that KEDA, as I said, is very single-purpose: it just does the scaling thing. Knative provides a lot more. Knative provides ingress; it provides things called revisions, where you can do versions of apps; it provides routing so that you can route between those apps; and you can optionally pull in this eventing component. It does a lot of things — like 30 or 40 things, all pretty useful — but you grab a lot of them when you use Knative. KEDA just does event-driven autoscaling. It's very unobtrusive; it just sits there, and you apply it to scale what you want it to. That's not saying one is right and one is wrong, but it's worth noting that if you use Knative, you'll also be adopting a lot of other features, for better or worse. So hopefully that helps some. I'm a fan of Knative — I follow the progress of that project a lot, and I think it's doing some incredible things. The Knative and KEDA folks have actually been working together for a while now to figure out how we can bridge some of these worlds. Take a look at both.

Okay, let's answer a few more of these as we wrap up in the last five minutes. Limiting scaling — we talked about that already; you can define it. I'm going to skip this one about service meshes; it's a great question, but not for this webinar — I think there are other webinars on service meshes, definitely check those out. Knative eventing — that's the pattern I just talked about; Knative eventing is the thing in the middle that turns everything into HTTP, so hopefully that answers that question. Let me read this one: can we combine KEDA with the usual HPA?
For example, if the app is processing images of different sizes, it may happen that scaling purely on the number of images isn't enough. So the answer is yes: you can combine HPA scaling with KEDA scaling, and you leave it to the HPA's algorithm to use both bits of data. Great question, Sergio — you can combine them.

Good question, Kevin: if you're extending KEDA — say you have your own event source — does your extension need to be written in Go? The answer is no. We provide a gRPC protocol for you to follow. One of our most popular extensibility pieces for KEDA right now is actually for a tool called Durable Functions, and the extension that lets KEDA know how to scale a durable function is written in .NET — but it can talk to KEDA, which happens to be written in Go. So no, you can extend it in any language. Great question.

Yep — this is similar to the other one someone asked: KEDA can be combined with traditional metrics. You can use KEDA but also include CPU and memory; you just make it smarter. You don't have to say it's only event-driven or only CPU. Let's see what else is here. We talked about KEDA and Knative. Someone asked — I just like this question, and we've got about three minutes left — can you incorporate KEDA with quantum computing? Sure, as long as quantum computing works with Kubernetes. I would assume there's a world, or a project somewhere, that has Kubernetes driving quantum computing — I have no idea, I'm not a quantum expert — but nothing would stop you.

In fact, an interesting note: because KEDA integrates natively with the Kubernetes APIs, if Kubernetes can do something, KEDA can almost always do it as well. For example, a few cloud providers have this feature called virtual nodes, and there's another CNCF project called Virtual Kubelet. Let's say I want to scale out really far, really fast — similar to before, imagine your cluster didn't have the capacity to scale out as far as, say, your Black Friday sale demands. You can turn on a virtual kubelet and scale out to serverless containers. It's a cool feature — check out Virtual Kubelet, another project. Anyway, KEDA works great with that: you can use KEDA to scale into serverless containers that aren't even in your cluster, because KEDA just tells Kubernetes "I need to scale," and Kubernetes takes care of it, whether that involves cluster scaling, serverless container scaling, you name it.

All right, doing a last pass before we run out of time — great questions, and I appreciate all of you asking them. Incoming traffic — yeah, I'll wrap up with this one, Moe. I know it came in a bit earlier; I think I just missed it. Any benefits to event-driven scale over traditional scaling? It depends on the solution, but I think the big benefit of event-driven is that it's much more proactive. You're scaling based on the cause of the load, so you can scale farther and faster, and you're able to scale all the way to zero, which isn't really possible with resource scaling — the CPU is never going to be at 0%, so when do you scale down to zero? That said, resource scaling isn't all bad, and in fact, as we talked about, KEDA lets you combine them both. See where KEDA might fit in, and where it doesn't.
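For Sergio's question, combining the two is just a matter of listing multiple triggers on one ScaledObject — a sketch (names are placeholders; the cpu/memory trigger types arrived with the 2.0-era releases, and field names may vary by version):

```yaml
triggers:
  - type: kafka                        # the event-driven signal
    metadata:
      bootstrapServers: kafka-broker.default.svc:9092
      consumerGroup: image-processor
      topic: images
      lagThreshold: "10"
  - type: cpu                          # the traditional resource signal
    metadata:
      type: Utilization
      value: "75"
```

The HPA then scales on whichever trigger demands more replicas. Note that once a CPU trigger is in the mix, scale-to-zero is generally off the table, since CPU utilization never reads zero.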
It's super unobtrusive, as I mentioned. You can pop it into a cluster and have just one of your containers using KEDA to help it scale, while the other 99 containers use vanilla Kubernetes — that's totally cool. Maybe you want to mix and match some event-driven metrics and some resource metrics — that's totally cool too. But it's a very valuable piece for your toolkit, and definitely worth considering.

And so thank you all again for joining, and thanks for watching this recording. This slide was up — hopefully you read it. Check out the site to learn more, and reach out if you have questions on our Kubernetes Slack channel. I would love to see some of you in our community standups, whether you want to contribute code, docs, issues, samples, or designs, or just say hey and give a thumbs-up — I notice all of it, and all of it is very welcome. We want to bring everyone in. So thank you all very much, and with that, I'm going to wrap it up and pass it back over. Thanks, all, for joining.

Thank you, Jeff. That was awesome — really appreciate all your knowledge there. And thanks, everyone, for joining us today. Have a great day. Bye-bye.