Hello, and good afternoon, KnativeCon. It's my privilege to present my talk at KnativeCon North America 2022. The topic I bring to you folks today is how to achieve a highly available and scalable backend in your event-driven architecture. My name is Ansu Varghese, and I'm a proud IBMer working as a senior software engineer within Hybrid Cloud Research.

This is a rundown of what I would like to delve into with you during this session. The Knative eventing backend and its pull-based components, like eventing sources and other custom resources, today do not provide autoscaling out of the box, nor do they have a framework to be scaled up and down for a rich serverless experience. This is a basic must-have requirement for most enterprise organizations building solutions on top of Knative. We have thus implemented an eventing scheduler that gives you a scaling solution and distributes consumers across the data plane pods based on your priorities. We will then go through a demonstration of this new feature using a Kafka source installation on IBM Cloud Kubernetes Service. This is a very active area with a lot of interest from the upstream community and from organizations building products based on Knative eventing. There is work currently being done to expand scalability using this scheduler into other Knative components, and also to integrate these resources directly with the KEDA autoscaler.

Moving on to our introduction slide. Let's discuss some of the expectations of a typical Knative user considering an event-driven architecture. Since Knative is built around a serverless experience, users expect that services, when not used, will be scaled down to zero, and when used, will scale up and down in proportion to the number of events that need to be pulled. So our backend deployments must have the capability to autoscale up or down as workload demands change, to accommodate faster or idle processing. Also, in a multi-tenant world where the number of resource instances can be numerous or few, we expect the help of an autoscaler to produce a similar experience for all tenants. Secondly, cloud users do not want to pay unnecessary costs for resource inefficiencies, in particular in environments running thousands of source instances. Third, the backend data plane and controller should help provide maximum throughput and high compute density on the running cluster. Next, we should provide highly available and resilient support for our resources across all failure domains in a multi-region environment, such that when there is a failure, recovery is quick and disruption is minimal. Overall, we want the expectations of our users to be fully met and for them to get nothing short of a truly serverless experience. We want this to be enabled by default, and we should really focus on making Knative eventing more serverless.

All of these expectations lay out the foundation for implementing this new eventing scheduler. Currently, each dispatcher backend replica instantiates one consumer for each Knative resource, and with an increasing number of resources, the dispatcher resources need to be increased as well. We also need to be able to configure how many consumers to run for a specific resource. In addition, users today don't have a way to configure parallel deliveries to increase throughput. The only way to increase throughput is by scaling the data plane deployments and partitioning consumers across them.
So that's exactly what our solution does. Our solution also allows easier integration with an autoscaler like KEDA. This new eventing scheduler is a generic Knative component. It lives today in the Knative eventing repository, along with all of its subcomponents and its plugin implementations. It can be customized for use with your custom resources and sources and is meant to be a reusable framework; it's not specific to any existing Knative implementation. When it's integrated with your controllers, it can scale your backend dispatcher deployments and schedule these virtual replicas onto real Kubernetes pods.

Next slide. Let's discuss the technicalities of the scheduler's implementation a bit further. First is the Placement duck-type object, which is the outcome of the scheduler doing its job. It's implemented as a duck API type; in familiar terms, it's like dynamic typing for data plane architectures. These placements store the name of the pod where a replica is placed, along with the number of replicas assigned to that particular pod. (A hedged sketch of a placement status appears just after this section.)

The subcomponents of the scheduler include the scheduler itself and the pod autoscaler, which increases the number of data plane pods when there are more replicas to be scheduled, and similarly decreases the number of pods when placements are descheduled from pods. There is also a state collector that periodically checks the cluster state and gathers information about it for the scheduler to pick the most optimal placements. And there is a compactor that checks the distribution on every interval; if a lower-ordinal pod has space, it evicts some replicas from the higher-ordinal pods, moves them over, and scales down.

The scheduler also allows many scheduling features to be implemented as plugins while keeping the core of the scheduler simple and maintainable. Scheduling happens in a series of stages. First is the filter stage: the filter plugins, known as predicates, are used to filter out the pods where a replica cannot be placed. The next stage is scoring: the scoring plugins, also called priorities, assign a score to each pod that has passed the filtering stage. The scheduler then selects the pod with the highest weighted score sum. These plugins are registered and compiled into the scheduler.

Next, the scheduler can handle recovery from unexpected domain failures or planned worker restarts thanks to some choices in its design. The data plane replicas used for scheduling consumers are part of a StatefulSet architecture whose pod anti-affinity rules constrain which node each replica in the StatefulSet is allowed to be scheduled on. The scheduler also relies on the sticky identity of pods in a StatefulSet. Finally, any change in the consumer count causes rebalancing to be initiated by the scheduler on the next reconciliation loop. The scheduler here is inspired by the real Kubernetes scheduler, which I have referenced at the bottom.

Okay, we have a simple diagram here showing the various components of interest: the eventing scheduler and all of its subcomponents on the right, the KEDA autoscaler on the left, and the eventing Kafka broker backend resources in the center, all working together to provide a serverless experience. Let's focus on the Kafka source external API in the green box. As you know, the Kafka source API is used for consuming messages from one or more Kafka topics, which are then forwarded as CloudEvents to a single sink service.
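To make the Placement duck type concrete, here is a minimal sketch of what a scheduled resource's status might carry. The field names (`placements`, `podName`, `vreplicas`) are my assumptions about the shape of the duck API and may differ in your version of Knative eventing:

```yaml
# Hedged sketch of a Placement duck-type status on a scheduled resource.
# Field names are illustrative and may differ across versions.
status:
  placements:
    - podName: kafka-source-dispatcher-0   # real StatefulSet pod
      vreplicas: 4                         # virtual replicas assigned to it
    - podName: kafka-source-dispatcher-1
      vreplicas: 4
    - podName: kafka-source-dispatcher-2
      vreplicas: 4
```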
So this Kafka source API is a user-facing external API. As part of this new architecture, we have some new custom resources for the scaling feature, called consumer groups (in the white box) and consumers. Please note that these two new APIs are internal APIs, unlike the Kafka source, and users typically would not need to care about them. A disclaimer: this naming is not to be confused with the Kafka terminology for consumers, and these names are subject to change.

Okay, so each Kafka source instance is associated with the creation of a new consumer group, which behaves as a virtual pod within the eventing scheduler framework. It can be scaled up and down by increasing or decreasing the number of virtual pod replicas for maximum distributed processing, either manually or autoscaled by KEDA. Now, the job of the scheduler is to place these virtual replicas onto the real Kubernetes data plane pods that you see at the very bottom and to come up with a set of placements, which is then propagated back to the Kafka source status fields. A new consumer resource is created for each of those data plane pods, and they are bound one-to-one.

On the right side of the screen, you'll see a sample scheduler config map with the default predicates and priorities listed. It's important to choose the predicates and priorities that fit the needs of your environment, or you can even implement ones of your own and plug them in via the scheduler's registry. (A hedged example of such a config map follows at the end of this section.)

Next slide. Some of the advantages of this eventing scheduler for your Knative resources are the following. The main goal of architecting a general standalone scheduler like this is to encourage reusability and not have to reinvent the wheel for new resources. The scheduler, and how it picks placements, is very configurable in terms of choosing the strategies that are important to you, and new strategies can be plugged in easily. Such flexibility helps in separating concerns for different personas and different environments; for example, a developer in a local environment may need a different scheduling setup from an SRE in a production environment. To promote this reuse, having a consistent API with a shared vocabulary, like the internal APIs I showed you for consumer groups and consumers and the Placement duck API type we talked about, is critical for smooth reuse. Some other benefits of keeping the scheduler component separate are resource controllers that are easily extensible and loosely coupled from one another. And it goes without saying that the scheduler provides a simpler way of supporting high availability and scalability for all your resources. Many times, event meshes rely on not just one Knative component but a collection of them; with one core package to maintain for the scheduler, the whole thing becomes more sustainable across different Knative resources. You may also have noticed that the data plane runtime is shared between the Kafka-backed components in the previous diagram, which provides lower memory usage.

Next slide. So now let's switch modes to demonstrate the scaling of an example Kafka source. The core repository with the changes needed to run this eventing source is the eventing-kafka-broker repository from Knative Sandbox. I have an IBM Cloud Kubernetes cluster already running that contains nine worker nodes spread across three US zones, each zone containing three nodes.
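For reference, here is a minimal sketch of what such a scheduler config map could look like, assuming a predicates/priorities layout like the one shown on the slide. The ConfigMap name, plugin names, and weights are illustrative assumptions, not exact defaults:

```yaml
# Hedged sketch of a scheduler ConfigMap with filter plugins (predicates)
# and scoring plugins (priorities). Names and weights are illustrative.
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-scheduler
  namespace: knative-eventing
data:
  predicates: |
    [{"Name": "PodFitsResources"}]
  priorities: |
    [{"Name": "AvailabilityZonePriority", "Weight": 10},
     {"Name": "LowestOrdinalPriority", "Weight": 2}]
```

A higher weight on a zone-spread priority, for instance, would push the scheduler to distribute replicas across zones before it favors lower-ordinal pods.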
And as preparation for this demo, I've already installed Knative eventing in my cluster, along with an IBM Event Streams instance that has a new topic with a hundred partitions, and these are all running successfully. I've also installed the control plane and the data plane from eventing-kafka-broker for the Kafka source. For the purposes of this demo, I've also installed KEDA for a quick show of autoscaling. If you'd like to do all these steps yourself, these links will take you to the installation documentation. I also have a simple event sink where messages will be delivered by the Kafka source, and an event producer Go script that will help us write messages to the topic.

Okay, so let's switch screens to our terminal. Let me give you a look at what the Kafka source custom resource looks like. Here I have a Kafka source whose specification has information about the Kafka Event Streams instance: you can put in the information about your bootstrap servers, plus a secret that will help authenticate to your Kafka cluster. In addition, I have specified 12 consumers to start with. (A hedged sketch of such a manifest follows at the end of this section.)

Okay, let's get out of that, and let's go ahead and install this example Kafka source. One thing to note is that I currently have KEDA disabled for this very first step. Go ahead and install it. Once it's installed, we can take a look at the status. It is in the works; it's doing the binding that I mentioned, of a data plane pod to a specific consumer resource. Let's refresh. Okay, it looks like all 12 replicas that I requested have been scheduled. Now let's take a look at what the placements look like. As you can see here, this is the placement information in the status section of the Kafka source, and the 12 replicas have been equally distributed among three different dispatcher pods. And the way the StatefulSet architecture is defined, these pods live on different nodes in different zones, so this satisfies HA.

Okay, so next we're going to hit this Kafka topic with a few messages, using an event producer script that I have. But before that, let me quickly re-enable KEDA and bring autoscaling into action. Let's make sure that it's running. Okay, that's going to be up soon, and when KEDA comes up, it's going to bring some changes to our Kafka source. Let's wait a few seconds. What you see here is that KEDA is in control now, and it has autoscaled the 12 replicas that we had originally requested down to zero, because the Kafka source is currently idle; it's not processing any events, since I haven't sent any yet.

So let me go ahead and call my event producer script to send some events; I'm going to do just 10 events. Let's send that off. Okay, all 10 messages have been sent; let's see what's now going to happen to our consumer group. What you see here is that KEDA has now increased the number of scalable replicas on the Kafka source from zero to one, so that there is a consumer replica doing the processing of these events. And to confirm that the events are being received at the sink, we can quickly take a look at the sink, which is just an event display. Here you see that the events I sent to my Kafka topic have been received and are being logged in my sink service. And once the processing of these 10 events is over, we should expect KEDA to bring the replica count back down to zero.
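For readers following along, here is a hedged sketch of a KafkaSource manifest along the lines of the one in this demo. The bootstrap server address, secret name, topic, and sink name are hypothetical placeholders for my Event Streams setup:

```yaml
# Hedged sketch of the demo's KafkaSource; the server address, secret,
# topic, and sink names below are hypothetical placeholders.
apiVersion: sources.knative.dev/v1beta1
kind: KafkaSource
metadata:
  name: my-kafka-source
spec:
  consumers: 12                      # virtual replicas to start with
  bootstrapServers:
    - my-eventstreams-broker.cloud.ibm.com:9093   # placeholder address
  topics:
    - demo-topic                     # the 100-partition topic
  net:
    sasl:
      enable: true
      user:
        secretKeyRef:
          name: kafka-secret         # placeholder credentials secret
          key: user
      password:
        secretKeyRef:
          name: kafka-secret
          key: password
  sink:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: event-display            # the simple event sink
```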
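And since KEDA drives the scale-to-zero behavior shown in this step, here is a hedged sketch of how a KEDA ScaledObject could target the internal consumer group's scale subresource based on Kafka lag. The target apiVersion/kind, the names, and the trigger metadata are assumptions for illustration, not the demo's exact integration:

```yaml
# Hedged sketch: a KEDA ScaledObject scaling an internal ConsumerGroup
# resource on Kafka consumer lag. Target apiVersion/kind, names, and
# trigger metadata are assumptions, not the demo's exact configuration.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-kafka-source-scaler
spec:
  scaleTargetRef:
    apiVersion: internal.kafka.eventing.knative.dev/v1alpha1
    kind: ConsumerGroup
    name: my-kafka-source-consumergroup
  minReplicaCount: 0                 # allow scale down to zero when idle
  maxReplicaCount: 60
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: my-eventstreams-broker.cloud.ibm.com:9093
        consumerGroup: my-kafka-source-group
        topic: demo-topic
        lagThreshold: "10"           # scale out as consumer lag grows
```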
There you go. Okay, so we've seen the whole flow from the Kafka topic through the Kafka source, with the autoscaler in action, and the messages have been processed and received at the sink.

So now let's make this a little more practical by performing a worker node update and seeing how the Kafka source and the scheduler help with recovery. For that, I'm going to scale up my consumer group replicas manually; let me scale it up to 60, because I'm going to send it a whole bunch of events now. And before we send the events, let's take a quick look at the status. Actually, let's send the events first. Now I'm going to send 10,000 events, a whole lot more than we did last time. Okay, these events have been sent, and now let's try scaling up. There you go: the scheduling of the 60 replicas I requested is happening, so that these 10,000 events can be handled quickly rather than waiting on the autoscaler.

Okay, now let's decide which node we want to take down. Let's take down this node, 18826, that one of my dispatcher pods is currently running on. Let's give this worker node an update so that it's temporarily unavailable. What we're going to see now is that, because of the StatefulSet architecture, this particular dispatcher pod is going to be moved to another healthy node. And there, it's been moved already. So what we've basically seen here is that the StatefulSet architecture immediately recreates the pod on a new healthy node, keeping the same identity and thus leaving the placements still accurate. Okay, that's the end of the demo; now let's go back to our slides.

So, using the Kafka eventing source in an event-driven architecture provides an at-least-once event delivery guarantee, with events in each partition being processed in order. This means that operations are retried until a successful return code is received, which also makes applications more resilient to lost events. However, it might result in duplicate events being sent, among other performance issues, like events in a queue taking too long to be processed or not having enough concurrent clients to process events, and we've seen all of this. But what I want to share with you is that the eventing-kafka-broker data plane today exposes some of these Kafka consumer configurations through config maps that can be tuned to suit your specific workloads. These are some of the parameters that, when tweaked, showed a significant improvement in performance, especially in lowering the number of duplicate messages and speeding up concurrent processing. (A hedged sketch follows just before I close.)

Next slide. If you're still interested in the work here, it might help to know that there is work currently being done to expand scalability using the scheduler with other Knative components, like triggers and channels. We're also integrating many of these Kafka-backed Knative resources directly with the KEDA autoscaler. This direct integration is going to further simplify the experience of operators whose job is to manage various installations. It also means that we will have the autoscaling and the scheduling that we just saw all in one place, happening on the same APIs.
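Before closing, here is a minimal sketch of the kind of consumer tuning mentioned a moment ago, assuming a data-plane ConfigMap layout like eventing-kafka-broker's. The ConfigMap name and data key are assumptions on my part; the property names are standard Kafka consumer settings, and the values shown are illustrative starting points, not recommendations:

```yaml
# Hedged sketch: overriding Kafka consumer properties through a
# data-plane ConfigMap. Name and key are assumptions; tune the values
# against your own workload.
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-kafka-source-data-plane   # hypothetical name
  namespace: knative-eventing
data:
  config-kafka-source-consumer.properties: |
    # fewer records per poll can mean faster commits and fewer
    # duplicates when a consumer rebalances mid-batch
    max.poll.records=50
    # don't wait to batch fetches; trade throughput for latency
    fetch.min.bytes=1
    # time allowed between polls before the group coordinator
    # considers the consumer dead and triggers a rebalance
    max.poll.interval.ms=300000
```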
This brings us to the end of the talk. Thank you so much for attending and taking the time to listen to what's going on in this space to make sources scalable and to support our Knative users in a production setting. It's also been a great time collaborating with our Red Hat serverless team partners on bringing this solution to completion, and with members of IBM Research and the product teams at IBM Cloud. I look forward to answering any questions via the online chat, or you can reach out through some of these other channels and join the effort to make Knative eventing even more serverless. Thank you so much.