Hello, everyone, and welcome to KubeCon. My name is Jakub Scholz. I work at Red Hat, and I'm mainly a maintainer of a project called Strimzi, which is all about running Apache Kafka on Kubernetes. Today I will be talking about the work we have been doing there around removing the ZooKeeper dependency. But before we get there, a little project announcement: Strimzi recently moved from CNCF sandbox to incubation, which is a big achievement for us. Thanks. Unfortunately, it doesn't really allow us to stop working on the ZooKeeper removal and just enjoy our rest, so we have to finish that.

If you have ever run Apache Kafka, you probably know that for a very long time it depended on Apache ZooKeeper. You had to first start the ZooKeeper cluster and get it running; you had to know how to deploy it, how to run it, how to operate it. Only then could you actually run Kafka, because Kafka used ZooKeeper for storing metadata, for bootstrapping the cluster, for electing the Kafka controller, and all these things. But that's now changing, and ZooKeeper is going away. So to say, Kafka is now released from the cage and free to fly. The way it is done is that ZooKeeper is replaced by something called KRaft, which stands for Kafka Raft: Kafka's own take on the Raft consensus algorithm, which replaces ZooKeeper.

Maybe I can start by talking a bit about the motivation, why this is happening. One reason is simply to get rid of ZooKeeper. ZooKeeper has some nice things, and it does many things very well, but it also has some issues; for example, lately we faced a lot of problems around DNS resolution on Kubernetes, where many of our users ran into trouble. So, to be honest, I personally will not really cry when ZooKeeper is gone. But what's also important is that removing ZooKeeper simplifies the deployment.
You used to have the ZooKeeper cluster and the Kafka cluster; you had to know both of them and have the knowledge for both of them. Now it will be just Kafka. You don't need to know ZooKeeper, you can forget everything you knew about it, and it will be all Kafka. That simplifies things a lot for all Kafka users, but obviously also for us in Strimzi, because Strimzi had to manage and operate the ZooKeeper clusters as well. Finally, there might also be some scalability and performance improvements, but I guess a lot of that will be visible mainly in very big Kafka clusters. When you have hundreds of thousands of partitions, there might be more improvement than in a small cluster with just a few hundred partitions.

Maybe some of you get the feeling that this is not really new and wonder why this guy is talking about ZooKeeper removal here. And that's not completely wrong, because this is something that basically started in 2019, and, if I checked correctly this morning, today we have 2024. So it's a very long time, but that demonstrates how big a change and how big an effort this is. A lot of that is obviously in the Apache Kafka project itself, not necessarily on the Strimzi side, where I spend most of my time, but even on the Strimzi side it's quite a lot of changes and a lot of work.

So how will Kafka look when ZooKeeper is gone? The way it is implemented is that Kafka nodes now have two different roles. One of them is the controller role. That's basically what replaces the ZooKeeper cluster: it's what manages the metadata and what bootstraps the cluster. And then there is the broker role, which does exactly the same as the old brokers in ZooKeeper-based clusters did. It is responsible for delivering the messages between clients, storing them, and so on.
What's interesting is that you can combine these roles in a single JVM, in a single process, which gives us quite a variety of different architectures, where in the past we always had ZooKeeper and the brokers. Now you can, for example, start with a single container, a single pod, which has both the controller and the broker role, and for development and testing this might be completely sufficient. If you want some high availability, you can basically just scale it and run three of these containers, each having both roles, and you get the high availability. It still runs in this combined mode without any ZooKeeper cluster or anything. If you want to scale a bit more, you don't really want to keep scaling the controllers up, because they will not scale forever. So you can simply decide that the next nodes you add will have the broker role only. That way you have some mixed nodes and then the broker nodes.

And then finally, what we kind of expect most production clusters will actually use is quite similar to the ZooKeeper-based architecture: you have dedicated controller nodes, which do the metadata management and the bootstrapping, and then you have dedicated broker nodes for handling the messages. Why do I think this is what will be used in most production clusters? Because this way you can control how many resources the controllers get and how many the brokers get, and you get the best performance and scalability out of your cluster. But in general, each of these architectures might fit a slightly different environment. It depends on your infrastructure: whether you run on the edge, for example, or on bare metal, where each node is beefy and the same size, or on VMs, where it's quite easy for you to have smaller nodes for controllers and bigger ones for brokers.
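At the Kafka level, the role of a node is just configuration. As a hedged sketch of what the combined mode looks like in plain Kafka 3.x (the node ID, ports, and paths here are made-up illustration values):

```properties
# KRaft combined mode: this node acts as both controller and broker
process.roles=broker,controller
node.id=1
# Raft quorum voters in id@host:port form; a single-node quorum here
controller.quorum.voters=1@localhost:9093
listeners=PLAINTEXT://localhost:9092,CONTROLLER://localhost:9093
controller.listener.names=CONTROLLER
inter.broker.listener.name=PLAINTEXT
listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
log.dirs=/var/lib/kafka/data
```

A broker-only node would set `process.roles=broker` and a dedicated controller `process.roles=controller`, which is essentially all that the different architectures differ in at the Kafka configuration level.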
In Strimzi, we really want to support all of these architectures, because we have users using Strimzi for all of these different things and in these different environments. And we also want to allow the transitions between them, which is quite important if you want to start small but grow your Kafka cluster as your project grows, your product grows, or your company grows.

There will also be a migration, so if you are already running a ZooKeeper-based Kafka cluster, don't worry, it won't be left behind to rot away over time. You will be able to migrate it from ZooKeeper to KRaft. It's a multi-stage process: you first introduce the KRaft controller nodes, you let them sync the metadata from ZooKeeper, then you shift the traffic from the ZooKeeper nodes to the KRaft controllers, and then you finally remove ZooKeeper. So you can migrate in this direction. It's not as easy to migrate the other way around: once you run KRaft, you run KRaft, and you don't really get back to ZooKeeper.

What's important is that in Kafka 4.0, ZooKeeper will be completely removed from Kafka, and you have to do this migration before you upgrade to Kafka 4.0. You first, within the same version, migrate from ZooKeeper to KRaft, and only then, within KRaft, upgrade to Kafka 4.0. That's something useful to know if you have a ZooKeeper-based cluster and need to plan the migration.

So that was kind of the intro to how the KRaft stuff works and how ZooKeeper-less Kafka looks. Now I want to talk about some challenges we faced, which might be useful also to people interested not just in Kafka but in other projects and tools. Some of these challenges were related to Apache Kafka itself. As I said at the beginning, this is something that has been going on for several years now, so obviously there have been plenty of bugs and plenty of missing features which were added over time.
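In Strimzi, this ZooKeeper-to-KRaft migration is driven declaratively through an annotation on the Kafka resource. The sketch below is illustrative (based on the strimzi.io/kraft annotation Strimzi introduced for this purpose), so check the Strimzi documentation for the full procedure and prerequisites:

```yaml
# Sketch: starting the migration on an existing ZooKeeper-based cluster.
# Controller-role KafkaNodePools must be in place for this to proceed.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
  annotations:
    # "migration" makes the operator deploy the KRaft controllers and
    # sync metadata from ZooKeeper; flipping it to "enabled" afterwards
    # finalizes the move and removes ZooKeeper.
    strimzi.io/kraft: migration
spec:
  # ... existing cluster spec stays as it is ...
```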
I think that's kind of expected. Another issue we faced was that a lot of the stuff which was done initially was not operationalized. There were no APIs to manage things, no metrics to observe the state of the system. And when you work on something like Strimzi, which is all about automating things, and you get instructions like "read the log and check if there is some line in it, then do something", that's not really reliable and not something you can automate on. So that was one of the issues.

Another issue was what I call premature production readiness: Kafka announced KRaft as production ready quite early for my taste, when it still had a lot of missing features and a lot of bugs. But for many users who didn't really follow the Kafka project, this triggered a huge flow of questions at us, like: hey, this is now production ready in Kafka, why isn't it supported in Strimzi? To be honest, we spent an awful lot of time just answering these questions and explaining that maybe it's not as production ready as it sounds, in our opinion, and that it will take time to get there.

But there is another set of challenges which were quite unique to Strimzi and to running Kafka on Kubernetes, and one of them comes back to the different architectures which I described. At the beginning I said that the advantage of removing ZooKeeper is that you will now have only Kafka nodes, right? And that's great on paper, but when you run on Kubernetes it can be quite challenging, because you have only Kafka nodes, but each of them has a slightly different configuration. And that doesn't always fit well, because it leads to what I call asymmetric node or pod configuration, where each pod is configured slightly differently. To demonstrate how this fits Kubernetes: this is how your old-school ZooKeeper-based cluster would look. You would have, for example, three ZooKeeper nodes and three Kafka brokers.
Each of them has its own ID which has to be unique, but it only needs to be unique among the ZooKeeper nodes and among the Kafka nodes. It doesn't matter that there is a Kafka broker with ID one and a ZooKeeper node with ID one. Also, each of them has a different configuration, but they are simply different applications. So if you want to deploy that on Kubernetes, the obvious thing is to create two different StatefulSets. Each of them has its own configuration for the application, its own configuration for storage, resources, scheduling, and all these things. So that fits quite nicely into the Kubernetes workload APIs.

But when we move to the ZooKeeper-less Kafka cluster, yes, you have only the Kafka nodes, but each of the Kafka nodes has a slightly different role. Maybe you started with three brokers and some mixed nodes with both the controller and broker role. Then you scaled up and added some more brokers. Then you decided that maybe it's better to run dedicated controller nodes instead of the mixed nodes, so you started adding the dedicated controllers, and later you will remove the mixed nodes. So you can easily have a Kafka cluster in a state like this. Now, all of these nodes share the same sequence of node IDs. They start somewhere and end somewhere, but you cannot have two Kafka nodes with ID one; it has to be unique. So that's another limitation. And each of them has a different configuration for the application itself, but also for the resources, for storage, and so on. The brokers will have the broker configuration; they will have CPU and quite a lot of memory, and definitely a lot of storage. The mixed nodes might have the same hardware or resources configuration but a different configuration for the application.
Then you have again some brokers with the broker configuration, and then you have the controller-only nodes, which have only the controller configuration and possibly fewer resources, because they don't need as much as the brokers. So you have only Kafka nodes, but you cannot easily go and represent them with a StatefulSet. This is something we had to deal with, and it's funny, because something similar was discussed in one of the previous talks about the MongoDB operator.

We actually took the decision to leave StatefulSets behind, and we moved to managing the pods ourselves directly. Internally we are actually using our own resource for it, called StrimziPodSet, but that is mainly to better separate the code and better design the code base internally; it's not something users will be using. What we did for users is introduce another custom resource called KafkaNodePool, which is used to define the configurations for different groups of Kafka nodes. So you can create a node pool for the brokers, you can create a node pool for the controllers, and so on, and you can of course change it as well. And to make this a bit more challenging, we had to roll this out while maintaining backwards compatibility, making sure that existing Strimzi users can easily move and upgrade to these new things. So that alone was quite a big challenge for us, and it pretty much took the last two years of my life, so hopefully it will be worth it.

And how does it look afterwards? We now have only Kafka nodes, but each group of Kafka nodes is treated separately and configured through its KafkaNodePool. And because we don't use StatefulSets, we can then easily play with how the different IDs are shared between the nodes and so on, which a StatefulSet would not really allow us. Okay, let me see if my demo works.
So I wanted to show how you would deploy a ZooKeeper-less Kafka cluster with Strimzi. The way you do it is, as before, you use the Kafka custom resource, which defines the Kafka cluster entity. That's what contains the shared configuration which applies to the whole Kafka cluster. So, for example, the version of Kafka being used is here and is always shared between all the nodes; the listeners are shared; the Kafka configuration is shared as well. So these are in the Kafka resource as they were before. But then the user adds the KafkaNodePool resource, which defines the different groups of nodes for the Kafka cluster. In this case, I specify that I want these nodes to have both the controller and the broker role, so they have both of them. I say that these nodes should have three replicas, I define how many resources these nodes should have, and in this case I also define what storage should be used by these nodes. And when I do kubectl apply, if I typed it correctly, I get something like this, where you can see that I have the Kafka cluster named my-cluster and it has three different nodes. And if I dig into the pod and check the configuration, you see that it has both the broker and controller roles, so the nodes can manage the cluster but also handle the messages.

But what I can do is simply take another resource, another KafkaNodePool, and in this case I specify only the broker role, and again how many replicas I want, what the resources should be, and what the storage should be. In this case it's just a simple demo cluster, so the resources are actually the same, but you could of course have them differ. And if you notice, I have here a special annotation to make it a bit easier to manage the different IDs: I say that I want these nodes to be numbered from 100 to 199, and not continue 3, 4, 5.
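Reconstructed from the demo, the broker-only pool with the ID-range annotation would look roughly like this (names, sizes, and the ID range are illustrative; the mixed pool is the same shape with roles listing both controller and broker):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: broker
  labels:
    # Ties this pool to the Kafka resource named "my-cluster"
    strimzi.io/cluster: my-cluster
  annotations:
    # Assign node IDs from this range (100, 101, 102, ...) instead of
    # continuing the existing sequence
    strimzi.io/next-node-ids: "[100-199]"
spec:
  replicas: 3
  roles:
    - broker
  storage:
    type: persistent-claim
    size: 100Gi
  resources:
    requests:
      cpu: "1"
      memory: 4Gi
```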
And now, when I do kubectl apply and watch the pods, what we can see is that we have these three new pods starting here. They are named my-cluster-broker-100, 101, 102, and these are the new brokers I'm adding to my cluster. If you already have the node pool, you can add a third or fourth node pool, but you can of course also just do kubectl scale to add additional brokers to an existing pool.

And maybe, once these brokers are ready, I decide to migrate to the other architecture, where I have dedicated controller nodes and dedicated broker nodes. So what I can simply do now is kubectl edit on the KafkaNodePool resource where the mixed-role nodes were defined before, and simply go and remove the broker role from it. I apply this, and when I watch the pods again, I can see that the old pods are now being rolled by the operator and reconfigured from having both roles to having only the controller role. So if you remember the slide with the different architectures, I basically just moved from the architecture with the mixed nodes to the one I called the most production-ready, with dedicated controller nodes and dedicated brokers. So that's the demo.

Actually, I cheated a bit because of the time: I didn't have any clients there, so there were no data. If there were some data, the operator would automatically block the change, and you would have to reassign the data to the remaining brokers first. For example, we have a Cruise Control integration, so you could use Cruise Control to move the data to the new brokers I created, and only then would the operator remove the broker role. But that would take several minutes to execute, which is a bit tricky for a 25-minute talk: you need to bring in Cruise Control, do all that's needed, and only after that can you remove the role.
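The role switch in the demo is nothing more than editing the roles list of the mixed pool; a sketch (the pool name is assumed):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: kafka               # the original mixed-role pool from the demo
  labels:
    strimzi.io/cluster: my-cluster
spec:
  replicas: 3
  roles:
    - controller            # "broker" removed; the operator rolls the
                            # pods and reconfigures them as dedicated
                            # controllers
  # storage and resources stay unchanged
```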
[Audience] And maybe it could be implemented in the operator so that when you're removing the brokers, the data is evacuated from there automatically?

I hope that one day we get to things such as triggering the Cruise Control rebalances automatically when these changes happen. To be honest, we don't have that yet; it's only something we are planning and talking about, and one of the reasons is that we spent the last few years working on KRaft. But yeah, it's a good point, and it's something we are thinking about. Already today, though, you don't have to use the Cruise Control APIs yourself with curl or something like that: we have custom resources which you can use to trigger the removal of a node. So it's not completely automated, but if you wanted, you could for example write a simple shell script to do the automation yourself. Thank you.

So the next question was whether the brokers are running on the same worker nodes. In this case they are probably not, because in the background there is an OpenShift cluster with multiple worker nodes. But I simplified the YAML, because if I had all the affinity rules, topology spread constraints, and so on, the YAML would be super long. If you want, last autumn in Chicago at the Data on Kubernetes Day I did a talk about making your Kafka cluster production ready, where I cover all of these things you should consider for production, like affinity, configuring rack awareness, and so on. There is a YouTube recording, so that might be a good thing to check for all the things I omitted from the demo to make it a bit easier.

[Audience] I asked this because the 100-series brokers started quite fast. So that means the nodes were already there?

Yeah, the worker nodes already existed. Okay, I also wanted to touch a bit on the different risks which I see behind this implementation.
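The custom resource mentioned here is KafkaRebalance: with the Cruise Control integration enabled, a remove-brokers rebalance moves all partition replicas off the listed nodes before they lose the broker role (the broker IDs below are illustrative):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaRebalance
metadata:
  name: remove-mixed-brokers
  labels:
    strimzi.io/cluster: my-cluster
spec:
  # Evacuate all partition replicas from these brokers so the nodes
  # can safely drop the broker role or be removed
  mode: remove-brokers
  brokers: [0, 1, 2]
```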
One of them is around managing the pods ourselves, because that means you take over a lot of responsibility which Kubernetes otherwise handles for you. When you create a StatefulSet, the pods are actually managed by Kubernetes, not by you. That means if there is some disaster in your Kubernetes cluster, if for example an availability zone goes down or some nodes crash, it is Kubernetes that is responsible for re-spinning the pods. When you manage them yourself without the StatefulSet, it is now Strimzi that is responsible for it, and if something fails, you might have a problem. And let's be honest, the StatefulSet controller in Kubernetes is way more used and way more proven than Strimzi, although I think we have our share of users. So that's one risk. Another thing: I'm a bit curious to see how people get used to the combination of the Kafka and KafkaNodePool resources. I don't think we are the only operator using multiple custom resources like this, but it's a big change from what we did before, so I'm a bit curious how quickly people get used to it.

I also wanted to touch on where we are today. Just last week we released Strimzi 0.40.0, which is a big release for us in terms of KRaft support, because it enables KRaft by default, so you can now just choose on your own whether you want to use KRaft-based or ZooKeeper-based clusters. And it's also the first version which adds support for migrating existing ZooKeeper-based Strimzi Kafka clusters to KRaft. So if you have some existing clusters, you can try that as well. There are also some limitations, which I don't think are necessarily production critical, but let's be open about them and let everyone decide whether they want to run KRaft in production or not. For example, there are some limitations around scaling the controllers and around unclean leader election; these are actually in Kafka itself.
On the Strimzi side, we for example have some limitations around the JBOD support for using multiple disks in the KRaft nodes. Hopefully this will all be addressed later this year, but I think in many cases you can use it in production even without it.

And finally, going back to the timeline which I shared at the beginning, let's talk a bit more about what's next. Later this year, probably in October or so, Kafka will release version 4.0; at least that's the current plan, and of course plans change sometimes. So in the autumn, Kafka 4.0 should remove ZooKeeper for good, and then, probably in early 2025, we will follow in Strimzi and remove support for ZooKeeper from Strimzi as well. I guess we will try to provide some bug fixing and CVE fixing for the last version with ZooKeeper support for some time, but for some this may be going a bit too quick. You can't stop the progress, though, so sooner or later every Kafka user will have to move to KRaft.

And that's it. Thanks for your attention. I will be around for the rest of KubeCon, so if you have anything else around Strimzi, Apache Kafka on Kubernetes, or even about this talk, feel free to stop me whenever you see me or ping me on the CNCF Slack. Thanks.