And welcome back to another OpenShift Commons briefing. Today we're really happy and pleased to have Kyle Liberati and Paolo Patierno with us from Red Hat. They're going to talk about finding a workload balance and using Cruise Control for Kafka on Kubernetes, so I'm going to let them introduce themselves. They've got a wonderful presentation and a couple of deep demos to share with you today. If you have questions, please join us in the chat wherever you are, on Twitch, YouTube, or here in BlueJeans, and we will get them relayed to our guest speakers. So Kyle and Paolo, take it away. I'm looking forward to this talk.

Yeah, thank you very much. Thank you everyone for joining us for this session. Let me introduce ourselves quickly. I am Paolo Patierno, a Principal Software Engineer at Red Hat working on the AMQ Streams team. I mostly work on the Apache Kafka project and on Strimzi, which we'll introduce in a little bit, for running Kafka on Kubernetes. I am here with Kyle. Over to you, Kyle.

Hello everybody, I'm with Paolo on the AMQ Streams team. Back to you, Paolo.

So let's see what we are going to cover today. We'll start with a quick introduction to Kafka for people who don't know it, and then look at how the Strimzi project that we work on is really useful for managing Kafka clusters running on Kubernetes, and so on OpenShift. I will cover these two main points, then I will hand over to Kyle to talk about Cruise Control: how it is useful for balancing a Kafka cluster and how well it is integrated with Strimzi.

So let's start with a brief introduction to Kafka. The definition of Kafka has changed over the years. At the beginning, Kafka was mostly used as a messaging system.
In the end, Kafka is a messaging system based on the publish/subscribe pattern: on one side you have producers publishing messages on topics, and on the other side consumers subscribing to topics and getting these messages. It's a distributed messaging system, but over time it has evolved into a de facto standard for event stream processing: real-time stream processing, event ingestion, things like that. And in the end we can say that Kafka is just a commit log. You can think about topics as files, as logs, where all the messages, all the events ingested by Kafka, are appended by the producers and then read on the other side by the consumers.

Since the beginning, other than being a distributed messaging system, Kafka has been really scalable and fault tolerant. It's scalable in the sense that we can add brokers, we can handle a lot of consumers, and we can add more producers, so we can handle high traffic when exchanging messages across clients. And it's fault tolerant in the sense that if you have your Kafka cluster deployed with several brokers and at some point one or more of them go down (how many depends on the size of your cluster), the Kafka cluster is still alive and the clients can still use the remaining brokers to exchange messages through the topics.

But Kafka is not just about the brokers. The core of the upstream Apache Kafka project is the broker itself, for exchanging messages across clients, but there is more. The upstream project also provides libraries for the producer and consumer APIs that you can use in your clients to use Kafka as a messaging system or an event streaming platform. There is also the Kafka Streams API, a library built on top of the producer and consumer APIs for doing real-time streaming analytics on your data.
It gives you a pretty simple DSL for filtering, mapping, and real-time processing of your data without using the low-level producer and consumer APIs. There is also Kafka Connect, which today is used for moving data through Kafka, so through Kafka topics, between other systems such as databases. Maybe people know the project called Debezium, which provides connectors for moving data from one database to others, so for doing CDC, change data capture: for example, getting the changes of a database into a Kafka topic and then moving these changes to a destination database. Another component is MirrorMaker, which is really useful when you want to mirror a Kafka cluster from one data center to another, for example. So it's not just the brokers; it's an integrated ecosystem of different components. This is what Kafka is today.

Here you can see a simplified Kafka architecture: a really simple Kafka cluster with just three brokers. I already mentioned that all the clients exchange messages using topics. A topic is in the end a kind of virtual object, because Kafka uses sharding: each topic is made of shards that are called partitions. In this case, for example, I have one topic with three partitions: zero, one, and two. For each partition we can have one or more replicas, which are useful for fault tolerance: I have more copies of the same partition on different brokers, so that if one broker goes down I can use the copy of that partition on another broker. This is why Kafka is really fault tolerant. So here we have this three-broker Kafka cluster with three partitions, each made of three replicas, and the green ones are the so-called leaders. Each partition has its leader on one of the brokers, and the leader has an important role, because producers and consumers exchange messages through it.
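As an illustration of the slide just described, here is a minimal Python sketch of one topic with three partitions and three replicas each on a three-broker cluster. It assumes a plain round-robin placement and treats the first replica in each list as the leader; the function name `assign_replicas` is made up for this example, and Kafka's real assignment logic is more involved.

```python
def assign_replicas(num_partitions: int, replication_factor: int, brokers: list) -> dict:
    """Round-robin sketch (an assumption, not Kafka's actual algorithm).

    Returns {partition: [broker ids]}; the first broker in each list
    is treated as that partition's leader.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        assignment[partition] = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
    return assignment


# One topic, three partitions, three replicas, three brokers (as on the slide).
assignment = assign_replicas(num_partitions=3, replication_factor=3, brokers=[0, 1, 2])

# The leader of each partition is the first replica in its list.
leaders = {partition: replicas[0] for partition, replicas in assignment.items()}

for partition, replicas in assignment.items():
    print(f"partition {partition}: leader on broker {replicas[0]}, replicas on {replicas}")
```

With this placement every broker holds a copy of every partition, and each broker leads exactly one of them, which is the balanced starting point the talk assumes.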
Clients send and receive messages through a partition by connecting to its leader, while all the other replicas are called followers, because they just copy the messages from the leader for fault tolerance. On the consumer side this is no longer strictly true, because today there is a newer feature in Kafka where a consumer can read from the closest replica, for example, and not necessarily from the leader. But the basic way Kafka works is that all the clients connect to the brokers where the leaders live.

From a scalability point of view, Kafka can scale because if you add more partitions, you can place a new partition on a different broker. In this case, for example, we add a new broker and partition three is added on it. You can scale on the consumer side as well: you can have more consumers getting messages from more partitions, and it's all handled by Kafka. In Kafka land, compared to other traditional messaging systems, we can say that the brokers are dumber than the clients, which are really smart. For example, it is the consumers that keep track of where they are reading the messages; it's not done on the broker side. For this reason the Kafka brokers can handle more load and more consumers with more traffic.

Talking about fault tolerance, as I already mentioned, a broker can go down. In this case, for example, with three brokers in the Kafka cluster, you had partition two with its leader on broker two. Broker two goes down, and a new leader for partition two is elected, in this case on broker one. The application will just connect to broker one and continue to exchange messages through partition two even though broker two is down. When broker two comes back, a new leader election will happen: the preferred leader for partition two is again the replica on broker two, so everything goes back to the starting situation that we saw on the first slide.
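The failover sequence above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the leader is taken to be the first replica in the list that sits on a live broker, whereas real Kafka only elects from in-sync replicas and runs preferred leader election as a separate step. The function name `elect_leader` and the replica order are illustrative.

```python
def elect_leader(replicas: list, live_brokers: set):
    """Return the first replica hosted on a live broker, or None if none is up.

    Simplification: real Kafka restricts candidates to in-sync replicas.
    """
    for broker in replicas:
        if broker in live_brokers:
            return broker
    return None


# Replica list for partition two; broker 2 comes first, so it is the preferred leader.
partition_2_replicas = [2, 1, 0]

before = elect_leader(partition_2_replicas, {0, 1, 2})  # all brokers up
during = elect_leader(partition_2_replicas, {0, 1})     # broker 2 is down
after = elect_leader(partition_2_replicas, {0, 1, 2})   # broker 2 is back

print(before, during, after)
```

While broker two is down the sketch picks broker one, and once broker two returns the preferred replica leads again, matching the sequence described on the slide.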
So this is how Kafka handles the scaling side, the fault tolerance side, and so on. But we can say that managing Kafka is not so simple; let me say it's kind of hard. There is a lot of configuration to do on Kafka brokers: if you check the Kafka documentation, there are a lot of parameters that you can set. You have to operate your Kafka cluster, which means you have to deploy it with the same configuration on all the Kafka brokers, and you have to set up the connections across brokers. Kafka today runs alongside a ZooKeeper ensemble, so you also need a ZooKeeper cluster where Kafka stores some metadata: for example, topic and partition information, the ACLs (the access control lists that define which topics users, or clients in general, can read and write on the cluster), which broker is the controller, what the topology of the Kafka cluster is, and so on. So there is one more thing to handle, which is the ZooKeeper ensemble. That is going away: the latest Kafka 2.8 release has a way of deploying a Kafka cluster without ZooKeeper, not yet for production, and ZooKeeper will be removed in the 3.x versions that are coming. But for now you have to deal with ZooKeeper running alongside Kafka.

You also have to operate your Kafka cluster when you need to update some configuration, which may mean restarting the Kafka brokers, so it's all in your hands. Even from a development point of view, it's not so simple for a developer to have a running Kafka cluster handy: you have to set up Kafka on bare metal or on virtual machines. This is where running on Kubernetes, on OpenShift, comes into the picture, not just for production but for development as well. So talking about Kafka running on Kubernetes, there is an easier way to do that.
This is where the Strimzi project comes into the picture. We could use plain Kubernetes resources for deploying and handling our Kafka cluster. Kubernetes provides us with StatefulSets, Secrets, ConfigMaps, and all the other resources, so you can write your YAMLs describing the pods for your Kafka brokers, the Secrets with the TLS certificates, and the ConfigMap with your configuration. But it will always be in your hands: you have to handle all these YAMLs, you have to update them to upgrade your cluster, and so on.

With Strimzi you get the operator pattern approach. You don't have a human operator handling your Kafka cluster; you have an application. In the end an operator is an application, a pod running in your Kubernetes cluster, that has knowledge about your business domain. In this case the domain is running Kafka, so it has all the knowledge about how Kafka works, how to upgrade Kafka brokers, things like that. With the Strimzi operator approach you also use the standard way of extending the Kubernetes API: with so-called custom resource definitions you can extend the Kubernetes API by adding custom resources. So other than Pod, StatefulSet, Deployment, and all the things that you have natively in Kubernetes, you also have a new Kafka custom resource, new KafkaUser and KafkaTopic custom resources, and so on, even for handling Kafka Connect, Kafka MirrorMaker, etc. This is what Strimzi provides to you: it automates the configuration of your Kafka cluster, which you describe through a Kafka custom resource, as well as its deployment and even its upgrade.
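To give a feel for what such a Kafka custom resource contains, here is a hedged sketch that builds one as a Python dictionary and prints it as JSON, which Kubernetes accepts as a manifest format alongside YAML. The field names follow the Strimzi `Kafka` schema as it stood around the time of this talk and should be checked against the Strimzi version you run; the cluster name `my-cluster` and all values are placeholders.

```python
import json

# Sketch of a Strimzi Kafka custom resource. Field names are an assumption
# based on the v1beta2 schema; verify them against your Strimzi release.
kafka_cluster = {
    "apiVersion": "kafka.strimzi.io/v1beta2",
    "kind": "Kafka",
    "metadata": {"name": "my-cluster"},  # placeholder name
    "spec": {
        "kafka": {
            "version": "2.7.0",
            "replicas": 3,  # number of brokers
            "listeners": [
                {"name": "plain", "port": 9092, "type": "internal", "tls": False},
                {"name": "tls", "port": 9093, "type": "internal", "tls": True},
            ],
            # Broker configuration applied to every broker in the cluster.
            "config": {"offsets.topic.replication.factor": 3},
            "storage": {"type": "ephemeral"},
        },
        "zookeeper": {"replicas": 3, "storage": {"type": "ephemeral"}},
        "entityOperator": {"topicOperator": {}, "userOperator": {}},
    },
}

manifest = json.dumps(kafka_cluster, indent=2)
print(manifest)
```

The point of the sketch is the shape: one declarative object describes brokers, listeners, configuration, storage, and ZooKeeper, and the operator turns it into the underlying Kubernetes resources.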
Strimzi also provides built-in security, in the sense that you can configure TLS certificates, you can encrypt connections with clients and inside the cluster across brokers, and you can configure authentication and authorization for your users. And we say that it has a simple user interface because you reuse your Kubernetes experience to handle a Kafka cluster.

If we dig a little deeper, we can see here what a Kafka custom resource looks like. You have a new kind, which is this Kafka custom resource, and in the spec you describe the configuration of your Kafka cluster: the version, how many brokers you want, the configuration parameters, and the storage, which can be ephemeral or persistent. At the same time you can describe ZooKeeper: the number of nodes, the storage, and more. This is a really simple Kafka custom resource; in the first demo I will show a more complex one, let me say. What happens when you create your YAML, when you create your Kafka custom resource?
The Strimzi operator, which is watching for the Kafka custom resources that you create, will take care of it. It translates this Kafka custom resource into native Kubernetes resources: it creates StatefulSets handling the pods for the Kafka brokers and the pods for ZooKeeper, the Secrets with the certificates for connecting to the brokers from outside and for encrypted connections inside the cluster, the ConfigMaps with the configuration, and the PersistentVolumeClaims for the persistent volumes that store your messages in a persistent way. So the Strimzi operator does what you would otherwise have to do manually. If you update your Kafka custom resource, for example tweaking some parameter in the config section, and this parameter needs the Kafka pods to be restarted, the Strimzi operator will take care of that for you: it starts a rolling update of the pods, one by one, so that the clients can continue to use the Kafka cluster, because the operator restarts just one pod at a time. The same applies if you use the KafkaConnect custom resource for deploying Kafka Connect, or the one for MirrorMaker. Even for creating a topic, or creating a user with the corresponding ACLs, you don't need to use the Kafka tools; you can just define a KafkaTopic or a KafkaUser custom resource with all the information. We will see a little more of that in the demo.

But we are not out of the woods yet, because when you start to use Kafka, at some point your Kafka cluster becomes unbalanced. Maybe you are using some partitions more than others. For example, you are sending messages with keys, and the producer decides the destination partition based on the key, so you are sending more messages with the same key to a specific partition, on a specific broker. So on that broker, for example, you have this partition
getting a lot of messages, and some other partitions on the other brokers getting only a few, so the first broker receives more messages than the others. You can have poor performance because of that: from a storage point of view, because you utilize the first broker more than the others; on the network, because you are not spreading the traffic across all the brokers; and the same goes for CPU. So this is a scenario where you could make better use of your Kafka cluster, because you are putting all your load on broker zero and maybe broker three, and not utilizing the other brokers so well.

This can also happen when you add more brokers, or when brokers go away and then come back. What does this mean? For example, say you start in this scenario, where I have three brokers and a topic with four partitions, distributed in this way. When the topic is created, of course we get two leaders on the same broker, because there is nothing else we can do. But at some point we spin up a new broker. We would like to have partition three moved to the new broker, but that doesn't happen automatically in Kafka. If you create a new topic in this scenario, its partitions will be spread across all four brokers, but what you already have keeps running on the same brokers as before. What you would like is to have partition three moved to this new broker, to spread the load across all the brokers and make the best use of the CPU, network, and storage on all the brokers you now have. The problem, as I already mentioned, is that this doesn't happen automatically. So this is another scenario where you can end up with an unbalanced cluster: you start adding brokers, but you already have topics with their partitions spread across the brokers that were already running.

What can we do today? We have to rebalance the cluster. It's not simple, because it's a
moving target: brokers can come and go, and you should optimize in terms of network and storage, so you need some metrics to know how the brokers are behaving, and then you rebalance by moving partitions so that resources are used in more or less the same way across all the brokers. The old way to do this is the kafka-reassign-partitions tool that is provided by the upstream Kafka project, but it's a manual process. You have to run the tool to get a JSON file describing how the partitions are distributed across your cluster, then you have to specify in this JSON how you would like to move the partitions for rebalancing, and then you have to run the tool again to start the rebalancing. It's also one-dimensional balancing, because it's based just on storage, so you cannot get a rebalance that takes network, CPU, and things like that into account. So we can do better, and at this point I would like to hand over to Kyle to introduce what you can use to do rebalancing better, using Cruise Control, and how well Cruise Control is integrated with Strimzi in Kubernetes land. Handing over to you, Kyle.

Sharing my screen. Can everybody see my screen? Look good, Paolo?

Yes, yes.

So as Paolo said, we can do better: we can use Cruise Control. Cruise Control is an open source tool developed by LinkedIn. It offers us fully automated rebalancing, and by that I mean Cruise Control constantly monitors the state of your cluster and calculates and caches optimization proposals to best balance it. It also provides a REST API for querying the state of the cluster, generating these optimization proposals, and rebalancing your cluster based on them. Cruise Control offers fine-grained resource tracking, so it can track the resource utilization of brokers and topics, even down to partitions. And Cruise Control also offers us multi-dimensional
balancing, so Cruise Control can optimize rebalance proposals to target several different factors of a rebalance, including things like rack awareness, resource capacity and utilization for resources like disk, CPU, and network I/O, per-broker replica count, and many more, which I'll go into in further detail later.

Here I've drawn a simple architecture of Cruise Control. As you can see, each broker runs the Cruise Control metrics reporter alongside it. It's run as a Java agent; it collects raw Apache Kafka broker metrics, does a little processing, and stores the results back into a Kafka topic. These processed metrics are taken in by the Cruise Control load monitor and sampled over time to get a more accurate view of those raw metrics and to build a model of what the cluster looks like at that point in time. The load monitor passes this model to the analyzer, which is basically the core, the brain of Cruise Control. Cruise Control uses the analyzer to build an optimization proposal for how to optimally rebalance your partitions, using the cluster model that was produced by the load monitor. We can also feed the analyzer a list of constraints on how we want the Cruise Control brain to come up with this optimization proposal, but we'll go into more of that in later slides. And as I said before, Cruise Control also offers a REST API where we can query the workload state of the cluster, get detailed information on brokers, topics, and partitions, review the optimization proposals that are generated by the analyzer, and then execute partition rebalances using the executor.

So let's drill down into the data that's moving through Cruise Control. As I said before, we have the Cruise Control metrics reporter running alongside every broker in our Kafka cluster, and these are collecting raw Apache Kafka broker metrics
which you're probably familiar with: information like the partition size, the CPU utilization of the broker, the topic bytes in and out, and the message rate, at the partition level and the topic level, so very detailed. Cruise Control samples these raw metrics to get a more accurate accounting of these data points. You can see that we get the broker ID where those raw metrics came from, and a better measure of the CPU utilization, the disk usage, and the network I/O as well. So from these raw metrics we produce metric samples at the partition and the broker level, and they're used by the load monitor to get an accurate accounting of the data surrounding the brokers in your cluster.

Here's a diagram showing how that data is aggregated and processed. We have the metric samples we saw in the previous slide. We can customize the sampling interval, and we can also customize the reporting interval, depending on how fine-grained we want those metrics to be. We can also customize time windows, which basically create snapshots of the metric samples we take, and we specify how many snapshots we want the analyzer, the brain of Cruise Control, to use to come up with a cluster model. So we can set these numbers to cover the past hour of traffic, or the past day, week, month, or year; it just depends on how quickly your Kafka cluster is changing, and we can customize that to meet your needs. Just to reiterate: a specified number of snapshots, which the user configures, produces a precise and up-to-date estimate of what the cluster looks like at a given point in time, and it's used to create a cluster model. So the cluster model is basically
just the workload data of the cluster resources. This is a very simple example of what a Cruise Control cluster model looks like. You can see that if we had a three-broker cluster, broker zero has 24 replicas and 24 leaders, and you see how much disk that broker is using, the CPU utilization, and the network I/O. But there is much more detailed information we can get from this cluster model, down to the topic and even the partition level. This cluster model is used by the analyzer to simulate partition movements and decide how best to rebalance the Kafka cluster given the constraints we give it. And as I alluded to before, the Cruise Control REST API is there for the user to query information about the cluster, like the current broker and partition load, the Kafka cluster state, and the optimization proposals from the analyzer and the information about them, and also to execute those optimization proposals using the executor, and much more.

So, as with Kafka and the other Kafka components like MirrorMaker and Connect, Strimzi offers integration with Cruise Control, this partition balancing tool. Strimzi deploys Cruise Control and auto-configures it. It also rolls all your Kafka brokers to include the Cruise Control metrics reporter agent for collecting those fine-grained metrics. And Strimzi, as with all its other components, offers a simple user interface for interacting with and controlling your Kafka cluster, all in one centralized place: the Kubernetes CLI and custom resources.

I've expanded on Paolo's diagram for the Cruise Control part. Using the Kafka resource, we declare the configuration for our Cruise Control application. The Strimzi operator sees that description in our Kafka resource and, just as it deploys the Kafka cluster, it deploys the Cruise Control application alongside everything else, and it also rolls all your Kafka brokers to include
the Cruise Control metrics reporters, which collect and gather that data. We also have a KafkaRebalance resource, which is used for interacting with the Cruise Control REST API I was talking about earlier. Using the KafkaRebalance resource, we can tell Cruise Control to generate optimization proposals based on custom constraints and then execute rebalances based on the optimization proposals that Cruise Control creates for us.

Here's a closer look at the same Kafka resource that Paolo showed earlier, only with the Cruise Control configuration. The things we want to focus on in the next slides are the Cruise Control goals section, which is in the Cruise Control config (note that you can put the majority of Cruise Control configurations here in the Strimzi Kafka resource as well), and the broker capacity section, which I'll show in the next couple of slides.

As I said before, we have constraints that we can give the analyzer to come up with optimization proposals. These goals, or constraints, cover many different dimensions of a partition rebalance. For example, the rack awareness goal ensures that replicas are spread across different racks. We can specify the replica capacity goal, which ensures that no broker contains more than a specified number of partition replicas. And there are also resource capacity goals, which make sure that broker resource usage doesn't exceed a specified threshold. So we give Cruise Control these goals or constraints, and the optimization proposal we get from Cruise Control is promised not to violate them. Note that there are two different types of goals that we can feed into the Cruise Control configuration: hard goals and soft goals. Hard goals are constraints that must be satisfied by Cruise Control when it's generating optimization proposals; they can't be violated. So if we tell Cruise
Control we want no more than two replicas on each broker, it will either give us an optimization proposal that meets that restriction, balancing the partitions so that each broker has only two replicas, or, if it can't meet that restriction and it's a hard goal, Cruise Control will not generate an optimization proposal at all. It'll say no, we can't do that. We can also provide soft goals for Cruise Control. Maybe we have constraints that aren't as important; maybe the disk usage per broker isn't that important, so we can specify it as a soft goal, and Cruise Control will do the best job it can to satisfy that constraint. If it can't meet it exactly, maybe it ends up at 80 percent of your disk when you specified 70 percent, that's okay, because you said it's a soft goal. So Cruise Control offers a lot of flexibility when setting up these constraints.

Also specified in that Kafka custom resource are configurable capacity limits, which are used by the Cruise Control goals. Here we can specify the exact amount of disk that we want each broker to be limited to, as well as the CPU utilization and the network throughput. If you look at this diagram, where we've specified a CPU capacity, shown as this line here, note that the CPU usage bars will not exceed that line. So we specify this in our Kafka resource, and the goals will respect it when Cruise Control is coming up with an optimization proposal to balance our cluster.

So we've looked at the Kafka resource in detail; now it's time to look at the KafkaRebalance resource. Remember, the KafkaRebalance resource is the interface to the Cruise Control API, so we can communicate with it, and it focuses mostly on just rebalancing the cluster. Here we can see a KafkaRebalance resource. It's an object: just as we have a resource of type Kafka, we have
a resource of type KafkaRebalance, and we specify a goals section here. Remember those constraints we were talking about in the previous slides: the goal list we provide here overrides the default goal list we specified in the Cruise Control configuration in the Kafka resource. So maybe you generally want the cluster to be balanced in a certain way, but you have a sudden need to balance it under new constraints, and it's just a one-off rebalance. You can specify a new list here and pass it to Kubernetes; the cluster operator will communicate with Cruise Control and have it create an optimization proposal for a partition rebalance based on the restrictions that you've put in this KafkaRebalance resource.

The KafkaRebalance resource goes through a series of states. When you create it, it starts in the proposal pending state, which just means Cruise Control is calculating an optimization proposal based on the goals you've given it. Once that's complete, Cruise Control tells the cluster operator, the cluster operator passes this information back to the KafkaRebalance resource, and it moves to the proposal ready state. That just means the optimization proposal was generated by Cruise Control and it's ready to go. Until the user indicates that it looks good and they want to rebalance, Cruise Control will wait. Once you're ready, you annotate the KafkaRebalance resource, the cluster operator talks to Cruise Control, and it starts rebalancing your Kafka cluster, during which the KafkaRebalance resource is in the rebalancing state. Once it's all done and the rebalance is complete, the KafkaRebalance resource moves into the ready state. So you can track the life cycle of a rebalance through this state machine. We track the state of the KafkaRebalance resource
through the custom resource status section. Here we have the second state, the proposal ready state. Remember, that just means that Cruise Control has come up with an optimization proposal based on the constraints you've given it, and it also gives us some nice information about that rebalance. This is a dummy example, so the partitions moved here didn't have any data and there's zero data to move, but in a real-life situation you'd have the exact amount of data you need to move. You also have information like how many leader partitions are being moved and the number of replica movements. Cruise Control also provides a balancedness score, which is basically its view of how balanced your cluster is. As we see here, before we tell the KafkaRebalance resource that we want to move forward with this rebalance, Cruise Control thinks that our balancedness score is lower than what it could be, based on the cluster model that it's created, its understanding of the cluster at that point in time. After Cruise Control has done its work, you hope to have a somewhat even distribution of load across your brokers. It's probably never going to be perfectly flat, but it's going to be in a much better state than it was before, based on your constraints.

But anyway, enough talking. Let me dive into a demo introducing Strimzi, and then we'll dive into a second demo for Cruise Control. All right, so in this first demo I'm just going to show you how to deploy a Kafka cluster, the nice interface Strimzi provides for that Kafka cluster, and how it can manage it. I already have the Strimzi operator deployed, and I'm going to show you the custom resources that we provide with Strimzi. Let's take a look at
those. As you can see, we have the Strimzi objects based on custom resource definitions. We have a Kafka resource object that's used for creating Kafka clusters, and KafkaTopic objects for creating Kafka topics. So let's just make sure that our cluster operator is up and running, double-checking that it's there. Now that we see it's ready, why don't we deploy our Kafka cluster by sending Kubernetes a description of the Kafka custom resource, like this. While that deploys, let's take a closer look at the description of the Kafka custom resource that I passed Kubernetes. Paolo has already shown most of this; there's a little extra I've added here which I want to highlight. As you can see, we have our Kafka resource. We're running Apache Kafka 2.7; we could have put any Kafka version here that's supported by Strimzi, and it would have deployed a Kafka broker of that version. This cluster just has one broker. We also have a listeners section, which is basically bootstrap addresses for accessing the Kafka cluster. We have one bootstrap address that allows insecure traffic and one bootstrap address that requires TLS client authentication. We could have put another listener here to allow traffic from outside the Kubernetes cluster as well, but I haven't included it. Here we have our Apache Kafka broker configuration; whatever we put here will be automatically applied to all the brokers in the cluster. We have some JBOD storage, 10 gigabytes per broker, that's persistent. And because Kafka still has a hard dependency on ZooKeeper, we have to deploy a ZooKeeper cluster alongside Kafka. We declare it here: we have one ZooKeeper instance with this storage. These customizations are not exhaustive; we could also have added other configurations, things like metrics and security and other Kafka components. Let's just see that this was deployed. So we can see that our
Kafka broker has been deployed, and so has our ZooKeeper instance. So it's done its job. With this cluster there are a few things Strimzi gives us for free, out of the box. One of those things is security: all communication within the cluster is encrypted and authenticated by default. We also get automated configuration management, which Paolo talked about earlier, where we make one change to our Apache Kafka broker configuration in the Kafka resource and it's applied to all of our brokers automatically. Now that we have our Kafka cluster running, let's create a Kafka topic so we can do some more interesting things. I've created a Kafka topic just like that, the same way I created the Kafka cluster, using a KafkaTopic resource. Let's take a closer look at what I've just passed Kubernetes. Here we have our KafkaTopic object, which I was talking about before. We want this topic to have three partitions and one replica per partition. The Apache Kafka topic configurations can be listed here, and they'll be taken care of by Strimzi for us. Note that it's named my-topic, which we'll use later. So we've passed Kubernetes a topic resource, and the topic operator, which we deployed with our Kafka cluster, will read that resource and create a Kafka topic in our Kafka cluster, just like that. As we talked about earlier, we have a simplified user interface, so to interact with our topic and see that it was created, we can use the Kubernetes CLI like this and search for our Kafka topics. As we can see, we have our other Kafka topics, like consumer offsets, but we also have my-topic, which we just declared, with three partitions and one replica per partition. Now that that's been created, let's go ahead and create a Kafka user that can write to this topic. Again, I think you get the
idea: we'll apply another YAML file that describes our KafkaUser resource. Let's take a closer look at that. There are a few things I want to focus on here. One is the authentication field right here, which will allow our Kafka user to be recognized by the Kafka cluster. Here we're specifying that we want our Kafka user to use TLS client authentication; we could have used other authentication methods like SCRAM-SHA. We also define an authorization section here, so our Kafka user will have privileges to interact with the topic we've created. Here we use access control lists for accessing our topic, for reading from my-topic, as you can see here, as well as writing to my-topic, which we created in the previous steps. The last thing I want to focus on is the user quotas section, which allows us to limit how much our user can read from and write to the brokers in our cluster. Here we can limit the producer byte rate and the consumer byte rate, as well as the CPU utilization limit as a percentage of time used by the client group that this user is a part of. Just like our KafkaTopic resource, we've created a KafkaUser resource, and this will be used by the user operator, which we deployed originally with our Kafka cluster, and that will create a Kafka user in our Kafka cluster. We can interact with our Kafka users just like we interact with our Kafka topics and other objects, using the Kubernetes CLI like this. As we can see, my-user, which we just showed you in the previous step, has been created, it authenticates to the cluster using TLS client authentication, and it's ready to go. Now that we have our topic and our user set up, let's start writing to this topic. I'm just going to go ahead and deploy some producer and consumer apps. Let's take a closer look at what I've deployed. The core thing I want to highlight here is that we are using the secure bootstrap address that I showed you in the
Kafka resource. This bootstrap address is there so clients can access Kafka, but it only allows authenticated traffic. So this producer and consumer have to be authenticated to access my-topic, which we created in the previous step, by using the Kafka user my-user, which we also created in the previous step. These environment variables will hook up this producer and this consumer down here to the Kafka user we created, so they'll be able to read and write to our Kafka topic, my-topic. Those have probably been deployed by now, so let's just make sure that they're writing and reading messages accordingly. If we go here and follow the producer's logs, we can see our Kafka producer is authenticating to our cluster and writing messages. Now let's check on our consumer to make sure it's reading appropriately. We can see that our consumer, on the right side, is authenticating to the cluster and reading those messages from our Kafka cluster appropriately. Note that no other clients using the secure bootstrap address will be able to read and write to this topic without being tied to our Kafka user. And we can nicely interact with our Kafka topics, Kafka users, and other objects using the Kubernetes CLI. But anyway, that's the end of this demo. Let me move on to the Cruise Control demo to show you some more interesting rebalancing magic. I've recorded this part of the demo to reduce the wait times, since Cruise Control can take some time to generate proposals. So let me go into that. Here we're going to show you how to use the Strimzi Cruise Control integration. So far our demo has just used one Kafka broker, which raises concerns about performance and fault tolerance, as Paolo has told you. We can even see that our topics' partitions are piled up on one broker. Here you can see that broker zero is where all our
partitions of our cluster are right now, which isn't good. So we're going to want to scale our Kafka cluster by editing our Kafka resource, like this. Here we can just change the replicas field to three, and that will scale our Kafka cluster to three brokers. Similar to the last demo, we have our Kafka resource, and the cluster operator will read the Kafka resource description we provided and create a Kafka cluster based on that description. So we'll have three Kafka broker pods when this is finished. Let's just give it a second to let the cluster operator scale our cluster, and let's check right here to see if it's complete. As you can see, we have three Kafka brokers in our cluster right now, where before we only had one. But we have one problem: all the partitions are still on broker zero. As Paolo was talking about, these partitions aren't automatically spread among our brokers, so we need to do a little more work. To get around this issue, as we talked about, we can use Cruise Control, and we can deploy Cruise Control in a similar fashion to how we deployed other components, like Kafka, the Kafka topics, and Kafka users, through the operator, by editing our Kafka resource. So let's go ahead and do it here. Editing our Kafka resource, we'll just put a simple cruiseControl field right here, just empty. This will be read by the cluster operator like all the other changes, and it will deploy the Cruise Control app alongside our cluster. It'll also roll all our Kafka pods to run the Cruise Control metrics reporter alongside each broker, as an agent to pick up the metrics that Cruise Control will use. So let's look at the pods to make sure that Cruise Control has been deployed. We can see Cruise Control is right there, so that's been deployed. The brokers have already been rolled; they're up and running and ready to go. So now we need a way of
interacting with the Cruise Control REST API, and the way we do that is using the KafkaRebalance resource that I was talking about in the slides. So let's go ahead and create one of those like this, applying a YAML file, and this resource will basically act as the medium for Cruise Control. Let's take a closer look. As I showed in the slides earlier, we just have a KafkaRebalance resource. The purpose of this resource is generating optimization proposals based on restrictions that we pass Cruise Control, and executing partition rebalances based on the optimization proposals that are produced. Here are the goals, the restrictions we were talking about earlier. For example, we have the CPU capacity goal, which will make sure that the generated proposal keeps CPU utilization for any broker in the cluster under a given threshold. Once we create our rebalance resource, it will be used by the cluster operator to interact with the Cruise Control REST API to create an optimization proposal, and once that's completed, it'll pass it back to the rebalance resource, where we can take a look and approve it if it looks good to us. So let's take a closer look at the status section. We can see that right now Cruise Control is creating a proposal for us. Remember, the rebalance resource follows a kind of state machine, so right now it's in the ProposalPending state; Cruise Control is hard at work. And now we can see Cruise Control has completed its optimization proposal, because we're now in the ProposalReady state. It also provides some nice information, as I showed in the slides, more detailed information about that rebalance. If it looks good, we can annotate that resource, and that will tell Cruise Control to execute the partition rebalance based on the proposal that was passed back to us. We can see now that, since we've done that, the KafkaRebalance resource is now in the
Rebalancing state, so the Kafka cluster is currently rebalancing all the partitions across our brokers. Once that's complete, we'll be in the Ready state, so our cluster has been rebalanced and we're all good to go. We've passed through our four states and our cluster has been rebalanced. Now all we need to do is check to make sure that our partitions aren't all on broker zero anymore, and as we can see, they're spread out now. There are still some partitions on broker zero, but we also have some on broker two and broker one as well. So our cluster has been rebalanced using the Cruise Control and Strimzi integration, and that's the end of the demo. So what's next for the Cruise Control integration with Strimzi? We're looking at securing the Cruise Control API with authentication and authorization. Right now the API is only secured by Kubernetes network policies, but you can get around that by using Kubernetes port-forward or by exec-ing into a pod, so this will secure it further. Next is intra-broker rebalancing, as opposed to inter-broker rebalancing, which will support balancing data not only between brokers but between disks of the same broker. And we're also looking to support Cruise Control's ability to change topic replication factors. But that is the end of our presentation. I hope you've enjoyed it; now we'd like to open up for some questions. All right, great, Kyle, that's just amazing, and a lot of rebalancing magic there. This has been a pretty wonderful conversation and demonstration of the power of Cruise Control, and I'm really thrilled to see yet another end user donating an open source project. It totally makes sense that, with the popularity of Apache Kafka, a tool like this is absolutely necessary, and it's wonderful that it's come out of LinkedIn, with huge numbers of Apache Kafka clusters to manage. And we all know brokers do die, so that must be really painful. I'm wondering, you're both Red Hatters, have you seen a lot of adoption
of Cruise Control at Red Hat customers? Have you been deploying it with Strimzi, or how new is this all to the universe? So Paolo, do you want to take this one, or shall I? Oh, good. So we introduced this integration, I believe, last year, and I don't know about Paolo, but from what I've heard it's our customers' favorite tool to use to speed up their clusters. But that's what I've been told; Paolo, I don't know what you've heard. I'm not sure about customers. We had this feature in tech preview for a long time; now it's GA, right? Yeah. To be honest, I guess that customers are just playing with this right now, because the integration works pretty well in Strimzi, but as Kyle already mentioned, we want to add more in order to have authorization and authentication, so the security side. I don't think we have customers using this in production, but for sure we have customers trying this in order to move to production as soon as possible. Awesome. So where does Cruise Control live in the pantheon of foundations? Is it in Apache, is it in the CNCF sandbox? What is the plan for Cruise Control, or is it going to stay just in the LinkedIn repo and be something that we link out to? I think it's not part of the CNCF or Apache; it's still kind of a standalone GitHub repo. It's getting more popular, as I see more people and more companies starting to contribute to Cruise Control, which I'm assuming means it's getting more adoption, and maybe once it gets more traction like that we'll see it move to an open source foundation. Yeah, to me, once we have the integrations that are in your roadmap for Strimzi, it would be almost a natural thing to nudge them towards getting into a foundation, and maybe more open governance and oversight, because along with the security, having that
extra panache, or whatever, the extra seal of approval, helps our customers adopt things as well, and other people's customers and folks in production. But, you know, Paolo, you talked in the beginning about getting out of the woods; I mean, this gets us a huge step forward in managing and rebalancing all of these Kafka workloads, so it's pretty amazing. What do you see as the timeline for some of these things that are on your roadmap? I know that's a terrible question to ask engineers, but is this something that's coming within perhaps the next six months or so, or is it sooner? Well, I guess, but Kyle can be more precise than me, that the authorization and authentication is coming really soon, because it's actually Kyle working on this feature. For the other stuff, I don't want to commit yet to having it in six months, but for sure the first one is coming. Maybe the topic replication factor could be later; the last one should be the one about rebalancing inside a Kafka broker, right, Kyle? Yeah, as you said, the API authentication and authorization is coming. Hopefully that will be, not in this pending release of Strimzi that's coming out in a week, but maybe in the next release, once we thoroughly test it. Intra-broker balancing is hopefully coming soon too; maybe we'll have that in the next two releases. The work there isn't too hard. Changing the topic replication factor is not trivial; it's going to require a lot of work and a lot of testing. So, like Paolo, I totally agree it's going to take a little more time to get that ready for customers and people in general using Strimzi. Yeah, so we're at the end of our hour, so I want to respect people's time and really thank you guys for giving us a great tour de force explaining Kafka, Strimzi, Cruise Control, rebalancing, and all the theory and practice behind it. So I really appreciate
the update, and we'll definitely have you back for the next release. And we definitely want to hear, if you're a customer who's playing with this or an end user who's playing with Cruise Control, any feedback you have; we'd love to hear it. So please keep in touch. Kyle and Paolo, where is the best place for people to reach you guys in terms of community interactions? Well, for sure we have a Strimzi Slack channel on the CNCF workspace. There is a dedicated Strimzi channel because Strimzi is under the CNCF, so this is the best channel for reaching us. There are also the user and dev mailing lists for Strimzi, and you can find a lot of information in the documentation on strimzi.io, and there are really a lot of blog posts about Strimzi. Even on Twitter you can reach us. So there are a lot of ways; I guess it's really simple to engage with us. Perfect. All right, guys, well, thank you for everything today. I'm going to let you all go, and we'll have this uploaded shortly. Thanks to Bobby Kesler and Chris Short, who are our producers, and we'll talk to you again tomorrow with another talk on data science from Audrey Resnick. So if you're listening, join us tomorrow, same time, same place, OpenShift TV. For OpenShift Commons, thanks, guys.
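
For reference, the KafkaRebalance resource described in the Cruise Control demo can be sketched roughly as follows. This is a minimal sketch, not the manifest shown in the demo: the resource name my-rebalance, the cluster name my-cluster, and the particular goals listed are assumptions for illustration, and the apiVersion depends on the Strimzi release in use (older releases used an earlier API version).

```yaml
# Hypothetical KafkaRebalance manifest; names and goal selection are assumptions.
apiVersion: kafka.strimzi.io/v1beta2   # assumption: depends on the Strimzi release
kind: KafkaRebalance
metadata:
  name: my-rebalance                   # hypothetical resource name
  labels:
    strimzi.io/cluster: my-cluster     # hypothetical name of the Kafka cluster to rebalance
spec:
  # Goals listed here override the default goal list from the
  # Cruise Control configuration in the Kafka resource.
  goals:
    - CpuCapacityGoal                  # keep broker CPU utilization under a threshold
    - NetworkInboundCapacityGoal
    - DiskCapacityGoal
```

Once the resource reports the ProposalReady state, the proposal can be approved by annotating it, for example with `kubectl annotate kafkarebalance my-rebalance strimzi.io/rebalance=approve`, after which the resource moves through Rebalancing to Ready.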