My name is Nenad Bogojevic and I will be sharing with you how we run and manage Kafka clusters in Kubernetes at Amadeus. Amadeus is a provider of IT services for the travel industry, so for those of you who came by plane and are staying in a hotel, you might be using our services without knowing it. We are celebrating 30 years this year, which makes us much younger than our biggest competitor, which is a Texan company. We have been running Kubernetes since its inception, so for three years, which is basically 30 years' equivalent in Kubernetes years, on our own premises and in public clouds. I'm a solutions architect at Amadeus; I help our business units develop new platforms or migrate. In 30 years we have accumulated quite a number of applications, and today's migrations are almost always towards Kubernetes.

So, we use Kafka. We use Kafka outside Kubernetes, we use Kafka next to Kubernetes. We have actually been doing that since we started using it for log and event collection. We had been installing it using Puppet, which is an experience we don't like to repeat. And recently we had the idea to start using Kafka for more functional things, and we are building a streaming platform where a number of events, operational events from the airlines or things like bookings or boardings, go into the platform. There is a whole set of microservices which process them in a pipeline, and then some actions are executed at the end. We use Kafka as the underlying messaging infrastructure.

This is supposed to be an advanced session, so I guess you all know Kafka, but to be on the same page: Kafka is a streaming platform. You have a cluster of servers, called brokers, storing streams of records in topics. Topics are split into partitions which are spread over all brokers, which allows for horizontal scalability: you can add new brokers and new partitions and accept more traffic. You can replicate those partitions for higher availability; if a broker dies you have a backup on another one. And you have producers and consumers as clients of this platform.

So can we run it in Kubernetes? Well, when you run a normal ReplicaSet or a Deployment in Kubernetes you get pods named with a random-ish extension at the end, which does not fit well with Kafka. In a Kafka cluster each broker has its own unique identity, which is an ID but also its own unique network address, so that brokers can talk between themselves and clients can talk with brokers. They also need persistence to store the partitioned log files. In addition, next to the Kafka cluster you need a ZooKeeper cluster, because Kafka stores a lot of metadata inside ZooKeeper, and that is basically exactly the same thing: another cluster with identity and persistence.

Luckily we now have StatefulSets in Kubernetes. When we started, those were still called PetSets; they have evolved and they will soon be exiting beta. So what is a StatefulSet? Unlike traditional pods, if we can call something three years old traditional, StatefulSet pods have a stable identity, so they will be called pod-0, pod-1, pod-2. You also need to create a headless service, which acts as a kind of subdomain within your namespace, so your pod address would be pod-0.<service-name>.<namespace>.
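To make that naming concrete, here is a minimal sketch, not our actual manifests, of a headless Service and a StatefulSet that references it; the names (kafka-headless, kafka), the image, and the API versions are assumptions for illustration only.

```yaml
# Illustrative headless Service: clusterIP "None" turns it into a DNS
# subdomain for the StatefulSet pods instead of a load-balanced virtual IP.
apiVersion: v1
kind: Service
metadata:
  name: kafka-headless
spec:
  clusterIP: None
  selector:
    app: kafka
  ports:
  - port: 9092
---
# Illustrative StatefulSet skeleton referencing that Service.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka-headless      # gives the pods stable DNS names
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
      - name: kafka
        image: example/kafka:latest   # placeholder image
        ports:
        - containerPort: 9092
```

With something like this, the brokers become reachable as kafka-0.kafka-headless.<namespace>, kafka-1.kafka-headless.<namespace>, and so on.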
StatefulSets also provide stable storage, and they provide ordered startup and shutdown, so the pods start as 0, then 1, then 2, and shut down in reverse order; and since recently there are rolling updates. So to run Kafka and ZooKeeper we need two StatefulSets, and when we run Kafka and ZooKeeper deployments we always run one Kafka cluster with one associated ZooKeeper cluster. Those of you who operate Kafka know that you can share ZooKeeper across several clusters; that's not what we are doing in Kubernetes, we deploy one to one. In addition there is a discovery service: unlike the headless service, which doesn't have a cluster IP, the discovery service does have a cluster IP, and it lets clients simply say "I want to connect to Kafka". It bootstraps them: you don't have to tell them "go to kafka-0", you tell them "go to kafka", the connection lands on one of the brokers, and the client then learns about the full cluster.

And of course, if you want to deploy things in Kubernetes, the first thing you need is containers, then you need your descriptors, and then, to make this easy to reproduce and deploy several times, you need things like charts. There are a number of projects out there on GitHub; I think most are inspired by Yolean's kubernetes-kafka. The ones in bold are the ones inspired by how we operate inside Amadeus: there is a chart and there is the operator.

So I'll try to do a small demo, and I hope the demo gods are on my side today; it worked on the plane coming over here. The first thing I will do is deploy a producer and a consumer, and of course, because I haven't deployed a Kafka cluster, they will probably fail. So they are failing, as there is no Kafka there; there is nothing in the cluster. Let's delete those two and now deploy the cluster. At Amadeus we actually run OpenShift as our Kubernetes distribution, so we use OpenShift templates to deploy; here I'll be using Helm charts. Helm is a package manager for Kubernetes, and it's a great way to reproduce your deployments or to have multiple deployments in an environment. So I will now deploy my Kafka cluster. It should take just a few seconds, and we'll see it deploy all the necessary elements: several StatefulSets and the services that are needed. Let's have a look. First we have our StatefulSets up there, two of them, Kafka and ZooKeeper; I requested three of each. And we have the services I was mentioning just before: headless services with no cluster IP, and cluster IP services for the discovery. There are a few other things deployed here, but I will talk about those later.
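For contrast with the headless Service sketched above, the discovery service is an ordinary ClusterIP Service. A minimal sketch, again with made-up names:

```yaml
# Illustrative discovery/bootstrap Service: it has a cluster IP, so clients
# can bootstrap against "kafka:9092", land on any broker, and then learn
# the rest of the cluster from the metadata.
apiVersion: v1
kind: Service
metadata:
  name: kafka
spec:
  type: ClusterIP
  selector:
    app: kafka
  ports:
  - port: 9092
```

A client would then point its bootstrap.servers at something like kafka.<namespace>:9092 rather than at any individual broker.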
While it spins them up, let me share a few of the practices we actually use at Amadeus when deploying Kafka. Kafka performance is pretty much disk I/O and network bound, so we want to land Kafka brokers on instances which have good-performance disks, which basically means SSDs, and we use node selectors for this: all the brokers have node selectors saying we want to deploy them on nodes which carry a label like "disk: fast" or something similar. But we also don't want all our Kafka brokers to land on the same node with this label, because if we lose that node we lose the whole cluster. So we are using anti-affinity, which is a feature in Kubernetes that allows us to say: take the pods which have this specific label and spread them across some topology key; here we are using the hostname key. We are basically telling Kubernetes: when you deploy these Kafka brokers, please put them on different machines. We are using the preferred form; you can also enforce it and say they have to be on different machines.

Another thing Kafka needs is persistent storage. When you use a StatefulSet you can use volume claim templates, where you specify the kind of volume claim you want and each pod gets exactly the same type of claim: if you have six pods, you will have six different claims. And for those of you reading the slides instead of listening to me, what's written there is completely different, because that is the common way when you are running Kafka: you want to keep your logs, so you want persistent volumes; one gets provisioned and attached to your pod, your data is stored there, and if your pod dies you get it back again. In our particular case we are building a streaming platform with pretty strict SLAs: from the moment an event comes in, all the actions should be taken within a few seconds, at most a few minutes. So we can keep the amount of data on the brokers fairly limited, and we want high performance. We could use hostPath, but that's not good security-wise. So what we actually do is, for each pod in the set, simply say: use an emptyDir volume. It will be on a local disk, it will be an SSD, it will be fairly fast. If the container crashes we actually get the same emptyDir back, which is fine; if the node crashes we lose it, but we are running Kafka, and one of the selling points of Kafka is replication: you have a copy of your partitions replicated, so when the pod is spun up on a different node it will eventually get back in sync with the current leaders and be able to serve again. Of course, to be able to do this you have to have enough brokers and enough replicas: if you have a five-broker cluster and two replicas, you can afford to lose one of the brokers; with two replicas, if you lose two brokers you might be in trouble. What's coming soon in Kubernetes, it's actually in alpha already, is local persistent volumes, which are volumes on a local machine where the pod will then only be scheduled on that particular machine. It's something we will look into in the future; honestly, the current behavior with emptyDir has been sufficient for our use cases.

You also have to monitor what's going on, you want to be able to see. There are several approaches to monitoring Kafka, and you can see those on GitHub as well; you might use the Kafka scripts. What we actually do at Amadeus is use a TCP socket probe: we open a socket to the Kafka broker, because that's what tells us it's running and accepting connections. And we use Prometheus and JMX. We package the containers ourselves, basically deriving from what has been done in the fabric8 project, which for everything that is JVM-based automatically exposes Prometheus and JMX endpoints, so we can have nice dashboards, and if the operators need to do something they can connect directly to the pod and look at JMX.
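Putting those scheduling, storage, and probing practices together, the pod template of the StatefulSet sketched earlier might be extended roughly like this; it is an illustrative sketch, not our real template, and the label values, image, mount path, and the Prometheus annotation convention and port are assumptions:

```yaml
# Illustrative pod template section for the Kafka broker StatefulSet.
template:
  metadata:
    labels:
      app: kafka
    annotations:
      prometheus.io/scrape: "true"    # common scrape convention; assumes a JMX exporter
      prometheus.io/port: "9404"      # assumed exporter port
  spec:
    nodeSelector:
      disk: fast                      # land brokers on SSD-backed nodes
    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: kafka
            topologyKey: kubernetes.io/hostname   # spread brokers across hosts
    containers:
    - name: kafka
      image: example/kafka:latest     # placeholder image
      ports:
      - containerPort: 9092
      readinessProbe:
        tcpSocket:
          port: 9092                  # ready means the broker accepts connections
      volumeMounts:
      - name: kafka-data
        mountPath: /var/lib/kafka
    volumes:
    - name: kafka-data
      emptyDir: {}                    # fast local disk; Kafka replication covers node loss
```

Switching preferredDuringSchedulingIgnoredDuringExecution to the required form would enforce one broker per host instead of merely preferring it.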
Before diving into operators, let's continue with the demo part. Let's deploy our consumer and producer again and see what happens. Oh, it's not working so well. Actually, it is connected; we see a bit of a Java exception there, but it is connected. The thing is, we are running in a multi-tenant environment: we have microservices published there by a dozen teams, publishing independently, and we actually want to control who connects to which cluster. We don't want "okay, you just go to kafka:9092, you're connected to the cluster and you start publishing"; we want to be able to identify our clients. So, as part of the install process, we create separate secrets; there are secrets published for each of the Kafka clusters which allow clients to connect, with a JAAS file inside. So let's do it, let's deploy the secured version. It has to spin up, and eventually it will, hopefully. Yes, it's there, and it's still failing, but it's a different error: it's failing with "unknown topic". Because if you want to run things with Kafka you need to create topics, and there are basically two ways to do it. You can say anyone can create topics, which basically leads to a situation where you have hundreds of topics, or you may use the Kafka scripts, so either there is a person typing or you have Ansible or whatever you use to create those topics. Which brings us to operators.

So what are operators? It's a pattern of transposing the domain knowledge of SRE, of operations, of releasing things, into executable code, to automate behavior based on some kind of descriptors: you describe what you want to have and the tooling does it for you. There are actually several operators out there; we use a couple of them. There is the open-source Prometheus operator; the blue ones are the ones that we have written, and some are open-sourced already, like the workflow operator, and some will be, like our cluster operator. One of the first things operators do is provision clusters; that's what the Prometheus operator does, for example, and that's what our cluster operator does, among other things. But I can actually provision clusters fairly easily using Helm charts, or OpenShift templates, or what we saw at today's keynotes. And once the platform is up and Kafka is running, for us it stays in place; it will be there for a fairly long time. We can scale it up if there's a problem, and that's usually what we need: scale up, scale down, and evacuation if you have to do upgrades of the nodes. Upgrades are tricky; some of these things work because of Kafka replication, and I'll talk about that at the end.

Then I have this question of topics. As we are deploying dozens of microservices, we need dozens, and even more, topics to be created. We want the topics to be present in the target environments when we deploy the microservices, so we can start running immediately, and we want to delete them if the microservice is no longer there, particularly because things can be rearranged for different customers. We want the same behavior across environments, in development, in QA, in production, but also across different production clusters, as we may have clusters running on our premises or in the public cloud. We want to be able to react to things like "this node is running out of disk space, let's reduce the retention time for Kafka". And we actually want to deliver all of this as code.
So when developers finish their coding, they do a pull request, there is a container build, and from the description of their project we actually generate the deployment YAML file and this descriptor of the topic, which then goes into Kubernetes. Then there is a process in Kubernetes, the Kafka operator, which looks at it and applies it. So what does a topic look like? For us it's a ConfigMap; it might soon be a custom resource. It's a ConfigMap which basically says: I want a topic with this name, which is the name of the ConfigMap, with this partition count, with this replication factor, and, if someone knows how to configure it in more detail, these properties. Whenever this ConfigMap is created in Kubernetes, the operator managing the cluster will create a topic for it, and whenever the ConfigMap is deleted, it will delete the topic. It is actually exactly the same behavior as the service catalog provision and deprovision behavior; we intentionally mapped it that way because we are moving towards offering IT services internally through a service catalog.

So let's do it, let's create this ConfigMap, and let's have a look at what's going on here; we'll have a long list here. Oh, and it's working: as soon as the ConfigMap was there, the topic was created, the publisher started working and the consumer started consuming. I can just show you what happened inside the operator: there is a log which basically says, okay, I've seen that you have this ConfigMap and I've created a topic for it. So that's fine, that allows us to manage all the topics.

But I actually want to go a step further, because we are conscious about security. We want to control access to the topics: we don't want a developer who hard-coded a topic name in the code to suddenly have the deployed microservice publishing to a topic it shouldn't. So we are using access control, where each deployment comes with an annotation saying: listen, I'm consuming this topic and I'm publishing to this topic (it can be multiple, actually). What the operator does is monitor the deployment; it will choose a random user, assign it to this deployment, create a secret, and assign the rights in Kafka to use it. So let's do this part of the demo, deploying the ACLs here. If we look at the operator logs, we immediately see that the operator has noticed a new deployment called kafka-producer and has assigned a user to it, and the same thing for the kafka-consumer lower down, where it assigned a user to it. And if you look at the secrets, you can see that we now have credentials created for our producer, and somewhere here there should be credentials created for the consumer. Those credentials contain users which, in this particular case, for the consumer can only read from one topic, and the producer can only publish to one topic. In the previous case the credentials could publish to anything.
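The exact ConfigMap keys and annotation names our operator understands are not spelled out here, so the following is only an illustrative sketch of the idea, with all names and keys made up for the example: a topic descriptor and a deployment annotated with the topics it uses.

```yaml
# Hypothetical topic descriptor: the ConfigMap name becomes the topic name.
apiVersion: v1
kind: ConfigMap
metadata:
  name: booking-events
  labels:
    type: kafka-topic               # assumed marker the operator watches for
data:
  partitions: "12"
  replication-factor: "3"
  properties: |                     # optional detailed topic configuration
    retention.ms=600000
---
# Hypothetical annotated Deployment: the operator would pick a user, create a
# secret with its credentials, and grant it rights on exactly these topics.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-producer
  annotations:
    kafka.example.com/produces: "booking-events"   # assumed annotation keys
    kafka.example.com/consumes: ""
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kafka-producer
  template:
    metadata:
      labels:
        app: kafka-producer
    spec:
      containers:
      - name: producer
        image: example/producer:latest   # placeholder image
```

When a ConfigMap like the first one appears, the operator creates the topic; when it is deleted, the topic is deleted, mirroring the service catalog provision and deprovision behavior described above.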
Okay, I have a little bit of time left, so a word on upgrades. It's something we have wanted to build into the Kafka operator since its inception. During our work on this, I think there were three or four changes in Kafka that required upgrades, and basically each of them had a slightly different scenario, which doesn't really look like something that can easily be automated. For example, if they change the inter-broker protocol, you might first need to upgrade while still using the old protocol: you roll out one upgrade, then you do the configuration change, then you roll out a second upgrade. Or maybe Kafka changes the storage format; hopefully, now that they are at 1.0, those things will happen less. When they change the storage format you might have to update consumers first and then go on to the servers. I had this idea: don't upgrade, but instead recreate the cluster. Basically it means we have one cluster running the old version, we create a new cluster with the new version, they run next to each other, we can actually publish to both clusters, and then we switch. It's a kind of blue-green deployment. It comes with its own set of problems, so actually I wouldn't suggest it anymore; we will be looking more into automating upgrades in the future and finding a simple way to do the first steps.

And performance. As I said, Kafka performance is basically dominated by disk I/O, that's what we have experienced, so have a good disk; we are going for SSDs, but even network storage may work if it is good network storage. The second factor is the network, and it's almost never CPU or memory; those stay fairly low even for throughputs like a hundred thousand messages per second. There are a couple of things which appear strange when doing tests. Sometimes a Kafka broker and a ZooKeeper node land on the same Kubernetes node, which actually reduces network traffic quite a lot because they talk to each other locally. Sometimes clients land on the same node: you might see that you have twenty instances of a microservice pod, two of them have super high performance and all the others lag behind, because those two landed on the same node as the Kafka brokers and are talking to them locally. These are things to be ready for in a cloud environment: not everything always behaves the same. And I think that would be it for the presentation. If you have any questions...

Can you repeat the question? Are your Kafka clusters zone-aware, or how are you handling that? Okay, so our Kubernetes clusters are spread across our infrastructure, and we simply rely on the fact that they will be split across different zones in our data center, but we don't make Kafka aware of the zones; there is nothing like rack awareness or anything like that. We operate our own data center, and we also deploy to other clouds, but mostly to our own data center.

Okay, I'll repeat the question: do we have any constraints on the number of persistent volumes we create per node? As I said, when we run Kafka we run it with emptyDir, so we don't create persistent volumes; we are not using persistent volumes much at the moment. We use them for some of the monitoring tools we run alongside Kafka, but we haven't put any specific constraints in place there.

Okay, the question was how we handle Kafka operator upgrades. The Kafka operator is completely stateless, so it is actually upgraded out of band; it can be updated at any moment in time. All the updates it does, it does by delta: like Kubernetes, it checks what the Kafka cluster has, what we have in Kubernetes, and applies the delta. Yes, it is watching changes on the ConfigMaps; the question was whether the Kafka operator listens to changes on the ConfigMaps.

The next question: do we provide a way to get to Kafka from outside the cluster, meaning clients outside the cluster? No; if someone has a solution for that, we would be very interested.
Okay, the question was whether we wrote the operator with the Java client for Kubernetes: yes, with the Java client for Kubernetes and the new Kafka AdminClient.

A question here, can you repeat it? Confluent suggested running Kafka clusters on bare metal for performance reasons, what do we think about that? Okay, our approach is that we tend to run as many things as possible in VMs and then run them on Kubernetes. Maybe we will be running Kubernetes on bare metal, but currently we are running on VMs. We had exactly the same suggestion from Confluent to run on bare metal, but we went for VMs.

The question was why this headless service is needed. It is needed for the StatefulSet; it's by design, that's how a StatefulSet behaves: you need to provide the headless service, and then the pods get a DNS name which is the pod name plus the headless service name. And I think I had one question here: in this platform our typical cluster would be five brokers; we have bigger ones, but I'm not sure I can share that information.

The question was whether ZooKeeper has to be up and running before you deploy Kafka. We don't enforce an order of deployment. Actually, let's just do a kubectl get pods: you can see that kafka-0 restarted once. It was restarting because there was no ZooKeeper yet, so it crashed and was started again. Yes, Kafka will crash and recover until ZooKeeper is up and running.

Can you just speak into the mic, please? You put up the link to your GitHub operator; do you have any other custom code that you use to run it, or is that everything that is needed? It's not exactly what we run in-house. And your question is whether I can describe any of the customizations that we had to do: we have a few customizations, mostly around security, which are internal, but it's pretty much a similar thing. Also, we are less up to date with recent versions of Kubernetes and OpenShift in-house compared to this one. So there are no restrictions on running the published version. Well, actually, we are in talks with some open-source players to make this really open source; if nothing comes of that, we will open-source it from the Amadeus side, but if there is to be a community, we think there should be someone who is more into this kind of thing. Wonderful, let's talk.

Sorry, can you just repeat? What kind of storage do we use when we run it in the public cloud? As I said, we don't run this with persistent volumes; in the public cloud we use instances which have SSDs, yes, even in the public cloud.

Can you describe your general strategy for scaling down? The general strategy for scaling down would be a lot of manual work: moving partitions and replicas and then scaling down. And one thing, which may be linked to this, is that most of the problems we have experienced were due to humans, when humans scaled things down without taking the steps they had to take beforehand, so it would be really great if that could be automated.

Any more questions? Well, thank you very much, I hope you have a nice