Hello everyone. My name is Karan Singh, Senior Architect in the Red Hat Storage Business Unit. The topic of the day is Red Hat AMQ Streams, also known as Kafka, running on top of OpenShift Container Platform and consuming Red Hat OpenShift Container Storage. We will cover some concepts followed by a live demo. We will quickly go over some concepts of Apache Kafka and Red Hat AMQ Streams, touching upon some Kafka use cases and then storage use cases for Kafka. At the end we will go over a demo where we are going to deploy Red Hat AMQ Streams using operators, launch some sample Kafka producer and consumer applications, and then finish with some failure injection testing. Let us begin.

Apache Kafka is an open source project initially developed by LinkedIn and later contributed to the Apache Foundation. Underneath, Kafka is a highly scalable distributed messaging system which is high performance and fault tolerant. Kafka has a notion of producers and consumers, where producers produce messages and events which are then ingested into Kafka; on the other side, consumer apps consume the messages from a Kafka topic and do the post-processing. Kafka comes with a stream processing API, which makes Kafka a good fit as a real-time streaming engine as well. Kafka can also connect with other tools using connectors; for example, NoSQL stores like MongoDB, as well as MySQL and S3, could connect to Kafka and use it.

Kafka covers various use cases, ranging from audit logs, messaging, web activity tracking and clickstream data: all of this data could be ingested directly into a Kafka topic and then used by applications. Kafka is also a good fit for metrics, log aggregation and stream processing engines, so lots of streaming metrics data and streaming log data could land in Kafka and be used by apps later, as and when needed. Database ingestion is another: apps such as web apps could simply write data to Kafka, which could later be ingested into a database.
This is a pretty popular architecture these days. GPS data, real-time mobile tracking and coordinate sync-up data could also come to Kafka. IoT is a good use case, where smart devices and sensors could send data through the application into a Kafka topic in a Kafka cluster, and later it could be moved to the respective persistent storage systems.

Red Hat AMQ Streams is a Red Hat product which is an enterprise distribution of Apache Kafka. AMQ Streams aims to simplify the deployment of Kafka on top of OpenShift, and it is based on an open source project called Strimzi. Basically, AMQ Streams provides containerized, hardened and secure images for Apache Kafka and ZooKeeper. It also provides operators for deploying, managing and maintaining the cluster: the Cluster Operator for clusters, the User Operator for users and the Topic Operator for Kafka topics. All of this comes bundled with AMQ Streams, hence making Kafka simple on top of OpenShift.

How does storage play into the world of Kafka? Storage plays a very vital role for Kafka, because the retention of messages and the rate of message processing all depend on the storage type used underneath. OpenShift Container Storage, which is based on Ceph, provides a fault-tolerant, highly scalable storage system for Kafka. Here all the Kafka brokers, basically the Kafka pods, can request PVCs and persistent volumes from OpenShift Container Storage. At the same time, ZooKeeper pods can also request PVs from OpenShift Container Storage. So this layer provides persistence for Kafka. If you don't choose to use a persistent layer like OCS, then the topic data is ephemeral: if a pod is destroyed or goes offline, your data is lost, and Kafka needs to do replication and rebalancing of the data from the other pods, which is not very convenient. So in the first place, use PVs backed by OCS and make Kafka highly available.
In this case, if any of the pods goes down, Kubernetes will spin up a new pod and attach the same volume to the new Kafka pod, which means the recovery is way faster compared to ephemeral storage. Kafka also comes with Kafka Connect, another side tool, which can move messages from the Kafka persistent layer onto an object storage layer, like Ceph in this case, via OpenShift Container Storage. Another type of storage, which is under development in the upstream community, is tiered storage, where based on the retention period of the messages, Kafka itself ships the older messages onto S3. And when an application requests even older messages, Kafka can go and fetch them from S3 and serve them to the application. So it's a tiered storage concept in Kafka.

So here's a fun fact. This is a slide I borrowed from PayPal. PayPal is processing 400 billion messages a day with 50 Kafka clusters running, using 300-plus topics. Overall, this system was consuming seven petabytes of storage capacity. And this data is not new: it is based on Kafka 1.1. Currently we are on Kafka 2.3, which means the data is one year old, and I'm very positive that the storage requirement for PayPal would have grown even higher as we speak. So to all the sales reps out there: Kafka could be a serious consumer of storage, which means storage plays a vital role in Kafka, and it has to be treated nicely.

So let's move on to demo number one, where we're going to provision a Kafka and ZooKeeper cluster running on OpenShift 4.2, backed by OpenShift Container Storage 4.2, and then launch a sample Kafka producer and consumer app. So let's go. First, create a project called amq-streams. Within this project, we will install the AMQ Streams operator using OperatorHub. We'll select Streaming & Messaging and AMQ Streams. Make sure that the project is amq-streams.
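For reference, the OperatorHub subscription created through the console corresponds to an OLM `Subscription` manifest along these lines. This is only a sketch: the channel name and namespace are assumptions, so check OperatorHub for the channels actually available in your cluster.

```yaml
# Illustrative OLM Subscription for the AMQ Streams operator,
# scoped to the amq-streams project as in the demo.
# The channel value is an assumption, not taken from the demo.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: amq-streams
  namespace: amq-streams
spec:
  channel: stable
  name: amq-streams
  source: redhat-operators
  sourceNamespace: openshift-marketplace
```

Applying this with `oc apply -f` has the same effect as subscribing through the console UI shown in the demo.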
You could install this globally across the OpenShift platform, but for the sake of simplicity, right now I'm installing this within a single project. I will select my project from here and I will subscribe to this namespace. This should install my operator. Once the operator is up, I can go and watch my pods, and the pod is coming up. I can switch to my CLI, `oc project amq-streams`, and the container is running. `oc get all` should show my deployments, my pods and my services, if they are ready. Okay, so my pod is running. My operator is running.

The next step is to set up a Kafka cluster. But before that, we will make sure the storage class is set to OpenShift Container Storage. So `oc get storageclass` should show us the storage classes, and the default storage class is Ceph RBD. Now we will deploy a Kafka and ZooKeeper cluster. While this is running, we'll go over the contents of this file. This is running; meanwhile, let's talk a little bit about this configuration file and what we have in here. This file is quite basic: I'm installing a Kafka cluster, and I am assigning a persistent storage claim of 100 GB to my Kafka cluster and 10 GB of storage to my ZooKeeper pods. As you can see, the Kafka cluster is coming up. The container is coming up, and it should take a few minutes.

All right, so the Kafka and ZooKeeper clusters are up, and we should be good to go with the next steps of this demo. We will verify the storage claims that Kafka and ZooKeeper have requested. As you can see, we have three Kafka pods, and each of them has requested 100 GB from OpenShift Container Storage. Similarly, 10 GB for each ZooKeeper pod. So this is good. Next we will create a Kafka topic. But before that, `oc get kafkatopic` shows that we don't have any Kafka topics yet. So let's create a new Kafka topic from this file. This will create a topic called my-topic in my cluster.
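The cluster and topic manifests described above look roughly like the following sketch. The cluster name, replica counts, listener settings and storage class name are assumptions based on the demo narration, not the exact file used on screen; the storage sizes (100 Gi for Kafka, 10 Gi for ZooKeeper) are the ones stated in the demo.

```yaml
# Sketch of a Kafka cluster CR: three brokers with 100 Gi persistent
# claims and three ZooKeeper nodes with 10 Gi each, all backed by the
# OCS Ceph RBD storage class (class name is an assumption).
apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: my-cluster
  namespace: amq-streams
spec:
  kafka:
    replicas: 3
    listeners:
      plain: {}
      tls: {}
    storage:
      type: persistent-claim
      size: 100Gi
      class: ocs-storagecluster-ceph-rbd
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 10Gi
      class: ocs-storagecluster-ceph-rbd
---
# The KafkaTopic CR for "my-topic"; partition and replica counts
# here are illustrative.
apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaTopic
metadata:
  name: my-topic
  namespace: amq-streams
  labels:
    strimzi.io/cluster: my-cluster
spec:
  partitions: 3
  replicas: 3
```

With `persistent-claim` storage, the Cluster Operator creates one PVC per broker and per ZooKeeper node, which is why the demo shows three 100 GB claims and three 10 GB claims against OpenShift Container Storage.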
It should be here in a moment. Now we have a Kafka topic. The next step is to create a producer app which will write contents to this Kafka topic. We will apply a file with the oc CLI, and while this is running, we will look at the contents of this file real quick. It's a simple file: a Hello World producer application which will write 1 million messages to my Kafka topic. There is a message count, so it continuously writes up to 1 million messages. So this is up: `oc get pods`. We can also go to the console and look for the messages. You can see this Hello World producer is up and it is running. Let's look at the logs. I'll switch my tab and stream the logs of my Hello World producer application, which is generating 1 million messages to my Kafka topic. As you can see, it is continuously producing Hello World messages to my Kafka topic.

Now it's time to launch the consumer app, which should listen to my Kafka topic and start reading messages from it. Now, with `oc get pods`, my consumer app should be up; the container it's creating should be up and running in a moment, and once it is up, it will be reading messages from the same topic. All right, the container is up. We now tail the logs of my Hello World consumer app. As you can see, there is a slight latency; however, the first window is generating messages into the Kafka topic, which is our producer app, and in the second window, the consumer app, we are continuously receiving the messages from the Kafka topic.

All right, so let's induce some failure into the system by destroying a Kafka pod which is backed by OpenShift Container Storage. There should be no glitch if we do that, because it is backed by a persistent storage layer. Let me change the shell. This is a continuous watch of my existing cluster pods. I'm going to delete a Kafka pod like this: the Kafka pod is deleted. We should see some changes here. So look at this.
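The producer manifest applied above is along these lines. This is a sketch only: the image name and environment variable names are hypothetical placeholders, not the ones from the demo file; the bootstrap address follows the Strimzi service naming convention of `<cluster-name>-kafka-bootstrap:9092`.

```yaml
# Illustrative Hello World producer Deployment. The image reference
# and env var names (BOOTSTRAP_SERVERS, TOPIC, MESSAGE_COUNT) are
# placeholders; the demo's actual manifest may differ.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-world-producer
  namespace: amq-streams
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello-world-producer
  template:
    metadata:
      labels:
        app: hello-world-producer
    spec:
      containers:
        - name: producer
          image: example.com/hello-world-producer:latest  # placeholder image
          env:
            - name: BOOTSTRAP_SERVERS
              value: my-cluster-kafka-bootstrap:9092
            - name: TOPIC
              value: my-topic
            - name: MESSAGE_COUNT  # write 1 million messages, then stop
              value: "1000000"
```

The consumer app would be a near-identical Deployment pointing at the same bootstrap service and topic, reading instead of writing.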
This one is terminating. So the Kafka pod has gone, but at the same time my consumer and producer apps are functional as they were; there is no outage here. What Kubernetes will do is spin up a new container for that Kafka node, and it will mount the same persistent volume which was mapped to the previous container, without rebalancing or moving data. So look at this: the Kafka-0 container is now coming up, 25 seconds in. It should be here... now it's running. My consumer and producer apps didn't even notice, and they kept functioning as they should. So this was the end of the demo that we planned, and I am done with this presentation.