Hello, welcome to this DevNation Live. My name is Rafael, and today our guest is Marius, and he will be talking about Kafka Streams for event-driven microservices. Please post your questions as you think of them, and we'll try to address them in real time. Some questions will be held for Marius at the end. So Marius, it's up to you now.

Hey, Rafael. Can you hear me? Hello?

Yeah, Marius, we can hear you.

Fantastic. Thank you very much for the introduction. I'm happy to be here and talk a little bit about Kafka Streams and event-driven microservices. So let me start by sharing my screen, and let's go with the presentation. First, I'm going to talk a little bit about the features of Kafka Streams and how they help event-driven microservices. Then we're going to talk about how to run Kafka Streams applications on Kubernetes and OpenShift, and we're going to see a quick demo of how that works. As Rafael said, please feel free to ask questions. We're here with Rafael and another colleague of ours from the Streams team, Paolo Patierno, who is also going to answer some of your questions as we go along, and we're going to save some questions for the end.

First, about me: my name is Marius Bogoevici. I'm a principal specialist solution architect at Red Hat, and I specialize in integration, messaging, and data streaming. I've been an open source contributor since 2008, very active in the Spring community in particular, and in the wider Java community as well. I take a deep interest in event-driven microservices and in Kafka, and I'm happy to share some thoughts here with you.

The first question we're going to try to answer is: why event-driven microservices? It's a big topic, and we're not going to exhaust it today; it's worth a talk on its own. But we can look at a few points that help us understand where Kafka Streams comes into play and what specific problems it solves for event-driven microservices. So why do we build event-driven microservices in the first place? Event-driven architecture in general helps us reduce friction: it helps us build robust and resilient distributed architectures, and if you think of microservices as distributed solutions, the relationship becomes apparent. It helps from a development-process standpoint, because event-driven architectures are very composable, and that encourages agility and experimentation. And from a business standpoint, modeling systems with events is a very natural choice, because the world around us really is event-driven.

In the specific context of microservices, event-driven architecture helps solve problems such as state propagation. This is where we have solutions like change data capture for streaming changes from a database, event sourcing for propagating state between different systems, and CQRS for separating the read and write operations, which allows us to optimize the different types of stores across a microservice fleet. That is where event-driven architecture mostly works as a technical solution for problems created by the distributed nature of these architectures. But there's also domain-driven design.
Event-driven microservices fit domain-driven design very well, because they allow us to model the world by treating events as first-class citizens, as a central part of your domain. In both these cases, there are a number of challenges. One of the things that's well known about microservices is that coordinating state is hard. One of the basic tenets of microservice architectures is that the system is composed of a set of applications, each of which manages its own state, and these applications interact with each other. They can do that in a request-reply fashion, synchronously, ensuring that a change is propagated across an entire set of components. And that is hard, because it works under the assumption that nothing will ever fail and that all applications are available. Event-driven architecture decouples these interactions, and decouples the state changes, and essentially allows us to avoid things like distributed transactions. That is how event-driven architecture solves the state coordination problem.

At the same time, if you look at the way these applications work, you end up with pipelines of applications that interact with message brokers and databases simultaneously, process inbound and outbound events, and are coordinated and grouped together into large, complex graphs. That creates some challenges of its own. A microservice has to manage its own state, but it also has to do that in coordination with its communication with the broker: it has to make sure that all the inbound events have been acknowledged in conjunction with the state changes it has made, and that it has emitted all the events that are required. So while this type of architecture is pretty common and there are solutions for these problems, they add another level of complexity.

This is where Kafka Streams comes into play, building on top of Kafka as a distributed messaging system. If you're not familiar with Kafka, I'm just going to give a very quick overview; two weeks from now, at the next DevNation Live event, one of our colleagues, Scholz, will talk in more detail about Kafka and what we do with Kafka at Red Hat. In summary, Kafka acts as a distributed messaging system based on a distributed-log approach. It provides advanced, distributed publish-subscribe capabilities that mediate the interaction between producer and consumer applications. As we saw in the diagram before, a lot of these event-driven microservices need to both consume and emit events, they need to do processing, and they need to do all of that in coordination with their state management. So Kafka Streams proposes a solution where a client library is embedded into the application itself. It comes as an alternative to stream processing engines that require their own platform, and it allows you to embed stream processing features into regular Java applications.
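To make that concrete, here is a minimal sketch of what embedding Kafka Streams into a plain Java application looks like. This is not the demo code; the topic names and the bootstrap address are placeholders chosen for illustration.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class EmbeddedStreamsApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        // The application id doubles as the consumer group and the prefix for internal topics
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "my-cluster-kafka-bootstrap:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read from an input topic, transform, and write to an output topic
        builder.<String, String>stream("input-topic")
               .mapValues(value -> value.toUpperCase())
               .to("output-topic");

        // The stream processing engine runs inside this regular Java process
        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

There is no separate processing cluster to submit this to; the topology runs wherever the application runs.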
The big change, and the big advantage in the case of event-driven microservices, is the approach it takes to managing events and state. Everything is done using Kafka-to-Kafka semantics. Because it interacts just with Kafka, as a single component, it can do this event-state coordination: it provides the stateful processing support that event-driven microservices require, and it also allows you to implement more advanced patterns such as transactions and exactly-once processing. Again, instead of creating sophisticated topologies that are deployed on a separate platform, by embedding the library into the application itself, these topologies can be composed out of independent applications, as microservices.

It comes with its own high-level functional DSL. Essentially, it maps over a set of inbound and outbound topics and lets you describe transformations as data flows from one topic to another. It is a functional DSL that describes the transformations of the data streams as they occur, and it also allows more complex operations like grouping and counting or, in the case of tables, operations such as joins. It also contains primitives for time windows, which allow you to perform window-based aggregations.

The main abstractions reflect this topic-centric nature of Kafka Streams. One abstraction is the KStream, which is the flowing stream of data; it maps one-to-one to a topic. The other abstraction is the KTable, which takes the ordered and durable nature of topics and applies changelog semantics to it. As you know, a Kafka record has a key and a value. The KTable abstraction reads a stream of data, for example a topic, and applies the records as updates to an internal key-value map, based on the record key. So its view of the world is a key-value map built from the changes flowing through a topic. The KTable comes with its own set of complex operations, joins and aggregations, which allows you to create, for example, materialized views over ever-changing data streams.

One of the fundamental ideas in Kafka Streams is the stream-table duality. Given any stream, we can construct a table from it, understanding that the stream contains changes that are applied against the key. On the other hand, we can stream out the changes to a table as a stream of updates. This interplay between the streams, which contain the events flowing into and out of the application, and the tables, which contain the materialized state the application observes, is critical for understanding how Kafka Streams works. And of course, as I said, aggregate operations usually need a finite time window, and Kafka Streams contains sophisticated time-windowing primitives for describing how the aggregate operations take place: for example, calculating the average of a given value over the last second, or over a sliding time window, say the last five seconds updated every two seconds, and so on. A sketch of such a windowed aggregation follows below.
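Here is roughly what a windowed aggregation looks like in the DSL. This is a sketch, not the demo code: it uses a windowed count rather than an average (an average would need a custom aggregator), and the topic names are hypothetical.

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

public class WindowedCounts {
    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, Long> readings = builder.stream("sensor-readings",
                Consumed.with(Serdes.String(), Serdes.Long()));

        // Count events per key over a five-second window that advances every two seconds
        KTable<Windowed<String>, Long> counts = readings
                .groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
                .windowedBy(TimeWindows.of(Duration.ofSeconds(5))
                                       .advanceBy(Duration.ofSeconds(2)))
                .count();

        // Stream the windowed updates back out as a changelog of (key, count) pairs
        counts.toStream()
              .map((windowedKey, count) -> KeyValue.pair(windowedKey.key(), count))
              .to("reading-counts", Produced.with(Serdes.String(), Serdes.Long()));
        return builder.build();
    }
}
```

Note the stream-table duality at work: the windowed aggregate is a table, and `toStream()` turns its changes back into a stream of updates.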
These abstractions are key for building Kafka Streams applications, and they are reflected in the high-level functional DSL. We'll see one such application at work later on, the word count application.

The other challenge with these sophisticated distributed topologies, besides managing event handling and state together cohesively, is that because this model encourages grouping functionality into distinct applications, we can end up with complex topologies: not only a large number of applications, but also requirements for scaling them. Running this type of solution on your own makes for a very complicated approach. This is where Kubernetes, and OpenShift, the enterprise Kubernetes platform, help: by managing applications, by providing the resilience and reliability that applications need, by making sure that applications are restarted in case of failure. The platform also provides built-in resource management, so creating new instances and scaling the number of instances of a given node, and it provides monitoring and failover. And there are more advanced features in OpenShift, for example routing and load balancing, and continuous integration capabilities. These are critical for removing the complexity of running the topology from the application developer and handing it over to the platform.

And this mindset, the idea of complex topologies, is not restricted to the applications; it applies to the Kafka clusters themselves, which are fairly complex distributed systems. So it follows that we can think of running our entire solution on top of Kubernetes. What are the ingredients for that? First, how do we run Kafka on Kubernetes? At Red Hat, together with our colleagues on the Streams team, we develop an open source project called Strimzi, which is focused on running Apache Kafka on Kubernetes and OpenShift. Strimzi is, let's say, the upstream component; it's also available as a product, as part of Red Hat AMQ. Its goal is to manage both Kafka clusters and topics, and take away the complexity of running these individually. You could containerize your Kafka brokers, containerize your ZooKeeper instances, and run and coordinate them on your own. But Strimzi takes a different approach: it uses the operator pattern to oversee a set of configurations and declaratively update the state of the cluster, installing new brokers as necessary, starting ZooKeeper nodes, or resizing the cluster. You do not operate the brokers yourself; you update the declarative configuration, and it is the responsibility of the cluster operator to apply that configuration for you.
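For illustration, here is roughly what those declarative resources look like: a minimal Kafka cluster and a topic, expressed as Strimzi custom resources. The exact apiVersion and fields depend on the Strimzi version, so treat this as a sketch rather than a canonical manifest.

```yaml
apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3            # the cluster operator scales brokers to match this
    listeners:
      plain: {}
    storage:
      type: ephemeral
  zookeeper:
    replicas: 3
    storage:
      type: ephemeral
  entityOperator:
    topicOperator: {}      # enables declarative topic management
---
apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaTopic
metadata:
  name: streams-plaintext-input
  labels:
    strimzi.io/cluster: my-cluster
spec:
  partitions: 3
  replicas: 1
```

Applying these resources is all the user does; the operators reconcile the actual cluster and topics against them.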
And of course, the same logic applies to topics. Instead of opening the administrative console and administering topics directly in Kafka, you declaratively create custom resources in OpenShift and Kubernetes, and the topic operator is the one that takes over and creates these topics for you. Conversely, it makes existing topics available as Kubernetes-native resources. So the management of your Kafka cluster is essentially reduced to managing your OpenShift cluster.

In this topology, running Kafka Streams applications on Kubernetes consists of containerized Kafka Streams applications, deployed using mechanisms like source-to-image, which is one feature OpenShift supports for creating image streams and deploying applications, interacting with a Kafka cluster that is provisioned on OpenShift as well. Everything runs on the platform and is managed automatically, and the responsibility of the user is reduced to managing the resources that control the size of the cluster and its topics, and deploying the applications as individual containerized apps. Both the events and the changelog in the applications themselves are coordinated with the Kafka cluster using the Kafka Streams API.

So let me walk you through a quick example of Kafka Streams running on OpenShift. In my OpenShift web console, I already have Strimzi installed, I have the operators running, and as you can see, I have my Kafka cluster already installed in my current project. Next, I have three different applications that will interact with this cluster to create a data stream. The first one is an application that produces the messages the Kafka Streams application will process: it is a RESTful endpoint, written using Camel, that submits every line posted to the service to the streams-plaintext-input topic. Then my Kafka Streams application takes the data from this topic and applies a series of transformations. Let's take a quick look at how the application works. First, it splits each line into separate words. It also converts these words to lowercase, so that when we're counting words we're not affected by the case of the word. Then it applies a group-by transformation, and then it counts the elements of the different groups; you can see a sketch of this topology below. Now, as you notice, this is essentially a continuous transformation. So with a certain frequency, which is controlled by the commit interval configuration, we output and stream the changes in this table: essentially, we stream what changed since the latest commit, as pairs of words and word counts. Those pairs are received by another application, another Camel application in this case, which reads them from the output topic of the Kafka Streams application. It logs the messages that it receives, and it also retains them in an internal map, so we can display them using a RESTful endpoint.
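For reference, here is a sketch of that word count topology in the Kafka Streams DSL. It mirrors the canonical word count example rather than the exact demo code, uses the demo's topic names, and assumes String default serdes.

```java
import java.util.Arrays;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class WordCountTopology {
    static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> lines = builder.stream("streams-plaintext-input");

        KTable<String, Long> wordCounts = lines
            // Lowercase first so counting is case-insensitive, then split each line into words
            .flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\W+")))
            // Group by the word itself
            .groupBy((key, word) -> word)
            // Continuously maintain a count per word in a KTable
            .count();

        // Stream the table's changes (word, updated count) to the output topic;
        // how often updates are emitted is governed by the commit interval
        wordCounts.toStream()
                  .to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long()));
        return builder.build();
    }
}
```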
So let's start by deploying these applications. I'm going to deploy the different applications using the Fabric8 Maven plugin. Let's go through one of these examples. The plugin takes the uber-jar, created in this case by the Spring Boot Maven plugin, and sets up an image stream in OpenShift that builds the container image. It also sets up a running application and a route. We're going to do this for all three components of the application. We'll wait a little bit while the image streams are set up and the applications are started; as you see, the images and the builds start being created. This is quite a powerful mechanism for creating a container image for your application out of your Maven project, by delegating the creation of these images, the process of containerization, and the installation of those images into the internal registry, to OpenShift.

Now all my builds are complete, so I should see the three applications starting, and they do. They all started. Let's take a quick look at the Kafka Streams application's pod; it has just started. If you take a look at this application's configuration, which uses Spring Boot and the Spring Kafka support to push environment variables and ConfigMap values into the application itself, it refers to this URL. This is the bootstrap URL of my Kafka cluster, the cluster managed by Strimzi. As you see, all the addresses are internal, and everything is contained in this project.

The applications are now active. As I said, let's take a quick look at the Kafka Streams application. I can see that the Kafka Streams process has started by these current active tasks; these are the tasks that manage the processing of the data from the inbound topic to the outbound topic. And let's start monitoring the word count REST application. As you see, there is a lot of stuff being logged right now, because this is the current state of the topic that it monitors, which, I remind you, is streams-wordcount-output. Of course, we can also check the current values in the map, so I can just do a curl, and I can see that it has counted the number of words. This is basically the result of trials that I made before we started this presentation.

So let's start posting some data, and let me put this to the side. As I post data to the inbound microservice, I can see a sequence of updates being sent, which means that the Kafka Streams application has counted the words, the table has been updated, and the downstream service has received these values. And of course, I can now go and check the new set of values, and I can see the count for "definition", for example, being incremented from three to four. I can try a different value and do it again, and I see new words coming in and the value for "definition" being updated to five, which is again the result of the update to the internal table of the Kafka Streams application. And I can see a set of updates coming in here which contain the new counts for the newly added words. So I'm going to stop now. That was the demo, and I'm going to turn it over to Rafael to ask if there are any questions.
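For completeness, the downstream Camel application that logs the updates and retains them in a map might look roughly like this in Camel's Java DSL. This is a sketch under assumptions: the camel-kafka component, the demo's topic and bootstrap address, and an in-memory map behind the RESTful endpoint (the REST route itself is omitted here).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.component.kafka.KafkaConstants;

public class WordCountConsumerRoute extends RouteBuilder {
    // In-memory view of the latest count per word, served by a separate RESTful endpoint
    private final Map<String, Long> counts = new ConcurrentHashMap<>();

    @Override
    public void configure() {
        from("kafka:streams-wordcount-output"
                + "?brokers=my-cluster-kafka-bootstrap:9092"
                + "&valueDeserializer=org.apache.kafka.common.serialization.LongDeserializer")
            // Log each (word, count) update as it arrives from the output topic
            .log("${headers[kafka.KEY]} : ${body}")
            // Retain the latest count per word in the internal map
            .process(exchange -> counts.put(
                    exchange.getIn().getHeader(KafkaConstants.KEY, String.class),
                    exchange.getIn().getBody(Long.class)));
    }
}
```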
There are a couple of questions that you might be able to answer. We are a little bit over time, but I think you can get to some of them; they're in the chat box, and I'm helping you by sorting them. So please feel free to answer.

Perfect, give me a second. I'm having some trouble getting to the chat box. Maybe you can read them to me.

Yeah, I can read them to you. So, one question: do you suggest Istio and the service mesh to meet application resilience requirements?

I think Istio serves a different purpose here. First and foremost, what we're describing is a process through which applications interact with each other through Kafka. Istio, right now, is based on a request-reply pattern and mostly handles HTTP calls. There is a project going on, and a desire, to extend Istio to support Kafka. Even then, it wouldn't necessarily add resiliency, because the resiliency of the interaction is handled by the Kafka broker and the Kafka cluster itself, but it would help with observability, discoverability, and security, for example by managing features such as mutual TLS at the microservice level. So, to summarize: I wouldn't necessarily say that Istio would add much for resiliency, but it would help with other aspects once that integration is available.

Okay. And do you have any throughput comparison?

I assume this is referring to the throughput of Kafka Streams, and maybe it would warrant a clarifying question. Kafka Streams itself is natively based on Kafka producers and consumers, so the performance of the system is comparable. There are some documented cases, for features such as transactions or exactly-once processing, where throughput might decrease because of the additional steps taken to ensure those guarantees, but generally speaking they're within comparable levels. If the question is about a comparison between Kafka and other systems, then surely there is an entire literature about that.

Okay, perfect. Since we are out of time, I would suggest that people address the remaining questions directly to you. Do you have a preference, like going on Twitter and mentioning you, or would you prefer email?

Please feel free to ask those questions on Twitter. Let me see, let me share my screen again for a second. Okay, this is my Twitter handle, so feel free to ping me there. And what I would like to suggest is that we take these questions and do a follow-up once we publish the recording, and answer some of them offline. I'll do the due diligence to answer everyone, because we're at the end of the talk.

Perfect. So once more, thank you so much, Marius, for this great presentation. And thank you to all the people who were here live and who are watching this recording. Please sign up at developers.redhat.com so you can be aware of our next DevNation Live events. Thank you so much and have a nice day.

Thank you.