Thank you. So my name is Dave Klein. I'm a developer advocate with Confluent, a company that was founded by the co-creators of Apache Kafka. So we're going to be talking about Apache Kafka today and how to use it with Python to build event-driven microservices. If this thing works... there we go. Okay.

We'll start off by talking about what we mean by event-driven, in the sense of microservices or event-driven architectures in general. I don't want to cover all of that myself, because someone already has: Martin Fowler, who is brilliant. He's got a great blog post and a corresponding recorded conference talk that you can check out, where he goes over four patterns he's identified as commonly associated with event-driven architectures. I'll mention them briefly.

The first is event notification. Very simply, an event happens and is posted, and something else consumes that event and says, okay, I have to do something now. It's basically a trigger. The responding application then has to go and get whatever information it needs, usually from a database, in order to do its work.

The next one is event-carried state transfer. This is Martin's own term, poking a little fun at REST. The idea here is that an event is raised and the information corresponding to that event goes along with it as a payload. The consuming application receives that event with all the information it needs, theoretically, to do its work, and doesn't need to call back to the application that raised it.

Then he goes into event sourcing, which is a pretty big deal. That's where basically all of your state comes in as events, and you can rebuild your state at any given time from your event store. And then CQRS, Command Query Responsibility Segregation, which is on the fringes of event-driven stuff. That has to do with having two different models, a read model and a write model, so you can optimize each. We normally try to make one model do both, but the idea here is that you have one model, one system, for commands, which are the things that change your state, and another one for querying it. That's another really big topic, and not one I'm going to cover today.

Today I want to focus on the second pattern, event-carried state transfer, because that's the one I think is most valuable for building microservices. And we're going to do this in the context of Kafka. Since this is a Python conference, I figure you all know Python, probably better than I do; I've been working with it for less than a year now, after many more decades than I'd like to mention of doing Java, and I have to say that working with Python has been a real breath of fresh air. But since you may not be familiar with Kafka, I'm going to give you a little bit of an intro to what Kafka is and how it works.

At the core of Kafka is the event; Kafka is an event streaming platform. An event is a logical construct, but what it contains is basically notification and state, like we mentioned earlier. The notification tells us something happened, like an order was placed, and the state is the information about that order, the information we would need to do something with it. A temperature reading is another example. Now, the interesting thing about these is that either of the two components has value in and of itself.
For example, with the temperature reading, we might want a heartbeat where every time the temperature is read, it pings some system to say, okay, my temperature sensors are all still working. We don't care what the temperature is; we just want to know the sensors are working, so the notification is all we need. Or we might be logging temperature over time: we may not need to know each time it's read, but we want an aggregation of all the readings. So these components have value on their own, but together they're even more valuable.

So this is what an event might look like in Kafka. Well, actually, in Kafka it's just bytes, which is one of the nice things about Kafka: it's a very efficient binary format for transferring data. But in our applications an event would normally be something like a JSON object, or perhaps use some serialization format like Avro or Protobuf. There are two main components to an event in Kafka. There are other things, like headers and timestamps, that you can add, but the main components are the key and the value. Now, the key is generally a simple data type, like a string or a long or perhaps a UUID. The value is where you have your payload, so it would be some kind of object with fields. And again, it can be in any kind of format: it has to be serialized to bytes to be sent to Kafka and then deserialized on the other end, so anything that can be serialized to and from bytes can be used as a Kafka event. There's a small sketch of one below.

Now, these events are written to what we call a topic. A topic is not a queue. Some people do think of Kafka as a message queue; it's not that. It's a log: an append-only, immutable, ordered log of events, and a durable one as well. When events are posted to the log, they stay there as long as we want them there. We set limits either by time (seven days is the default, I believe, but you can go months, years, or permanently, or a day, or a second, whatever you want) or by size on disk. The key thing to remember is that events stay on the log as long as we want them there; they are not removed when they're consumed.

And that number you see at the bottom is what we refer to as an offset. Every event lands in a topic at a specific offset, and that offset is associated with that event permanently, for as long as the event is in the system. As it gets older, if you have retention thresholds set, it will drop off after a while, but that number will not be reused; the numbers just keep on growing. I also want to point out, and I think we'll see this again shortly, that I'm showing a single log here, a topic with a single log. Most topics are made up of multiple logs, called partitions, because Kafka is a distributed system. By having multiple partitions in a topic, you can spread those partitions across different broker nodes in a Kafka cluster, which allows you to scale horizontally quite well.

All right, so the thing that writes the events to the log is called a producer, and a producer is part of the client library. The client library that comes with Kafka is the JVM library, since Kafka is written in Java and Scala, but there are two really good libraries for Python, as well as libraries for Go, .NET, Node.js, and other languages, and even a REST proxy, I believe. All of these will have the same components in them; they'll all have a producer.
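To make both ideas concrete, here's a minimal sketch of an event, a key plus a value, and a producer that writes it, using the confluent-kafka package we'll meet properly later. The topic name, sensor ID, and broker address are all made up for illustration:

```python
import json
from confluent_kafka import Producer

# A hypothetical "temperature read" event: a simple key, plus a payload
# that carries both the notification and the state.
key = "sensor-42"
value = {"event_type": "temperature_read", "celsius": 21.5}

# The producer is built from a config dict; this address is a placeholder.
producer = Producer({"bootstrap.servers": "localhost:9092"})

# Everything crosses the wire as bytes, so the value is serialized here.
producer.produce("temperature-readings", key=key,
                 value=json.dumps(value).encode("utf-8"))
producer.flush()  # block until outstanding events are actually delivered
```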
And the producer is just a class that you include in your application, and it does the writing of events to the topic. It writes them starting at the beginning and keeps appending to the end. The log is immutable; you can't change these events. It's kind of like real life, right? You can't change something that already happened. You can add a new event that corrects what happened, but you can't change what happened.

Then along comes the consumer. The consumer is also part of the client library, and it's the thing that reads the events from the topic and does whatever we want with them; that's our processing. It's important to note that the producer and consumer are completely decoupled. They don't know anything about each other. The producer doesn't know who, or whether anything, is consuming the events, and the consumer doesn't know where those events came from; it just knows they landed in the topic it's listening to.

So the consumer will read and process the events. I don't even like to use the word consume, because it implies removing them, and it doesn't; they stay there. The consumer reads the events, processes them, and periodically records the offset of the most recently fully processed event. We call that the committed offset. That committed offset is used in case the consumer goes down. If the consumer does go down, the producer keeps on going; it isn't affected by that, so you don't get cascading failures or things like that. And when the consumer comes back up, it uses the committed offset to see where it left off, picks up at the very next event, and keeps going forward. It might have some catching up to do, but it won't skip any events. Those are two key things in Kafka that are really important: if you configure it appropriately, which the defaults do, it's very difficult to either skip data or lose data in Kafka. There are ways to do it if you want to, but it's generally pretty hard. Kafka is going to be very reliable and very durable.
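As a rough sketch of what that committed-offset machinery looks like from the Python side, again with placeholder names, here's a consumer that commits only after it has finished processing each event, so a restart resumes at the next unprocessed one:

```python
from confluent_kafka import Consumer

# Hypothetical configuration; the group id names this consuming
# application, and committed offsets are tracked per group.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "temperature-logger",
    "auto.offset.reset": "earliest",  # where to start with no committed offset
    "enable.auto.commit": False,      # we'll commit manually after processing
})
consumer.subscribe(["temperature-readings"])

while True:
    msg = consumer.poll(1.0)            # wait up to 1s for the next event
    if msg is None or msg.error():
        continue
    print(f"processing {msg.value()}")  # stand-in for real business logic
    consumer.commit(message=msg)        # record the committed offset
```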
Okay, so as I mentioned, the producer and consumer are decoupled from each other; they don't know about each other's existence at all. But they do need to agree on the data they're going to be working with. The producer needs to be able to serialize the events to bytes to send them to Kafka, and the consumer needs to be able to take those events and deserialize them to do something useful in your application. The best way to do that is with schemas. You could do it all hard-coded, but that's not recommended, so usually people use schemas, and the schema formats Kafka supports out of the box are Avro, Protobuf, and JSON Schema.

And if you're using schemas, it's really recommended to use a schema registry, which is not part of Apache Kafka itself, but is a free tool from Confluent. Configuring your producer to use the schema registry is simply a matter of adding in its address, because Schema Registry is a separate application that you run. Or, if you're using a managed service such as Confluent Cloud, it would probably be hosted there for you. So you configure the producer with the URL of your schema registry and any security credentials you need, and it will automatically start using it. What happens then is that before the producer sends an event with a certain schema, it will send that schema to the schema registry. The schema registry will store it and return an ID. The producer sticks that ID at the beginning of the event: the first five bytes are used, where the first byte signals that there is an ID and the next four bytes are the ID itself. That ID will be attached to every event with that schema from then on, until the schema changes, in which case the producer sends an update. And then the consumer will see the ID and call the schema registry, saying, I've got an event, give me the schema for this ID; it gets the schema back and uses it to deserialize the event.
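With confluent-kafka-python's Avro classes, which we'll come back to in a moment, that whole dance is handled for you. A minimal sketch, assuming placeholder addresses, topic, and schema:

```python
from confluent_kafka import avro
from confluent_kafka.avro import AvroProducer

# Placeholder Avro schema, defined inline for illustration.
value_schema = avro.loads("""
{
  "type": "record",
  "name": "TemperatureReading",
  "fields": [
    {"name": "sensor_id", "type": "string"},
    {"name": "celsius", "type": "double"}
  ]
}
""")

producer = AvroProducer(
    {
        "bootstrap.servers": "localhost:9092",            # placeholder
        "schema.registry.url": "http://localhost:8081",   # placeholder
    },
    default_value_schema=value_schema,
)

# The serializer registers the schema, gets back an ID, and prepends
# the 5-byte header (magic byte + 4-byte schema ID) automatically.
producer.produce(topic="temperature-readings",
                 value={"sensor_id": "s-1", "celsius": 21.5})
producer.flush()
```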
All right, real quickly: consumer groups. Consumer groups are a really cool tool in Kafka that allow you to have multiple instances of your client applications, which is good for scale. So we have two instances here, and four partitions in our topic, and partitions are the unit of scale in Kafka. If you want to scale really big horizontally, you're going to need lots of partitions, which is fine to do; some applications have thousands. This one has four, so two partitions go to each of these consumer instances. If we increase to four instances, they'll each get one. The balancing is done automatically, and by using the committed offsets it makes sure that nothing is skipped and nothing is lost. And if we scale back down because things have slowed, it rebalances again, automatically, for us.

Okay, moving on to using Kafka with Python. As I said, there are two libraries, two packages, that you can use. The first one is kafka-python. It's a community-supported library, and it includes the following main classes (there's other stuff in it, too): KafkaProducer, KafkaConsumer, and KafkaAdminClient. You can find it there. The other one, the one I'm going to be using in my demo if I have time to show it to you, is confluent-kafka-python. This one is free to use, but it's produced and supported by Confluent. It has the producer and consumer classes as well, simply called Producer and Consumer, but it also has an AvroProducer and AvroConsumer, which allow you to use the Avro serialization format and work with the schema registry, as well as an admin client. You can find that one there.

All right, so now back to the events. Here's what we're talking about: replacing standard request-response, HTTP-based communication between microservices, which is a very common way to build microservices. It's the way I've done it for years, and it basically starts out like this. We have our client, which is using HTTP for obvious reasons; it makes sense there, right? You've got a web app or a mobile app or something like that. It calls our server, and then our server, in this case the blue service, makes a call to the green service. The green service makes a call to the red service. The red service makes a call to the pink service, the pink service calls the gray service, and the red service also calls that yellow one for some reason. All that work gets done, and then the response goes back to the client. This works well; I've built lots of applications this way, and they usually start out great. Another variation might be that the blue service calls the green service and acts like an orchestrator, fanning out to the other services. That works too.

Most of these systems will work, and there are a lot of them in use, and there are a lot of tools out there to help us with them, things like Swagger and OpenAPI. But I've been on so many of these projects, and they end up looking more like this after a while, usually before, or even in, production. And it's because you have so much coupling. It makes it difficult to make changes or add a new feature. It's like, well, I don't want to mess with this existing service to add this feature, because I'd have to change everything else that's connected to it, so I'll add another service for it. Sometimes you're making important decisions that way, but you often end up with a mess, and most of that comes back to the strong coupling that this type of connection gives you. You also have the issue of cascading failures with request-response. If we go back to that picture: if something happens to, say, that red service, then everything connected downstream from it will crash too. So you get a lot of cascading failures and things like that with this type of structure.

So we're going to propose using events for this instead. It's not a silver bullet, but it does make it easier to keep our applications cleaner. So again, we start the same way, still using HTTP from our client to our initial server. But instead of making another HTTP call, this service is going to post an event to a topic. The green service is going to consume from that blue topic, do its work, and post an event to the green topic. The red service is going to consume from that topic, do its work, and post another one down the line. And at the end, in this situation, our blue service, which produces to the blue topic, is also going to consume from the gray topic so it can get that final state. That's one way to structure it; there are other ways to do this as well. As far as communicating back to the client, you could either have a two-step process, which is what my demo has if I have time to run it, or you could use something like WebSockets to relay the result directly back to the client. This will all happen just as quickly, if not faster, than the request-response type of system; it's just asynchronous, and less coupled.

Now these services don't know anything about each other at all; they're completely decoupled. All they do know about is the events they're working with. So again, they have to agree on schemas, so the schema becomes kind of the new point of coupling. It's not possible to have systems interact with each other without any coupling; you're going to have some. But I think a schema is a much cleaner point of coupling than an API.
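To make the swap concrete, here's a minimal sketch of the change inside one of those services, with hypothetical service, topic, and field names: instead of making a blocking HTTP call to the next service, it produces an event and moves on.

```python
import json
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})  # placeholder

order = {"order_id": "o-1", "items": ["margherita"]}

# Before: synchronous request-response; this service blocks on the next
# one, and fails along with it if it's down:
#   requests.post("http://green-service/orders", json=order)

# After: fire an event and carry on; whoever cares subscribes to the topic.
producer.produce("orders", key=order["order_id"],
                 value=json.dumps(order).encode("utf-8"))
producer.flush()
```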
Also, you can mix and match, and I put this slide in here to show that briefly. Say we have another service, an external service, that doesn't speak events: we can still make an HTTP call to it, and that's not going to hurt anything. So it's not either-or. But also, if you want to add another feature later on, it's easy to do by adding another application that consumes some of that same data. So this burgundy service is going to consume from that green topic. Maybe it has a different purpose than the red service does. It doesn't matter; they're completely decoupled. They don't know anything about each other, but the events are still there, so they can be consumed by multiple applications. For example, this could be a reporting or analytics application that we're tacking on just to read some data from that topic and do some BI or something like that with it, and it won't affect anything else. So we can easily add or remove applications from a system like this without affecting the others.

So that's what we're trying to get to. And how do we get there? Well, the first thing is to find the events. Now, if you're doing a new application, a greenfield application, then event modeling is a great way to go. There's a great conference video at the URL there that talks more about it, but it's basically a modeling system for designing your entire system event-first; it treats events as first-class citizens. So that's for newer systems. If you're working with existing systems, finding your events in those can be as simple as taking a look at what starts each of your services working, probably an API call or something, right? That could become an event raised by the calling application instead. Then, what information does that service need to do its work? Well, that could be the payload for that event. And then think about what actions your service performs. Is it doing some kind of work? Is it making something happen? That's probably an event too, and the information that results from that action could, again, be the value for that event. So those are just some ways of thinking through the story of your application: what is happening, what's doing what, and when. And that's what events are.

Okay, next, define your topics. One topic per event type is one option you can go with. If you do that, you don't really need a field in the event saying what the type is, because the topic tells you. But then you end up with a lot of topics, and your ordering can sometimes be harder to manage if you have events that are related to a certain domain. So one topic per domain might be better. For example, orders: you might have a topic for all orders, and every event that happens to an order, order placed, order paid, order shipped, these are all separate events that end up in the same topic. In those situations you'll probably add a field to the value that says what the event type is, as in the sketch below. Or you can mix and match; sometimes one approach works better for one part of the application than another.
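Here's a minimal sketch of that one-topic-per-domain idea, with made-up topic and field names: every order event carries an event_type field so consumers of the shared topic can tell the events apart, and keying by order ID keeps each order's events in one partition, preserving their order. Each event type here would still get its own schema.

```python
import json
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})  # placeholder

def emit_order_event(event_type, order_id, **state):
    # One domain topic, "orders", shared by several event types.
    event = {"event_type": event_type, "order_id": order_id, **state}
    producer.produce("orders", key=order_id,
                     value=json.dumps(event).encode("utf-8"))

emit_order_event("order_placed", "o-1", items=["margherita"], total=12.0)
emit_order_event("order_paid", "o-1", amount=12.0)
emit_order_event("order_shipped", "o-1", carrier="DHL")
producer.flush()
```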
The other thing with topics is partition counts, and that's really important to think about up front. If you ever want to expand your system, you want to aim high on the number of partitions. There's not a high cost to having lots of partitions, but there is a high cost to changing the partition count later on.

And then don't forget schemas. Schemas are really important. You want to have a schema for every event type, whether the types have their own topics or not. And keep in mind that schemas are going to change, so think about compatibility rules. The schema registry does help you with compatibility: you can set rules on it, saying I only want forward-compatible changes, or backward-compatible, or both, and it will enforce that for you. But the schemas do change, and that's an important thing to keep an eye on.

All right. So I think we have a little bit of time, so what we're going to do here is build (I'm not going to build it live; sorry, Britton, don't worry) a random pizza generator. If you ever order pizza for a group of people, it's a real pain to figure out what everybody likes, right? So we're going to solve that problem for, you know, crews that are working late and need some pizza to get the job done. We're going to let them just tell us how many pizzas they want, and we're going to give them a random selection. And we're going to do this with events.

So our customer is going to call into the pizza service. The pizza service is going to post an event to a pizza topic. The sauce service is going to be consuming from that topic; it's going to pick the event up, add a random sauce selection, and post to a pizza-with-sauce topic. And then the cheese service is going to pick that up, add a random cheese selection, and post that, and so on all the way down the line until we've added our veggies. The source code for the application is all at the URL down below. And now let's take a look at some code. Is that readable? Let me bring the brightness up a bit; I forgot my power adapter. Okay.

So this is our first service. Since it talks to the client through HTTP, it's a Flask application, so it's got the built-in HTTP server. And it's got one endpoint with two methods: a POST method and a GET method. The POST method just takes in a number, the number of pizzas our customer wants. The GET method takes a UUID, which is returned from that first call. So it's a two-step interaction: one call to order the pizzas, and another call to get them. Like I said, this could be replaced with something using WebSockets to make it more seamless for the user.

And then all the work is done in our pizza service over here. So, just to show you how to use Kafka in a Python Flask application: we import Producer and Consumer from the confluent_kafka package. We also import ConfigParser, and that's for the properties the producer and consumer need; we read those from a configuration file. Then we create a producer instance, giving it the config. This pizza_warmer is just a dictionary; we're holding data in it for the completed pizzas. I'm not going to go into the details of how the application works, because there isn't time.

So then we have the endpoint that calls order_pizzas. We give it the count, we set up the pizza order, which holds all that data, and we loop through and create a pizza for each one. And for each pizza, we post it to the pizza topic. To post an event to a topic with a Python producer is just the produce method: pizza_producer.produce, give it the topic name, which is basically the address, and then the key and the value, which together are the event. And then we call flush to make sure it gets written. And then we return that order ID, which is just a random UUID that's been generated. Now, the order ID is going to be the same for all the pizzas that are ordered.
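Since the slide itself isn't reproduced here, here's a rough sketch of what that service looks like. The structure follows the talk, but the routes, config file name, and field names are reconstructions, not the actual demo source:

```python
import json
import uuid
from configparser import ConfigParser

from confluent_kafka import Producer
from flask import Flask, jsonify

app = Flask(__name__)

config = ConfigParser()
config.read("kafka.ini")                        # hypothetical config file
pizza_producer = Producer(dict(config["kafka"]))

pizza_warmer = {}                               # in-progress orders by order ID

@app.route("/order/<int:count>", methods=["POST"])
def order_pizzas(count):
    order_id = str(uuid.uuid4())
    pizza_warmer[order_id] = {"count": count, "pizzas": []}
    for _ in range(count):
        pizza = {"order_id": order_id}          # toppings get added downstream
        pizza_producer.produce("pizza", key=order_id,
                               value=json.dumps(pizza).encode("utf-8"))
    pizza_producer.flush()                      # make sure everything is written
    return order_id

@app.route("/order/<order_id>", methods=["GET"])
def get_order(order_id):
    return jsonify(pizza_warmer.get(order_id, {}))
```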
So basically the GET is for the whole order: we grab that order ID's entry from the pizza_warmer, which has our pizza order, and we return it as JSON. One thing I want to point out here, too, is that our consumer is started up down here. We create the consumer with the config, so it's really easy to work with these classes from Kafka: the consumer and the producer, you just construct them and use them. The only extra step for the consumer is that we have to subscribe to a topic. A producer can produce to any topic, because you hand the topic to the produce method; the consumer has to be subscribed to a topic and will only work with that topic. It can be more than one topic, though, because subscribe takes a list; we're giving it a list of one here. So then it's going to poll, and it's going to keep on checking for any new events. And whenever it finds one, it's going to run our logic here and add the pizza to the pizza order. That's just the business-type logic, which I'm not going to go into here, but that's how to produce and consume from a Kafka application.

And then, just to show the next application: here is the sauce service. I'm only going to show this one because the others are all identical in what they do. This one is just a plain Python application. It isn't a Flask application, because it doesn't do any HTTP. In a request-response-based system, these would probably all be Flask or Django applications, all making HTTP calls across the network: lots of chatty stuff, with lots of JSON flowing through the pipes. These are just simple Python applications, and each one has a consumer and a producer in it. So the consumer subscribes to the pizza topic, and whenever it gets an event, it does its work, adding its magic sauce (I'm running out of time), and then it produces to the pizza-with-sauce topic. So that's how that exchange happens, and the same thing happens all the way down to the final service. And then, as we saw over here, our pizza service, the first service, consumes from pizza-with-veggies, which is the final topic, and that's how it gets the completed pizza.
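A rough sketch of that consume-transform-produce loop; the topic names match the talk, while the config file, sauce list, and field names are illustrative:

```python
import json
import random
from configparser import ConfigParser

from confluent_kafka import Consumer, Producer

config = ConfigParser()
config.read("kafka.ini")             # hypothetical; must include a group.id

consumer = Consumer(dict(config["kafka"]))
producer = Producer(dict(config["kafka"]))

SAUCES = ["tomato", "pesto", "bbq", "garlic cream"]

consumer.subscribe(["pizza"])        # subscribe takes a list of topics
while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    pizza = json.loads(msg.value())
    pizza["sauce"] = random.choice(SAUCES)    # add the magic sauce
    producer.produce("pizza-with-sauce", key=msg.key(),
                     value=json.dumps(pizza).encode("utf-8"))
```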
And then we're just going to run this real quick and see if it all works. I think I have the other services already running; I hope I do, we'll find out. So we can order five pizzas. There we go. So we take that order ID, and all we do is paste that UUID into this GET request here. I'm piping it to jq which, if you're not familiar with it, is an excellent little command-line tool for working with JSON. And there are our delicious pizzas, all ready to go.

And I've got one minute left, so with that one minute I'm going to show you one last thing, because, like I said, it's really easy to add extra services. What we did down at the end here is add a cheese reporter. This application is a separate application that we added later on, while our other system was already running, and it didn't interfere with it in any way. What it allows us to do is check which of our cheeses are the most popular. And we have that running on a different port, 5050. So we run that, and we see a list of our cheeses, and if we were to order some more pizzas and then run our report again, we'd see the updated counts of our most popular cheese selections. Our cheese producer, our cheese supplier, wanted that information. But the point I want to make there is that you can add these extra applications, consuming from the same data, without affecting the others in any way.

And I think the last thing I had was just some resources to leave you with. There are video courses on developer.confluent.io that go into a lot more of this stuff, in way more depth than I had time for. Kafka 101 is great if you really want to learn Kafka better, and there are actually 12 different courses out there now, so check those out. There are some Python-and-Kafka resources there too. And then Designing Event-Driven Systems is an excellent book that talks more about event-driven architectures and using Kafka in those types of applications. It's a free book; you can get it at the link there. And that's all I had. Thank you so much.

So is there time for questions? Oh, there it is. Okay, good; I thought I had run out of question time. Come on up to the microphone. I promise to listen to your question; I don't promise to answer it.

Q: Thank you. The title mentions microservices, and those often each have their own database. Then we're getting into the territory of event sourcing. The example you show is more like lambdas that you couple over Kafka. But event sourcing is known for being complex. What is your opinion?

A: Event sourcing is known for being complex, and some people who are using it are having great success with it, but it does take a lot of work. I actually had a slide in here, I don't know how it got removed, one last slide at the end about the event-driven stuff, and it basically said: this doesn't mean no databases. What I'm focusing on is a replacement for synchronous request-response connections between microservices. Each of those services could have had a database, and that would have been fine. In fact, Kafka works really well with databases, using Kafka Connect, which is something that allows you to connect to external systems, including just about any database known to man.

Q: One of our challenges with this is actually the testing side, because, like you said, Kafka messages are there forever, and we have trouble putting it into a nice testing framework. You have to spin up new Kafka instances to write test messages and look at the output. Do you have anything for that?

A: Well, the story is getting better. I know the Kafka team, the engineers, are really working on improving the testing story, because they've heard that complaint a few times. So they're working on smaller, lighter-weight instances of Kafka that you can run more quickly in containers to do your testing. But you also can, because the data stays put, have applications hitting even your production data, for that matter, and it won't affect it in any way. So there's some flexibility in the way you structure your tests with Kafka.

If there are no other questions, I want to point out one last thing. See this structure here, where the flow is serialized through each topping service in turn? Here's an exercise for the audience for later: the pizza service could post to just that first pizza topic.
And then the sauce, cheese, meat, and veggie services could all pick up from that one topic, do their own work independently, and write to their own topics independently. And then a separate application at the end could pull from all of those topics and aggregate the final result, which could be a lot more efficient, and kind of fun to build. I just haven't had a chance to try that yet, but a rough sketch of what that aggregating service might look like is below. And that's really all I had.
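A minimal sketch of that fan-in aggregator, assuming each topping service writes to its own topic and tags each message with the pizza's order ID; every name here, and the completeness check, is made up for illustration:

```python
import json
from collections import defaultdict

from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # placeholder
    "group.id": "pizza-aggregator",
    "auto.offset.reset": "earliest",
})

# One subscription covering all four hypothetical topping topics.
TOPICS = ["pizza-sauce", "pizza-cheese", "pizza-meats", "pizza-veggies"]
consumer.subscribe(TOPICS)

pizzas = defaultdict(dict)                    # pizza_id -> merged toppings

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    piece = json.loads(msg.value())           # e.g. {"pizza_id": ..., "sauce": ...}
    pizza = pizzas[piece["pizza_id"]]
    pizza.update(piece)
    # Crude completeness check: pizza_id plus one field per topping kind.
    if len(pizza) - 1 == len(TOPICS):
        print("completed pizza:", pizza)      # stand-in for publishing the result
```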