And thank you everybody. I'm Francesco Tisiot, developer advocate at Aiven. Aiven is a Finnish company providing open source data platforms as managed services on top of the cloud of your choice, and that choice of open source data platforms already includes Apache Kafka. I'm here today to work with you on understanding how we can build event-driven applications together with Apache Kafka and Python.

If you are at this stage, listening to me, you are probably familiar with Python. On the other side, if you didn't see all the previous sessions, you may ask yourself: why Kafka? Why should I use Kafka, and what is Kafka? To understand that, I usually position developers in two worlds. On one side, you are a developer working on a new shiny app that you just invented and are building yourself. On the other side, you are still a developer, but you inherited an old application that you have to support, maybe extend, or maybe take to the new world. Still, new app or old app, I've never seen an application working in complete isolation. You will have components within the app that need to talk with each other, and maybe you are taking this app and exposing it to the world, so it will have to talk with other apps out there. And believe me, you want this communication to be more or less real time, and reliable.

So we know where we are going now, but let's take a step back and understand how things were done until a few years ago. We had old applications that, most of the time, were pushing data to and reading data from a backend database. Now, this wasn't done for every event, for every row, for everything that was generated in the app. Most of the time the app was batching the rows and moving them to the database in batches, or reading from the database in batches. This added a considerable delay between when the data was available in the app and when it was pushed to the database, or when it was available in the database and pulled back into the app. We now live in a fast world where we really cannot wait for that delay. We need to build event-driven applications. By that I mean an application that, as soon as an event happens, knows about it, starts processing it, and probably takes the result of that processing and pushes it to another application, which will itself be event-driven, creating a chain of such applications.

But now let's take another step back and try to understand what an event is. I believe we are all familiar with events, and specifically with notifications. When someone sends us a message, we receive a notification about it. When we use our credit card, we receive a message that we used it. If our credit card details get stolen by someone, and that someone makes a payment, we want to receive the message immediately. We don't want to wait five minutes, ten hours, or even just twenty seconds: we want the notification immediately, and to react as an event-driven application in order to stop our credit card from being used. And moving to the mobile world, especially now that we are in a pandemic, we are used to ordering food with our phones. By the time we open our phone, open the app, select the restaurant, and select which pizzas we want in our order, we are creating a chain of events.
Once we confirm the order, the application — in this case, the restaurant — receives the order and acts as an event-driven application in order to make the pizzas. And when the pizza is ready, well, we receive a notification about it.

So, again, based on what we said before, why should we build an event-driven application? Because we live in a fast world, and in a lot of cases the value of a piece of information is defined by the time it takes to be delivered. Take the example of knowing the position of the driver delivering our pizza: we need to know it with minimal delay. If the time between when that information is captured on the delivery driver's phone and when it shows up on the map on our phone is something like five minutes, the information becomes irrelevant. We need to act, we need to know, we need to pass the information along as soon as possible. And we need a method to transmit that information which is reliable and close to real time. That tool is Apache Kafka.

What is Kafka? Well, the basics of Kafka are really simple. It's the concept of a log file where we write events, one after the other, as messages. The log itself is an append-only and immutable data structure. So we write event number zero, then event number one, then event number two, then event number three. Even more, it's immutable: once I write event number zero in the log, I cannot go there, like a row in a database, and change it. Once I write event zero in the log, it will always be like that. If something changes the reality of event number zero, I store that as a new event in my log. And of course, Kafka can handle multiple logs for multiple event types; in Kafka terms, these are called topics.

Even more, Kafka is resilient because it's a distributed system. This means that when you create a Kafka instance, you're actually creating a set of nodes, which in Kafka terms are called brokers. Now, a topic's log is not stored only once across the cluster: it's stored multiple times, following a setting called the replication factor. In our example, we keep three copies of one topic's log, so a replication factor of three, and two copies of the other topic's log, so a replication factor of two. Why do we do that? Because we know computers are not entirely reliable, so we could lose a node and still not lose any data.

Now, we defined Kafka as a technology to store events, but we need to understand what an event is for Kafka. For all that matters to Kafka, an event is just a key-value pair. You can use simple events, like temperature_max as the key and 35.3 as the value. Or I could go wild and include in the key the name of the shop receiving the pizza order and the phone line used to make the order, and in the value all the order details, like the order ID, the name of the person making the order, and the list of pizzas within the order. By the way, I'm using pizza because I'm Italian: it's one of the few things I can safely talk about when creating technical content.

So, that was the theory about Kafka. How do we write to Kafka? If we want to write to Kafka, we usually have — in this case — a Python application that is called a producer, which writes to a topic.
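To make that concrete, one such key/value pair could look like this as plain Python data, before any serialization. This is only a sketch: the field names are invented for illustration and simply mirror the example above.

```python
# One pizza-order event as a key/value pair, before serialization:
# the key identifies the shop and phone line, the value holds the order details.
event_key = {"shop": "Luigi's Pizza", "phone_line": 1}
event_value = {
    "order_id": 1,
    "name": "Francesco",
    "pizzas": ["Margherita", "Diavola"],
}
```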
In order to write to that topic, all the application needs to know is: where to find Kafka, that is, the hostnames and ports of the brokers; how to authenticate into Kafka — no authentication at all, SSL authentication, SASL, it's your choice; and how to encode the information from, for example, a JSON representation into the raw series of bytes that Kafka talks, because for Kafka a message is just a raw series of bytes.

On the other side, once we produce data into Kafka, we want to consume — to read — the data from Kafka. We do that with, again, another Python application, which is called a consumer. The consumer reads one event at a time and communicates the offset back to Kafka: message zero is read, offset zero; message one is read, offset one; two and three and four and so on. Why is communicating the offset back so important? Because, again, computers are not entirely reliable, so our consumer could go down. The next time we recreate an instance of the same consumer, we want to process the data, but we don't want to restart from the beginning of the log. Since Kafka knows the offset that the previous consumer had reached, it will be able to send us only the new events that weren't read before. In order to read data from Kafka, the consumer needs to know things similar to the producer: where to find Kafka, how to authenticate, and — because we were encoding before — how to decode. The last thing the consumer needs to know is which topics it wants to read from.

So far it has been a lot of talking and slides; I want to show you something live, with a nice demo. What I created is all based on Aiven, the company I work for: we offer Kafka and other tools like Postgres as managed services. What we will see today is based on a set of Jupyter notebooks that I created, and I will share the URL at the end of the talk.

Let's start by creating a producer. We are going to use the kafka-python library, which is the standard Python library you can use to interact with Kafka. Let's install it, and while we wait for the installation to happen, we can start understanding how to create a producer. We import the KafkaProducer and create an instance of it, pointing to our set of brokers and passing the security settings: in our case we use the SSL security protocol with three certificate files. Then we take the data, which as we will see later is in JSON format, and turn it into the raw series of bytes that Kafka understands. So, let me create this Kafka producer. And now we are ready to send our first message. What we will send is the first pizza order, with ID equal to one; the name is Francesco, myself, ordering a pizza Margherita, my favorite. And I'm flushing the producer to be sure that the message actually arrives in Kafka. Let me check — okay, this executed, the message landed in Kafka. But you could trust me or not, so let me demonstrate that everything is working with a consumer.

Let me put the consumer side by side with the producer and shrink this one a little. What I'm doing on the consumer side, apart from importing some configuration, is creating a consumer. The client ID and the group ID are something we will leave for later; the rest of the settings are pretty much the same as for the producer, so where to find Kafka and the security settings.
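Before continuing with the consumer, here is roughly what the producer created a moment ago looks like with kafka-python. This is a minimal sketch: the broker hostname, port, and certificate paths are placeholders for the ones of your own Kafka service, and the topic name simply follows the demo.

```python
from kafka import KafkaProducer
import json

# Connection details are placeholders: use your own broker host:port
# and the three certificate files of your Kafka service.
producer = KafkaProducer(
    bootstrap_servers="kafka-demo.example.com:12345",
    security_protocol="SSL",
    ssl_cafile="ca.pem",
    ssl_certfile="service.cert",
    ssl_keyfile="service.key",
    # Encode Python dicts into the raw series of bytes Kafka expects.
    key_serializer=lambda k: json.dumps(k).encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# The first pizza order from the demo.
producer.send(
    "francesco-pizza",
    key={"id": 1},
    value={"id": 1, "name": "Francesco", "pizza": "Margherita"},
)
producer.flush()  # make sure the message actually reaches Kafka
```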
Before we were serializing; now we are deserializing, taking the raw series of bytes back to JSON. Okay, let's run the consumer. Now that we created the instance of the consumer, we can check which topics are available in Kafka, and we can see that there are some internal topics together with a nice francesco-pizza topic, which is the topic I used to write my pizza order. So I can subscribe to it, and now I can start reading all the messages.

Okay, my consumer started, and you might notice something strange: my consumer is a never-ending loop. I expect that, because it's a streaming application, so it keeps checking with Kafka whether there is a new event coming and immediately displays it. But, as I told you before, I created an order for Francesco ordering a pizza Margherita, and I don't see it on the consumer. This is because, by default, when a consumer starts and attaches to Kafka, it starts reading from the point in time at which it attached to the topic; it doesn't go back in history. This is the default behavior, and you can change it with a parameter. But just to let you know, this is what you get by default — I spent a lot of time trying to understand why I couldn't go back in history, and it was just because of a parameter. So, to demonstrate that everything works, let me now send another few orders: one from Adele, one from Mark, and... well, it's only two. Let me send those two orders. I immediately receive them on the consumer side: both the order from Adele and the order from Mark are there.

One small thing to mention — again, I'm Italian. You can top your pizza as you wish, but I strongly suggest, if you come to Italy and order a pizza, to avoid the Hawaiian one. We usually don't put pineapple on pizza. Again, it's your choice, so you can do differently; it's just advice.

Okay, we understood producer and consumer; let's go back to a few more slides. One of the things we said is that we can push data into Kafka, and we could keep the data in Kafka forever, or we could say we want the data in Kafka only for a limited amount of time. How can we dictate for how long the data is stored in Kafka? We have two options: either by time or by size. We can say keep it for six months, two hours, or ten minutes; or keep it until the log reaches a size of ten gigabytes. When it reaches ten gigabytes, Kafka deletes the oldest chunk and then lets the log grow again. We can also use both.

Going back to the size of the log: one of the things I said initially is that we have this log that contains all the information, all the events, and the log fits into a broker. It would be sad if we had to purchase huge disks in order to store a huge amount of data, or, on the other side, if we had to limit the data we want to store because we have only small disks in our cluster. With Kafka this is not entirely a trade-off, because Kafka has the concept of topic partitions. A topic partition is a way to divide events belonging to the same topic into sub-topics called partitions. In the case of my pizza orders, I may want to put the orders for the restaurant Francesco's Pizza in the blue partition, the orders for Luigi's Pizza in the yellow partition, and the orders for Mario's Pizza in the red partition. They are all still pizza orders, so they all belong to the same topic, but I divide them into different partitions.
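As a reference for the demo above, the consumer side looks roughly like this in kafka-python — again a sketch with placeholder connection details. The auto_offset_reset setting is the parameter mentioned earlier: its default of "latest" is why the demo consumer did not go back in history, while "earliest" would replay the topic from the beginning.

```python
from kafka import KafkaConsumer
import json

# Same placeholder connection details as on the producer side.
consumer = KafkaConsumer(
    bootstrap_servers="kafka-demo.example.com:12345",
    security_protocol="SSL",
    ssl_cafile="ca.pem",
    ssl_certfile="service.cert",
    ssl_keyfile="service.key",
    client_id="consumer-1",
    group_id="pizza-readers",          # consumer groups are covered later
    auto_offset_reset="earliest",      # "latest" (the default) only reads new events
    # Decode the raw series of bytes back into Python dicts.
    key_deserializer=lambda k: json.loads(k.decode("utf-8")),
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

print(consumer.topics())               # list the topics visible in the cluster
consumer.subscribe(topics=["francesco-pizza"])

# A never-ending loop: waits for new events and prints them as they arrive.
for message in consumer:
    print(message.partition, message.offset, message.key, message.value)
```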
Now, if we have a look at our three-node cluster, what is stored on a node is not the full log, it's a partition. So if we want to store more data on a smaller set of disks, we probably just need more nodes and more partitions. The other beauty of this is that we are still storing multiple copies of each partition, so even if we lose a node, we are not losing any data for our topic.

In the example I was giving before, I said I wanted to store Mario's Pizza in one partition, Francesco's Pizza in another partition, and Luigi's Pizza in a third partition. But how do we select the partition? This is usually done with the key part of the message we send to Kafka, and by default Kafka ensures that messages having the same key end up in the same partition. You can also send messages without a key, in which case you get a round-robin selection of the partition, so you will never be sure which partition your message will be sent to.

Why is selecting the partition with the key a wise choice? Because of ordering. Let me show you a little example. You have your Python application writing to a topic with a couple of partitions, and then you have a consumer. It's a very simple example: it only has three messages, a blue one first, a yellow one second, and a red one third. When writing to the Kafka topic, we could end up with a situation like: blue message in partition zero, yellow message in partition one, and red message in partition zero again. Now, when reading from Kafka, it could happen — it will not always be the case, but it could happen — that we end up with the blue message first, the red message second, and the yellow message third. If you compare the order of the events we produced to Kafka with the order of the events we consumed from Kafka, the order is different. This is because, once we start using partitions, we have to give up on global ordering: Kafka only ensures correct ordering per partition. So we need to think carefully about which subset of our events we care about the relative ordering of, and put all of them in the same partition. Using the restaurant name as the partitioning key probably makes sense, because I want to know whether the order from Mario comes before or after the order from Johnny when they belong to the same restaurant, but I don't care about the relative order of Mario's and Johnny's orders if they are for different restaurants.

So as of now we understood that partitioning is good, because it allows a better trade-off between disk space and the amount of data we want to store in Kafka, and partitioning is bad, because we have to give up on global ordering. But partitioning also allows us to scale out. If you think about the concept of a log, you can picture it as a single thread writing one event after the other, and in a very simplistic view you could think the throughput is given by that one thread. If we now have more partitions, we have more independent threads that can write, in this case possibly three times as fast, into Kafka. So we can have many more producers writing data into Kafka, and we can also have many more consumers reading data from Kafka. Still, if those two consumers belong to the same application, they want to read all the events coming from a specific topic, but they don't want to read the same event twice. How can we do that?
Well, in that case Kafka assigns each consumer a non-overlapping subset of partitions. If these big words don't really make sense, let me show you. If C1 and C2, in our case, belong to the same application, what Kafka will do is, for example, assign the blue partition to the first consumer and the yellow and red partitions to the second consumer, making sure that all the events are still read, but none of them is read twice.

Let me show you again with a little demo. Let me go back to my notebooks — there we are. I will now create a new producer and a new topic in Kafka, and this time I will use the Kafka admin client in Python to create a new topic with two partitions. Okay, I created a topic with two partitions; the error code is zero and the error message is none, so it was successful. Now, before pushing the messages, let me create two consumers, consumer one at the top and consumer two at the bottom. These two consumers belong to the same application. And if everything I told you so far is true, since I'm pushing to the same topic using slightly different keys — id equal to zero, id equal to one — my guess is that those two messages will land in two different partitions, and since I have two consumers, Kafka will assign one partition to each. So I'm pushing two events into the same topic, but each consumer should read only one of them. Let me check. And this is what happens: at the top I'm reading from partition zero, offset zero — the first message of partition zero, the order from Frank ordering a pizza Margherita — and at the bottom, partition one, offset zero — the first message of partition one, the order from Adele ordering a pizza Hawaii. If I now send another two messages reusing the same keys, I expect the order for Mark to land in the same partition as the order for Frank, and the same for the order from Adele and the order from Jan. So let's check this out. As expected, the order for Mark lands in the same partition as the order for Frank, because they share the same key, and the same happens for Adele and Jan — all as expected. So partitions work as well. Let's go back to a few more slides.

One of the things that makes Kafka different from a lot of other streaming technologies is the fact that once we read a message from Kafka, Kafka doesn't delete the message, making it available for other applications to read as well. What I told you so far was about multiple threads of the same application that want to work alongside each other in order to read all the events from a topic without reading the same event twice. If we think about the pizza analogy, those two could be the pizza makers: they want to read all the pizza orders between them, but they don't want to make the same pizza order twice, because they would lose money. How can we deal with this in Kafka? Kafka has the concept of a consumer group: we just need to define the two pizza makers as part of the same consumer group, and Kafka will split the orders between them. But now we could have another application, let's say the billing person. The billing person is a completely different application that wants to receive a copy of every pizza order, because it has to make the bills, but it wants to read the pizza orders at its own pace, which has nothing to do with the pizza makers.
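For reference, the topic creation from that demo looks roughly like this with kafka-python's admin client — a minimal sketch, with placeholder connection details and topic name, and an optional retention setting like the ones discussed earlier.

```python
from kafka.admin import KafkaAdminClient, NewTopic

# Placeholder connection details, as in the earlier sketches.
admin = KafkaAdminClient(
    bootstrap_servers="kafka-demo.example.com:12345",
    security_protocol="SSL",
    ssl_cafile="ca.pem",
    ssl_certfile="service.cert",
    ssl_keyfile="service.key",
)

# A topic with two partitions; replication factor 1 is fine for a demo,
# not for production. topic_configs can also carry settings such as retention.
admin.create_topics(new_topics=[
    NewTopic(
        name="francesco-pizza-partitioned",
        num_partitions=2,
        replication_factor=1,
        topic_configs={"retention.ms": "604800000"},  # optional: keep data 7 days
    )
])
```

In the demo, the two consumers were created with the same group_id, which is why Kafka handed one partition to each of them.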
If we define the billing person as part of a new consumer group, what Kafka will do is deliver a copy of all the events, from all partitions, to this consumer as well, and allow each consumer group to read at its own pace.

The last bit that I want to talk to you about is taking Kafka into the world of non-greenfield implementations. So far, all the examples I gave you used Python as the producer and Python as the consumer. However, we know that most of the time Kafka will not be the first technology available in a company. We will have data sitting somewhere in a database, or data sitting in a file store, and we still want to integrate Kafka. We can do that with Kafka Connect. Kafka Connect is a framework that allows us to take data from existing sources and, once the data is in a Kafka topic, push it to a lot of sinks. We can, for example, take data from a Postgres database and send it to Elasticsearch or to BigQuery. Kafka Connect also allows us to evolve existing applications. If, for example, we go back to what we said initially — a Python application writing into a database — and we want to take it into the new world of streaming data, we can use Kafka Connect to do change data capture, using for example Debezium, in order to track all the changes happening to the database and propagate them into a Kafka topic. At the same time, if we have an application writing into a Kafka topic, we can use Kafka Connect to distribute the data: for example, one copy of the topic's data into a database, another copy in BigQuery, and a third copy in Amazon S3 for long-term storage. Those are just separate Kafka Connect connectors. And the beauty of Kafka Connect is that it only needs to know where the data is coming from and what the target technology is; Kafka Connect will take care of keeping Kafka in sync with the target.

Let me show you this in practice with the last demo I want to show you. If I manage to move my screen — there we are. We will now create a third producer — let me make it big here — and a new topic. This time we want to do a trick: we want to push into the topic not only the pizza orders, but also, within each message, the schema of the message itself, so that our downstream Kafka Connect can understand the schema and create and populate a target table. So let's define the schema. I have a schema for the key, which is made of a single field called id, an integer; and for the value I have two strings, one called name and one called pizza. Very, very simple. Now we can create the topic and send some data — sorry, my mouse is behaving a little bit weirdly — there we are. So let's send schema and payload together: an order for Frank ordering a pizza Margherita, an order for Dan ordering a pizza with fries, and an order for Jan ordering a pizza with mushrooms. Okay, this data is now in a Kafka topic. What I want to do is take this data, which is in a Kafka topic, and move it into a table in Postgres, and I want Kafka Connect to do the job. All I need to know is where to find the data and where to push it. The data is in a topic called francesco-pizza-schema.
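The trick of embedding the schema in each message relies on the JSON envelope that Kafka Connect's JSON converter understands when schemas are enabled. One of the demo messages would look roughly like this — a sketch only; the connection details and topic name are placeholders, and the exact envelope depends on the converter configured on the Connect side.

```python
from kafka import KafkaProducer
import json

producer = KafkaProducer(
    bootstrap_servers="kafka-demo.example.com:12345",   # placeholder details
    security_protocol="SSL",
    ssl_cafile="ca.pem", ssl_certfile="service.cert", ssl_keyfile="service.key",
    key_serializer=lambda k: json.dumps(k).encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Key and value each carry a "schema" describing their "payload", so that the
# JDBC sink downstream can create and populate the target table automatically.
order_key = {
    "schema": {"type": "struct",
               "fields": [{"field": "id", "type": "int32", "optional": False}]},
    "payload": {"id": 1},
}
order_value = {
    "schema": {"type": "struct",
               "fields": [{"field": "name", "type": "string", "optional": False},
                          {"field": "pizza", "type": "string", "optional": False}]},
    "payload": {"name": "Frank", "pizza": "Margherita"},
}

producer.send("francesco-pizza-schema", key=order_key, value=order_value)
producer.flush()
```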
And I will create a Kafka connector named sink-kafka-postgres, using a JDBC sink connector, pushing the data to a Postgres instance that was created beforehand in Aiven, with a new Postgres user and a password of newpassword123 — very secure. So let's take this configuration and go to Kafka. We can go to the connectors and create a new connector; I create a JDBC sink. I could fill in all the details here, or — let me check, there we are — I can edit the configuration directly, copy and paste, and once I click apply, Aiven will parse it all for me.

Before showing Kafka Connect in action, let me go to the terminal, open a new window, and connect to the Postgres database. Okay, let me check if there are any tables around: there are no tables. Now, let me create the Kafka Connect connector. The connector is created and is running. So if I now go back to my Postgres database, I have my francesco-pizza-schema table, and if I select star from it, I have the three orders that I just sent from Python. The Kafka Connect connector created the table on the fly and populated it. If I now go back to Python and send a new order, say for Giuseppe ordering a pizza Hawaii — again, not a good choice from my point of view — this executes, and if we go back here and refresh, we also have the order for Giuseppe with a pizza Hawaii. So Kafka Connect created the table, populated it with the first batch of rows, and now, every time there is a new event in the Kafka topic, it is synced to Postgres as well. All working.

Some final thoughts that I want to share with you. First of all, my Twitter handle: you can ask me questions about pizza, Kafka, Postgres, and Python, possibly in that order. The second link is for replicating what you saw: the Python notebooks are open source, so you can clone the repository and play with the Aiven resources and the notebooks. The third one is for when you want to start with Kafka but don't have a streaming dataset. Kafka is a huge beast because it's a streaming data technology, and finding a streaming data source is not really easy. If you don't have one, I created a fake data producer for Kafka in Python that produces fake pizza orders; you can use the fake pizza orders, or you can take the code and change it to create your own fake producer for Kafka. The last link is for when you want to try Kafka but don't want to install and manage it yourself: try aiven.io, the company I work for. We offer it as a managed service, and we have a free trial you can use. I will be here for any questions you might have. That's everything for my session. Thank you very much, it was a pleasure talking to you.

That was fantastic, thank you. Stick around for a little bit; we have a bit of time for questions. Okay, so we have our next speaker on deck, but we do have a bit of time before he starts. I'm going to tell the audience the same thing: you can use chat or you can use the Q&A. Apologies for the question that was asked about the previous session — my eyes just glazed over the fact that there was a Q&A section. So if anybody has any questions, feel free to post them. I'm going to use this opportunity as the MC to ask my own question first: can you go back to the code for the topic partitions? The one part I was fuzzy on was the actual declaration of the topic, and where you said it has x many partitions in it.

Okay, yeah, I believe so — let me go there.
Okay, so this is the partition producer. One of the things you can do with Kafka is enable what is called auto topic creation. The auto topic creation parameter allows you to write to a topic even if the topic doesn't exist in Kafka: it will automatically be created for you. That's good for testing purposes, but for a production system it's a nightmare, because you make a typo when you write the first message and it creates a new topic. So what I always suggest is to use the Kafka admin client to create topics on purpose before writing into them. In my case, I created a new topic and gave it a name — I believe it was francesco-pizza-partition — and what I'm saying here is that the number of partitions is two. So I'm telling Kafka: create a topic with two partitions. By default, I believe Kafka, in our case, creates all topics with one partition; in my case I wanted two, so I'm writing that down. I can also set the replication factor — in this case it's one, which, again, is not something I would suggest for a production system. But when you create topics with the Kafka admin client, you have a lot of power in defining everything about the topic, for example the retention period: do you want to keep the data in Kafka forever, or only for two hours, or ten minutes, or six days? I hope this answers your question.

Yeah. This absolutely answered my question. Where I was fuzzy was — I can't remember if this example used the ID or the restaurant name, or whether the restaurant name was just an example. I'm seeing this ID, and the part my brain was too slow to catch up on was whether you had to declare that the ID was somehow involved in the partitioning.

No, you just declare a number of partitions, and then it's based on the keys. Okay. By default, what Kafka does is hash the key and use the result of the hash to select the partition. You can play with that if you are keen to, or if you have specific needs. Let's say I always want to send the data for Mario's Pizza to partition zero: with Python I can also select which partition I want to send the data to, or I can write my own partitioner with my own logic, which will not do the hashing but will use some other logic to push messages with a specific key to a certain partition. In my case, I took a shortcut and just used slightly different keys that I was sure would end up in two different partitions.

Gotcha. Okay, good answer. And you know, that's the kind of answer that's indicative of any mature product, where you're like: well, here is the default, but here are all the different hooks you can use to customize the daylights out of it as you want.

Yeah, exactly. You have a lot of options here, and I'm just touching the surface with Kafka and Python. You have a lot of options for partitioning, a lot of options for pushing, and also on the consumer side you have kind of two methods. One is subscribe, which allows you to subscribe to the whole topic; it's good because, if you subscribe with multiple clients belonging to the same application, Kafka will manage the partition-to-consumer assignment, making sure that each partition has an associated consumer, but none of the partitions is associated with multiple consumers. There is also another way. Let's say that you push all the orders for Mario's Pizza to partition number zero.
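A minimal sketch of those two producer-side options in kafka-python follows — the hostnames, certificate paths, topic name, and the toy partitioning logic are all just placeholders for illustration.

```python
from kafka import KafkaProducer
import json

# Option 1: a custom partitioner. kafka-python calls it with the serialized
# key and the partition lists, and expects the chosen partition number back.
def shop_partitioner(key_bytes, all_partitions, available_partitions):
    # Toy logic: every order for Mario's Pizza goes to partition 0.
    return 0 if b"Mario" in key_bytes else all_partitions[-1]

producer = KafkaProducer(
    bootstrap_servers="kafka-demo.example.com:12345",   # placeholder details
    security_protocol="SSL",
    ssl_cafile="ca.pem", ssl_certfile="service.cert", ssl_keyfile="service.key",
    key_serializer=lambda k: json.dumps(k).encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    partitioner=shop_partitioner,
)

# Option 2: pick the partition explicitly on an individual send,
# bypassing any partitioner logic.
producer.send(
    "francesco-pizza-partitioned",
    key={"shop": "Mario's Pizza"},
    value={"id": 42, "name": "Johnny", "pizza": "Margherita"},
    partition=0,
)
producer.flush()
```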
From a consumer point of view, you can then say: I want to read from that topic, but only from that partition, because I know all the data I care about will be in there. This is another way of enhancing the power and the speed of consuming data, because you are only filtering for a specific kind of dataset that you know is located in a specific partition.

Sorry, I couldn't unmute. Okay, that's fantastic, that actually cleared up my entire question. We're looking good; I'm not seeing any other questions. If you have a second before you log out, I recommend you scroll through the chat, because it was very entertaining as we delved into pizza and some very geographical arguments and discussions about it. I joked in there — and I'll probably make this joke again during the closing ceremonies — that we in the Python world don't have the curly-braces kind of religious holy wars, but the pizza discussion going on in chat right now is getting quite vicious. I refrained, because I am actually a Hawaiian pizza fan, but I seem to be in the minority there, so I just kind of backed away from that one. That aside, you got some amazing feedback, so definitely check it out before you leave. Everybody seemed to really enjoy it. I myself absolutely loved this: I know Kafka is something that, like I said in the beginning, we as a company are putting more into, and getting yet another explanation of it — especially something so clear, with a very tangible real example, some great slides, and some great demo content — was fantastic. So thank you so much, I really appreciate your time on this. Everyone in chat, you can see all of his links floating there, and I pasted a fair amount of them, so by all means follow him on Twitter, reach out, and let him know how he did. And thank you. Thank you very much.