We are back, and we continue with the next talk. Hello, how are you? Hi, Santiago, how are you doing? I'm good. So today you're going to talk about high-performance data processing with Python, Kafka, and Elasticsearch. If you are ready, we can start. All right, sounds good. So let's begin with my talk.

Hi everyone, I hope everyone is doing well. Today I will be speaking on high-performance data processing with Python, Kafka, and Elasticsearch. To give an overview: all kinds of applications these days work on data. Data represents a set of information, and sometimes this set of information is either so large or so crucial that it needs a better infrastructure to perform updates to the data store faster. In this talk we will look at how this can be achieved.

Before beginning, let me tell you a bit more about myself. I am currently working as a software engineer at Profirst, India's largest e-commerce shopping platform. I completed my bachelor's in technology in electronics and communication last year from LMS Information Technology in India. I have been a past participant in open-source programs like Google Summer of Code, where I experienced working with open-source organizations like Fossasia and Son, and I have been an active contributor to these organizations in the past. So, before proceeding with my talk, let's go over its overview.
First of all, I will discuss the problem statement: what happens when we process data at high volume, and how messaging queues can be helpful. That will set up what we are going to talk about in the upcoming slides. Secondly, I will discuss the solution to the problem. As my title hints at messaging queues, we will look at the producer-consumer design model, and I will take one real-world example of how it actually works, keeping it really simple to understand how things connect with each other. Afterwards, I will discuss the end-to-end workflow: how the components of the solution are connected with each other, and how we solve the problem of processing data more efficiently and faster. Then, at the end, I will give a code walkthrough of how things work in terms of code, and I will finish with conclusions, learnings, and a Q&A session.

So let's begin with the problem statement. Imagine you are creating your own retail shopping app. Let's say you are an entrepreneur and you want to create a retail shopping app. You deploy this app on an app store so it is publicly available to customers, and they can use it to place orders. Now let's consider how this app works at the moment. A customer places an order through your app. Once the order is placed successfully, the app sends an acknowledgement to the customer: "Hey, your order has been placed; we will deliver it to you by the next day." After sending the acknowledgement, the app should make changes in the data store internally to provide a better user experience. Let me explain that a bit further. Here we use our data store to hold product-related information. Since you are creating a retail shopping app,
your data store will hold product-related information, such as each product's inventory, price, name, and so on. In our case the behavior should be: once an order is placed for a product, the app should update that product's inventory in the data store, to avoid invalid orders for the same product. For example, say a product has a stock of five units and one customer orders all five units; if another customer then tries to place an order for the same product, which now has zero stock, things can go wrong. That is why we use the data store this way: to avoid invalid orders for the same product.

Now assume things are going well and the current workflow works fine for you. Let's say you are receiving an average order rate of ten orders per day. Based on the customer experience, things are going well, your app is doing great, and in the market people are talking about your app: "Hey, let's try this out."
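To make the out-of-stock scenario concrete, here is a minimal sketch of the guard the inventory update needs to enforce. The names (`place_order`, `OutOfStockError`) are my own, not from the talk:

```python
class OutOfStockError(Exception):
    """Raised when an order asks for more units than are in stock."""

def place_order(stock: int, quantity: int) -> int:
    """Return the remaining stock after a successful order.

    Rejects the order instead of letting stock go negative, which is
    exactly the invalid state described above: a second customer
    ordering a product that is already sold out.
    """
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    if quantity > stock:
        raise OutOfStockError(f"only {stock} unit(s) left")
    return stock - quantity

# First customer buys all five units; a second order must then fail.
remaining = place_order(5, 5)
try:
    place_order(remaining, 1)
except OutOfStockError as exc:
    print("rejected:", exc)
```

Keeping this check inside the data-store update (rather than only in the UI) is what prevents the race between two customers described above.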
"It's a really cool retail app." So, based on the customer experience, more people register on your app to place orders for the products they want. Let's say it scales from 10 customers to 10,000 customers. That is a thousand times more daily active users on your app. In this case it can be challenging to provide them a good customer experience. As I mentioned earlier, the main issue can be products going out of stock and customers not getting what they want.

If we look at this problem in more detail, we can see how our services work in the overall app workflow. Inside the app component there are two services handling requests: service one is responsible for providing the content on the app, and service two is responsible for updating the data store based on events like "order placed successfully". Now, this flow has many drawbacks. First, you don't know how many updates your data store might receive: with a thousand times more customers on the platform, you really don't know how many updates for the different products will arrive at your data store. Second, what is the rate of events received by the background service two from service one? As I mentioned earlier, service two acts as the updater while service one handles product visibility. In this case we don't know what rate of events to expect; sometimes there is a traffic spike and sometimes there isn't, so how do we handle such situations? The third drawback is slow and synchronous updates. This can happen because the number of customers is really high.
This can cause low throughput across the overall system, and the low throughput can increase the overall latency of the services handling external requests. This is costly in terms of business, since it disturbs the user experience and the app fails to show correct information. A final drawback I can think of is that you might face timeout errors if the systems receive too many requests at the same time, and there is also a chance of race conditions developing if your services run concurrently.

To solve these drawbacks, we use a famous design pattern called the producer-consumer model. Let's understand how it works technically. At its center is a messaging queue, which is responsible for carrying data from the producer to the consumer. A messaging queue is a process holding an in-memory data structure that stores incoming messages and hands them out for consumption. It can run on the same machine as the other components interacting with the queue, or it can run on a separate external machine with the other components talking to the queue over the network. Messages are small pieces of data which tell what task has to be done.
Here we have two components, producers and consumers. The producer is responsible for generating a message and pushing it to the queue, and the consumer is responsible for picking up the message and doing the processing the message describes. The producer keeps pushing messages and the consumer keeps consuming them, so the messaging queue effectively decouples the load on the overall system. Another good reason to use this pattern: suppose the consumers, which take the updates and send them to the data store, somehow stop working. In that case the producer keeps pushing load to the messaging queue, and the queue holds it until the consumers are up and stable again in production. This keeps the data persistent and lets important updates reach our data store without losing any information.

So this is what our solution looks like. Say you are facing a lot of challenges in processing high volumes of data on your app: the app experience is really slow, or race conditions develop and cause a lot of downtime in production. Using this pattern, the solution looks like this. We use a messaging queue in the app flow. For this example I have taken a NoSQL database, Elasticsearch, where I will store all the product information. Kafka is used as the messaging queue to keep track of information from the producer, that is, the retail shopping app. The service will be consuming the information at some rate. Since I don't have exact numbers, let's assume
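The producer-consumer mechanics described above can be sketched with Python's standard library alone, using `asyncio.Queue` as an in-memory stand-in for the messaging queue (the talk uses Kafka; the function names and data shapes here are illustrative):

```python
import asyncio

async def producer(queue: asyncio.Queue, orders: list) -> None:
    # Push one message per order; the queue buffers them if consumers lag.
    for order in orders:
        await queue.put(order)
    await queue.put(None)  # sentinel: no more messages

async def consumer(queue: asyncio.Queue, store: dict) -> None:
    # Keep consuming until the sentinel arrives, applying each stock update.
    while (message := await queue.get()) is not None:
        pid = message["product_id"]
        store[pid] = store.get(pid, 0) - message["qty"]

async def main() -> dict:
    queue: asyncio.Queue = asyncio.Queue()
    store = {"p1": 5, "p2": 3}  # product_id -> stock
    orders = [{"product_id": "p1", "qty": 2},
              {"product_id": "p2", "qty": 1}]
    # Producer and consumer run concurrently, decoupled by the queue.
    await asyncio.gather(producer(queue, orders), consumer(queue, store))
    return store

print(asyncio.run(main()))  # {'p1': 3, 'p2': 2}
```

If the consumer task were paused, the queue would simply hold the pending messages, which is the durability argument made above (Kafka additionally persists them to disk).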
there is some rate at which the information is consumed. Here we can also define our business logic and process the data accordingly. Once the data is processed, the service is responsible for sending the update to Elasticsearch, which is basically my product data store. The app then requests this piece of information again, and the new information is visible to the customers.

Let's go through it once again. I am using the app, and I just bought one product with some inventory, say one or two units. This inventory update goes through Kafka, then it is consumed by the service, which processes it and sends the update to my data store, Elasticsearch. Once Elasticsearch is updated, all the products are visible to the customers again with fresh information, without any loss of data or any downtime in production.

The advantages we can see in this flow: first, the process can be implemented asynchronously, which means you can queue updates and consume them when you want, and you can have multiple consumers as well. Second, it helps you separate out the business logic from the service; the messaging queue works separately. Third is performance monitoring and metrics. By that I mean: from the rate at which the updates arrive, you can clearly estimate how much of a traffic spike you are experiencing and how well your app is handling the load, and from the metrics you can see the throughput and the latency of your app. It is also a more resilient solution, meaning there is a much smaller chance of the application going down. Okay, so let's see how this works in the form of code.
Before proceeding to the code walkthrough, let's discuss what kind of data we are going to update in our data store, Elasticsearch. This is how my product data looks in my Elasticsearch data store. In our example we will try to update the stock of the product without affecting the other properties of the product document.

Two more things I wanted to mention before the walkthrough. First, I have implemented both producer and consumer in the same service, just to develop an understanding of how the workflow operates; in real-world applications these would be implemented as separate microservices and orchestrated together in production. Second, I have used Python 3, the FastAPI framework, and async Kafka and async Elasticsearch clients. We will go through each step: producing and pushing data to Kafka, consuming from Kafka, processing it, and sending it as an update to Elasticsearch. In my code I have defined a retail streamer, which represents my overall data streamer and has two components, a producer and a consumer.

This is the code for producing and pushing data to Kafka. First we initialize our FastAPI application, as you can see, then initialize our Kafka client. I am using the aiokafka client to support async/await operations, which lets me process my updates asynchronously at the application level. The main function begins at line 21, where I have defined a method named kafka_produce, which takes the data payload from the request, along with the topic name, and assigns a message ID to it.
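The slides themselves are not reproduced in this transcript, so here is a hedged sketch of what such a produce step can look like. The helper names and the topic name are my assumptions; `send_and_wait` is aiokafka's producer method, stubbed out below with an in-memory fake so the example stays self-contained:

```python
import asyncio
import json
import uuid

def build_message(payload: dict, topic: str) -> dict:
    # Attach a unique message ID so the update can be traced end to end.
    return {"id": str(uuid.uuid4()), "topic": topic, "data": payload}

async def kafka_produce(producer, payload: dict,
                        topic: str = "product-updates") -> str:
    # With aiokafka this would be AIOKafkaProducer.send_and_wait(topic, value).
    message = build_message(payload, topic)
    await producer.send_and_wait(topic, json.dumps(message).encode("utf-8"))
    return message["id"]

class FakeProducer:
    """In-memory stand-in for AIOKafkaProducer, for demonstration only."""
    def __init__(self):
        self.sent = []
    async def send_and_wait(self, topic, value):
        self.sent.append((topic, value))

producer = FakeProducer()
msg_id = asyncio.run(kafka_produce(producer, {"product_id": "p1", "stock": 3}))
print(len(producer.sent), "message pushed, id:", msg_id)
```

In the real service this coroutine would be called from the FastAPI request handler after the order acknowledgement, with a real `AIOKafkaProducer` that has been `start()`-ed at application startup.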
Then it sends this to the Kafka producer: as you can see, the aiokafka producer is actually sending my message to Kafka. In this way we produce the message. Overall, this is the code that gets activated once the acknowledgement is sent to the customer that the order has been placed; this code is responsible for sending the update, with everything that has to be done summarized in the message. This is what the overall request payload and the response look like when an order is placed: as I said, the payload is sent to Kafka. You can also observe that I separately create a unique message ID. This message ID can be helpful to trace whether my updates were successful on the consumer side, or whether they failed on the producer side, so it serves as a trace ID for me.

That was the data producer; now let's look at the code for the data consumer. Once my messages have been produced and pushed to Kafka, this is the code for consuming data from Kafka, and it is also responsible for updating the data in Elasticsearch. The overall implementation mirrors the producer code. You can see that the main function begins at line eight, where I have defined a method named kafka_consume. This takes data from the Kafka queue, basically polling the queue continuously to see whether any message has been received. So this is close to real time: if my data set is really small, say 1,000 or 2,000 messages, they will be consumed very fast, within one or two seconds.
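Again a hedged sketch, with names of my own choosing: with aiokafka the consume loop would be `async for msg in AIOKafkaConsumer(...)`, but the decoding and the batching used for the bulk update in the next step can be shown standalone:

```python
import json

def parse_message(raw: bytes) -> dict:
    # Decode one Kafka record back into the update dict the producer built.
    return json.loads(raw.decode("utf-8"))

def batched(items: list, size: int) -> list:
    # Split consumed messages into fixed-size chunks for bulk indexing.
    return [items[i:i + size] for i in range(0, len(items), size)]

# Simulate 1,000 consumed records and split them into batches of 500,
# matching the batch size mentioned in the talk.
records = [json.dumps({"product_id": f"p{i}", "stock": i}).encode("utf-8")
           for i in range(1000)]
updates = [parse_message(r) for r in records]
batches = batched(updates, 500)
print(len(batches), "batches of", len(batches[0]))  # 2 batches of 500
```

The real consumer would run this inside the polling loop, flushing a batch to Elasticsearch whenever it fills up or a time window elapses.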
Once we receive the messages, as you can see, we process them before sending them to Elasticsearch. We send them to Elasticsearch as a bulk update: say we have received about 1,000 messages from Kafka; I send them in batches of 500 to Elasticsearch, which makes the code perform better. This is what the overall request consumed from Kafka looks like; the response basically tells us that the data was consumed and updated to Elasticsearch successfully, and it also tells us from which topic the messages were consumed.

Now let's look at the data update code. Once we have received the data, my service, which is separate from the Kafka queue, is responsible for writing the information from the message to Elasticsearch. This is the code for processing the data and sending the update to ES. From the message we get the product ID; in the context of our example, I store each document with an _id field set to the product ID, and _id in Elasticsearch identifies a unique document. The code fetches the document from ES by _id, and then we update only the fields that need to change. This is what a final update to ES looks like after going through the producer-consumer flow. As you can see, earlier the product stock was five. Say I shopped on the app, put two items in the cart, and checked out successfully; the app sent the acknowledgement, sent the update back to the data store, and now the stock has changed. You can change any kind of information you want this way, and it can all be handled at the service layer.
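The partial update described here maps onto Elasticsearch's bulk API, where each `update` action line is followed by a `doc` line containing only the changed fields, so the other product properties are preserved. A sketch of building that newline-delimited body (the index name `products` is my assumption; the real code would pass this to the async Elasticsearch client's `bulk` call):

```python
import json

def bulk_update_body(updates: list, index: str = "products") -> str:
    # One action line plus one partial-document line per update; only the
    # "stock" field is touched, leaving name, price, etc. untouched.
    lines = []
    for u in updates:
        lines.append(json.dumps({"update": {"_index": index,
                                            "_id": u["product_id"]}}))
        lines.append(json.dumps({"doc": {"stock": u["stock"]}}))
    return "\n".join(lines) + "\n"

body = bulk_update_body([{"product_id": "p1", "stock": 3},
                         {"product_id": "p2", "stock": 0}])
print(body)
```

Batching 500 such pairs per request, as described above, amortizes the HTTP round-trip cost across many document updates.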
The conclusions and learnings from this talk: first, we learned about producer-consumer models, how they work, and their advantages for processing millions of records. Second, we had a deep dive into the producer-consumer example, seeing how the pattern can be used in a real scenario. Third, we understood how the data store is updated and how we can achieve better performance and other benefits using a messaging queue. If anyone is interested in the Python code implementation and how things work together as a whole system, you can find the whole code in my repository. I also have proper documentation for it, so if you want to explore the code in depth and run it locally, it's all there. If you have any questions, you can also reach out to me on Matrix, email, or LinkedIn; those are my social handles. That's all from my end. Thank you everyone for joining, and thank you to the EuroPython community for having me here.

Wow, super nice, thank you. Now we have one question for you: are you using FastAPI to process the Kafka topics? Basically, I'm using a FastAPI application; the Kafka topics are handled through the aiokafka client, and the framework I'm using is FastAPI, together with the Kafka client that supports async operations. Yeah, I understand. And some people here say the repo might be private. Yes, I will make it public after this talk; I actually forgot to make it public. Perfect. Well, if people have more questions, they will ask them in the breakout room. Thank you so much. Thank you very much. Bye bye.