Hi, good morning, all of you. I hope everybody can hear me at the back, too. Cool. My name is Konark Modi. I work for a travel company back in India known as MakeMyTrip.com. Today I'm here to present a topic I call design considerations while developing, evaluating, or deploying your own distributed task processing system. Why I say developing, evaluating, or deploying is because you might start with your own stuff: you might want to write a framework from scratch, you might be evaluating the many frameworks available in the market for distributed task processing, or you've shortlisted one and now want to deploy it.

These are basically my learnings from the past year, year and a half, that I've spent playing around with a framework known as Celery: what the essential components of a distributed task processing system are, which you can look for when evaluating other systems or keep in mind while designing one from scratch.

What I'll be doing today is defining the nitty-gritty: what a job is, what a task is, how tasks get processed. For example, in the image over here, these are a few of the workflows I designed back at my office, using Celery precisely. We'll talk about how to design this stuff, the considerations I took for all these tasks, and a brief overview of what Celery is all about. I won't dig too deep into Celery, because you might want to evaluate different solutions as well, but I'd love to talk about Celery after the talk too. In terms of design choices, I've tried to form four different components that you should evaluate within a system: the scheduling, the task management, the worker management, and the admin and reporting parts of it. Finally, we're going to talk about different workflows and the other tools available right now in the open source community.

I firmly believe everyone sitting over here has a use case for distributed task processing, so it would be great if you could think of your own use case and try to connect it to my talk. I have my own set of use cases that I've solved with distributed task processing systems; you might have your own. If you connect your use cases to what we discuss, it will really help you follow what we're going to talk about in the next 30 minutes or so.

So what is a task, and what do we mean by distributed task processing? Here are a few examples I've written down of things I take care of using distributed task processing systems, but essentially, for me, a task is a subset of a job. Let's say the job is: I want to send email to one million users. That's a complete job. How do I break that job into smaller tasks? I need to fetch the list of one million users I'm sending email to, I probably need to apply a filter deciding which email goes to which customer, and then I'll use my SMTP server to send those emails. So what I've done here is taken a job of sending emails and divided it into smaller and smaller tasks.

Now, as we divide the job into tasks, we run into new requirements. For example, what you need is an asynchronous task queue, because sending one email does not depend on the previous customer. Each customer is an individual user for me, so I really don't need to wait for one email to be sent completely before sending the next; I can fire them in parallel.
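To make that decomposition concrete, here is a minimal sketch of the email job as Celery tasks. The task names, the broker URL, and the filtering rule are invented for illustration; this is not code from the talk.

```python
from celery import Celery

app = Celery('emails', broker='amqp://localhost//')

@app.task
def fetch_user_ids():
    # Stand-in for the real DB query that returns the one million users.
    return list(range(1_000_000))

@app.task
def send_email(user_id):
    # The "filter" step: decide which template this customer gets.
    template = 'offer' if user_id % 2 else 'welcome'
    # The real SMTP call would go here (e.g. smtplib against our SMTP server).
    return (user_id, template)

def enqueue_campaign():
    # Fan the job out: every email becomes an independent task on the queue,
    # so one slow email never blocks the rest.
    for uid in fetch_user_ids():   # called directly here for brevity
        send_email.delay(uid)
```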
The other thing I need is distributed message processing: one server will not suffice, so I need a number of different machines to fire all those tasks for me. I also need support for real-time processing plus scheduled tasks: I might want to send all of them right now, or I might want to schedule them for, say, next week, next month, after 20 days, after 25 days, and so on.

Celery lets me do all these things very easily. It is very simple to get started with: I'll show you a small demo of how easy it is to install Celery and get going. It only takes five minutes to write your first task and make it asynchronous. It's flexible and reliable: all the design considerations we're going to talk about today are configurable via arguments in Celery, which makes it very flexible and powerful for me. Everything is message passing: you have a broker, and I usually use RabbitMQ for that. Everything I need to communicate to my workers, whether it's dispatching tasks or inspecting the state of the workers, goes via RabbitMQ. All I need to do is throw a message on the message queue, and the workers take care of it from there. And it comes with things out of the box for operating and managing the system. What I mean is: I know there's a system, I deploy it, and it works fine for me until it breaks down. Then I need to figure out at what point the system broke, why it broke, and so on. Celery itself ships with a lot of tools that help you manage the system better, and people have written open source wrappers over it that you can adopt to see how the inner workings of Celery are performing for you. For example: how many tasks is a worker performing? What's the per-task latency on that worker: do I need to improve it, or is it working fine? Is my message queue dying? Is my message queue all right or not? All these things come out of the box, and you can use them to manage the system better.

Then there are the use cases I was mentioning earlier. For example, I was writing a monitoring and polling system where the use case was to poll the database, get some query results out of it, and trend them on a trending system. We're talking about hundreds of queries per minute here, and these are all business metrics, so the per-minute value is very important for all of them. What we needed was a solution where I can fire hundreds of queries at my database, assuming the database scales to that extent. What I wanted to write was a layer that can fire 300 queries, get the results, and trend them. That is one of the major use cases we're solving with Celery right now.

This is a very rough architecture of what Celery looks like. You can consider this part a web application, or one of your tasks that's doing asynchronous processing. Think of uploading photographs to Facebook: while your photographs are uploading, you are not stopped from using Facebook; you can still use all its other features. Similar is the case with YouTube.
That's what's happening asynchronously at the backend: you don't want your users to wait until you're done with your processing. So, for example, you perform an action on the website, the task gets into the task queue, your workers consume and execute the tasks from the queue, and you either want the result back in the DB or some other layer, or you don't care about the result at all. That's where the feedback loop to the web application comes in. We'll mostly be talking about the green, purple, and yellow parts today.

This is how easy it is to install Celery: you just do a pip install celery. I'll show a quick demo of this; I have already installed Celery on my system. Now I have to define a Celery config that tells it which broker I'm going to use. I'm using AMQP, with RabbitMQ behind it, so I've defined that the broker is on my localhost, speaking AMQP. Here I'm not concerned about the result of my task; all I care about is that the tasks get fired in parallel and asynchronously. After I'm done with the configuration, I define a tasks.py file, where I write the functions that need to get executed. Before that, I just load my configuration file, and these are a few demo tasks I've loaded over here: simple Python functions. So it's very easy if you have an existing application written in Python and you want to migrate to Celery: because of the way Celery wraps tasks, you can convert your Python code into Celery tasks very easily, without rewriting much of the code.

So let's start a Celery instance. I'm telling Celery to start with tasks.py as the task file and a concurrency level of one; this starts a Celery worker on my machine. Now, how do I execute tasks? Say add is a function that adds two numbers. I fire the command, and it says: received tasks.add. This is the unique ID it assigns to each task that comes into Celery, so it can keep track of it; it added the two numbers and printed the result. Over here I can check the status of my task: it completed successfully, so it gives me SUCCESS. If I ask for the result of the task, it gives me four. So this is how easy it is to get started with Celery and have it do stuff for you. And this is really all it takes to write your own task: you take care of the tasks.py file, and then you can call your tasks.

The frontend from which I'm calling the task will not wait for the task to finish. For example, if you put a sleep of, say, 60 seconds in your add function and do t = add.delay(2, 2), it will still fire, and you get your console back immediately. Then, with the ID you got, you can go back and ask: what was the result? Was it a success? You can check whether the task is still running or has finished, and if it has finished, what the result is.
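For reference, roughly what that demo amounts to in code. The broker URL and the result backend are assumptions on my side (reading .status and .result needs some result backend configured), and the snippet uses current Celery setting names.

```python
# celeryconfig.py
broker_url = 'amqp://guest:guest@localhost:5672//'   # RabbitMQ over AMQP
result_backend = 'rpc://'                            # lets us read status/result

# tasks.py
from celery import Celery

app = Celery('tasks')
app.config_from_object('celeryconfig')   # load the broker settings above

@app.task
def add(x, y):
    return x + y

# Shell:
#   $ pip install celery
#   $ celery -A tasks worker --concurrency=1 --loglevel=info
#
# Python console:
#   >>> from tasks import add
#   >>> t = add.delay(2, 2)   # returns immediately, even if add() slept 60s
#   >>> t.status              # 'SUCCESS' once a worker has executed it
#   >>> t.result              # 4
```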
Because at the end of the day, you want to see how your workers are performing: how well the tasks went, how many of them failed, how many were successful.

Scheduling capabilities can come in the form of immediate execution or scheduled tasks. Both have their own limitations and their own pros, so you should evaluate a system that has both qualities; Celery lets you do both. For scheduled tasks, you can put a crontab-style entry into the system saying, I want to execute this every morning at 7:30 a.m., or you can use a more humanized form of entry, like: execute this query every Monday. Apart from that, you can set interval-based events as well: execute this every 15 seconds, every 30 seconds. And if that still doesn't satisfy you, there's a countdown approach: fire this query after 10 seconds. You just set the countdown; it counts down 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, and then fires whatever task you assigned to it. This comes in pretty useful for jobs like cleanup. For example, I've fired one task, and I know that 90 seconds after it I want to fire another task; the countdown approach really helps with that. We'll come back to this when we reach the workflow section.

Task management is a bit tedious in terms of understanding what kinds of tasks you have and how you want to manage them. For example, you may want to prioritize your execution. Say I have two machines. One connects to my primary database, which is pretty scalable and fast; one connects to my secondary database, which is not that good. I know the primary will always give me faster results, so I can fire more queries at it than at the secondary. It's just an example, but you can get that kind of prioritized execution using Celery. All you need to do is define two queues, say a fast queue and a slow queue, and when you push tasks, push the fast ones onto the fast queue and the slow ones onto the slow queue. The workers read those queues and pick up the tasks: you can have a worker that reads both queues, or a worker that reads only one of them. It's flexible in that manner.

Routing based on OS: say you have a multi-OS environment with CentOS machines, Ubuntu machines, and Windows machines, and workers aligned to all of them. Syntactically, your operations might differ: simply firing a query against a database looks different on a Windows box than on a Linux box. So I can have both kinds of workers running, have a queue for each, and Celery helps me take care of it. I have one queue for the Windows boxes and one for the Linux boxes, and when I fire my jobs, I make sure the right message goes on the right queue; the workers take care of the rest. In that sense, I'm still managing my different environments with the same system.
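Here is a sketch of those scheduling and routing options as Celery configuration. The task and queue names are invented, and the setting names are the current Celery ones, which may differ from the version shown on stage.

```python
from celery import Celery
from celery.schedules import crontab

app = Celery('tasks', broker='amqp://localhost//')

# Scheduling: crontab-style, humanized day-of-week, and plain intervals.
app.conf.beat_schedule = {
    'report-0730': {
        'task': 'tasks.send_report',
        'schedule': crontab(hour=7, minute=30),                    # every morning, 7:30
    },
    'cleanup-monday': {
        'task': 'tasks.cleanup',
        'schedule': crontab(day_of_week='mon', hour=0, minute=0),  # every Monday
    },
    'poll-every-15s': {
        'task': 'tasks.poll_db',
        'schedule': 15.0,                                          # interval in seconds
    },
}

# Countdown: fire a single task 10 (or 90) seconds from now:
#   cleanup.apply_async(countdown=90)

# Routing: a fast queue for the primary DB and a slow one for the secondary,
# or one queue per OS family -- the same mechanism covers both cases.
app.conf.task_routes = {
    'tasks.query_primary':   {'queue': 'fast'},
    'tasks.query_secondary': {'queue': 'slow'},
}
# Workers subscribe to one queue or several:
#   celery -A tasks worker -Q fast         # the big machine
#   celery -A tasks worker -Q slow         # the smaller machine
#   celery -A tasks worker -Q fast,slow    # reads both
```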
Routing based on hardware capabilities we've already discussed: you have one machine that can process a large amount of data and one smaller machine that cannot. It's basically a matter of how you configure your workers: one can run at a concurrency of four, another at a concurrency of five. That's how you split up the work.

Conflict management: suppose I unleash 10 messages on my message queue and all 10 of them need to be processed. There are five workers, and each of the five picks up one task. Now there might be a conflict: what happens to the sixth task? Who picks it up? Does my task get picked up again by some other worker or not? You can configure all of this in Celery so that a message is processed only once, a task is executed only once. On top of that, one of the solutions I use is to put a lock on the task using Redis. Whenever I pick up a task, I take a lock saying I'm executing this task, so no other worker can pick it up. Once I'm done with the task, I release the lock, and then anybody else can pick it up again.

Exception handling is pretty straightforward. Say you're connecting to a third-party API and it starts giving you timeouts. Your system will start to fail, and you want a retry mechanism here. Celery lets you retry a particular task immediately, retry it after a given amount of time, and retry it a specific number of times. For example, I can say: only if it fails five times, with an interval of five minutes between attempts, send me an alert. That means there's a failure that is not intermittent, and I'd like to dig down into it. These kinds of retries and exception handling are available in Celery.

You can explicitly expire your tasks: I'm firing a task, but I want it to expire after 20 seconds; if you take longer, expire on me, don't wait for me to come and kill you. Celery lets you do all these expirations. You have time limits that are soft and hard: the soft limit gives the task a chance to wrap up before being expired, while the hard limit expires it regardless of the state of the task.

You can also set things at the worker layer. Say I have a worker processing a lot of tasks, and the worker goes down: what happens to the messages the worker was handling? Celery handles that with something known as acks_late=True. If I set acks_late=True on my task, the acknowledgment is sent to the message queue only when the task finishes: this task has been taken care of. If a task was not completed and my machine goes down, all those messages get replayed by Celery: something went wrong with the worker machine, they were never acknowledged, hence never processed. Celery gives you that with a configuration change alone.
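A sketch of those knobs on a single task, with the Redis lock pattern reconstructed from the description above; the key naming, timeouts, and lock expiry are assumptions.

```python
import redis
from celery import Celery

app = Celery('tasks', broker='amqp://localhost//')
r = redis.StrictRedis(host='localhost')

@app.task(bind=True,
          acks_late=True,       # ack only on completion, so a dead worker's
                                # messages get redelivered by the broker
          soft_time_limit=20,   # raised inside the task, so it can clean up
          time_limit=25)        # hard limit: the task is killed outright
def process(self, item_id):
    # The lock: only one worker may run a given item at a time.
    lock_key = 'lock:%s' % item_id
    if not r.set(lock_key, self.request.id, nx=True, ex=300):
        return 'skipped: someone else holds the lock'
    try:
        pass                    # the actual work would happen here
    finally:
        r.delete(lock_key)      # release so others can pick it up later

# Explicit expiry at call time: drop the task if it hasn't started in 20s.
#   process.apply_async((42,), expires=20)
```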
Coming back to tasks: sending a task is what we did with add.delay. While a task is processing, I want to see its various states and perform certain actions on it. I want to see whether the task was received; whether it's running; I might want to revoke it, pause it, or permanently kill it. These are a few attributes of a task you should keep in mind: your system should allow you to do all of that. You should even have controls like pause: you're firing some stuff and you want to pause it for a moment, saying, I don't want to do this right now. Probably I know something has gone wrong with one of my layers, so I want to pause this set of tasks and continue later, or kill them for now, or permanently delete them from my queue.

For that last one, Celery has something known as purging. Say you have one million messages on your queue that still need to be processed, but you know this task no longer needs to be done. You simply do a celery purge: it purges all the tasks from your message queue and empties it, so they don't get processed. You really do not need to restart your queues to empty them; a purge does it.

Now comes worker management. Each task flows to a particular worker, and a worker has multiple processes running in it, which is where tasks get processed. For workers: I want to start a worker, I want to stop a worker; that's simple. I want to do a warm shutdown, meaning: when I say shut down, finish all your tasks and then go down. Or I want a cold shutdown: just shut down right now, I don't care what state you're in. In that case, if you don't have late acknowledgment on, your message will be lost; if you do, your message will still be there.

If a task fails, I want the traceback: why did the task fail? Celery as a system lets you do that. It lets you capture the tracebacks of whatever happened to your task: what line it failed on and how it failed.

Heartbeat: which workers are online, which workers are offline. Celery maintains a heartbeat, so I know this worker is up or this worker is down. That is very important in a distributed task processing system. Say you have 10 workers: you want to make sure all of them are up, or to know that one of them is failing, so you can assign the proper tasks to them. Without an online/offline model, things become very challenging.

Worker inspection: Celery lets you broadcast messages to all the workers for its own monitoring and health-status purposes. For example, I simply run celery inspect against worker one; it goes to the worker, says give me all your stats, I need to collect them, gets the stats, and sends them back to the command center. This helps because it lets me broadcast messages to a lot of workers; even if I want to kill a particular task, this broadcast mechanism lets me do it. Worker inspection comes into play when you want to see how many tasks are being performed on a worker, how much time each task is taking, and so on.

Autoscale up and autoscale down: say I'm currently running with a concurrency of two because there isn't much on my message queue, but now a lot of messages are coming in. You can configure Celery to scale up to its maximum level. The way Celery does it, the maximum concurrency it will run at is the number of cores you have in your system. So if you have eight cores and you're running at a concurrency of two in autoscale mode, it can go up to a concurrency of eight and then come back down to two once it sees the peak traffic has gone and there's no more throttling. It will not scale across machines, though: it's not that Celery can auto-deploy a new machine and start running on it. Celery doesn't come with that out of the box, but there are solutions for that as well.
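A sketch of those management operations through Celery's broadcast/control API, with the worker lifecycle bits as shell comments; the task ID is a placeholder.

```python
from celery import Celery

app = Celery('tasks', broker='amqp://localhost//')

insp = app.control.inspect()   # broadcasts to every worker, collects replies
print(insp.ping())             # heartbeat check: who is online?
print(insp.stats())            # per-worker stats, like the 'inspect worker1' example
print(insp.active())           # tasks each worker is executing right now

app.control.revoke('some-task-id')   # broadcast a kill for one task, by its ID
app.control.purge()                  # empty the queues without restarting anything

# Worker lifecycle from the shell:
#   celery -A tasks worker --autoscale=8,2   # scale between 2 and 8 processes
# Warm shutdown (finish current tasks, then exit): send SIGTERM to the worker.
# Cold shutdown (exit immediately): send SIGQUIT.
```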
Assigning a new worker machine is very easy in Celery: all you need to do is start a celeryd instance, like I did on my machine. Say you have a message queue with four workers processing a lot of tasks, and you suddenly need to spawn four new machines. All you do is start a celeryd instance on those four machines, and Celery takes care of reading the messages from the stream and allocating tasks to them. Celery does not ship your code to the machines, though. If you have four different machines, it will not take care of shipping your tasks.py file: you have to make sure it is already on each machine before the workers come up. If it's not there, you'll start getting errors. Also, if you change something in the tasks.py file, say you add a new task or make some changes, you have to restart your workers; otherwise the change will not take effect. So you'll have to do worker-by-worker restarts, or however you want to do it; that depends entirely on what kind of changes you're making.

Then there's the admin and reporting. I fired a task from the console, but that doesn't stop me from going to the admin section and scheduling a few tasks. I'm sorry about the font, it might not be clear to people sitting at the back. What it's basically doing is: I give a name to my task and then say, okay, this is the task you need to run; run it at a particular frequency, or run it immediately, and these are the arguments to my particular task. That lets you do scheduled jobs from an admin perspective. Say you've already written your tasks.py file, like the add function we wrote. Now you just want to control it from the admin section; you need not write Python code for that. All you do is select your task from here, say add, give it arguments, specify the time when you want it to run, and it will run it for you.

In a full production system, you want to see how everything is running. This is a utility known as Flower that goes with Celery; you do a pip install flower for it. What it gives you is the list of workers that are up and running right now, the number of tasks they have completed, and the number of tasks they are currently doing. You can get more detail at the worker level: which tasks are getting executed, with what arguments, and what the results are. It also gives you historical trending of how your tasks have performed in the past: how many of them are going slow, how many are going fast.

Coming to workflows. All the tasks we've talked about so far, like the add function, are single tasks that you simply call. When you're designing workflows, there comes a situation where you want to do one task, and after that task do another task; or do two tasks together and then a third; or four tasks together, and so on. Celery comes with what is known as Canvas, which lets you design your own workflows. For example, these are the five kinds of workflows you can do. One is chains. A chain links tasks together: add four and four, and after that, add five to the result. The output of the first task becomes the input of the next. add takes two arguments; in the first add.s here I'm passing two arguments, four comma four, but in the second I'm passing just five. So it takes the output of four plus four, which is eight, and adds five to it.
So the output of this chain would be 13. That's how you chain your various tasks together.

You might come to a situation where you say: I don't want chaining; these are my tasks, just fire them in parallel. So you do a group of add.s(2, 2) and add.s(4, 4): it fires them in parallel and does not wait for the first one or the second one to finish.

You can do a chord. A chord is nothing but a set of tasks, and after that set of tasks is done, another function runs for me. It's a header-and-body kind of thing: you have five tasks, and once those five tasks are done executing, execute the sixth task for me. It's essentially a group and a chain put together that forms a chord for you.

Chunks is very interesting, because it helps you divide your job into a number of smaller instances. For example, recently I was working on an exercise in my office where we had to process four million documents. Four million documents could not be processed together, that's obvious; the system we had to process them on had a limitation that it could handle only six documents at a time. I wanted a setup where I can release all four million documents onto my message queue but execute them six at a time. That's what chunks lets you do. The first argument to chunks is basically an iterable of argument tuples: for example, if you're processing four million users, you release the four million user IDs. Then, after the comma, you give the chunk size; in this example on the slide it's 10, and in my case it was six. So it picks up six user IDs and processes them, and whenever one batch finishes, it picks up the next, making sure that at a given point of time only six of them are being fired. That is what chunks lets you do.

One of the examples I pasted on the first slide was task trees. Celery by default does not come with task trees; you have to add a module known as celery-tasktree that lets you run tasks in this manner. You define a tree, and in the tree you say: first perform task A; when task A is done, I want to run two instances of task B; when both of them are done, I want to run task C three times. That's a simple task tree you can define, and celery-tasktree is what comes to the rescue for you.
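A sketch of the Canvas primitives just described, reusing the demo's add task. The total callback and the Redis result backend are my additions; chords need a real result backend to collect the group's results.

```python
from celery import Celery, chain, chord, group

app = Celery('tasks',
             broker='amqp://localhost//',
             backend='redis://localhost')   # chords need a result backend

@app.task
def add(x, y):
    return x + y

@app.task
def total(results):
    return sum(results)

# Chain: the output of each task feeds the next -> (4 + 4) + 5 = 13
chain(add.s(4, 4), add.s(5))()

# Group: fire tasks in parallel, with no ordering between them.
group(add.s(2, 2), add.s(4, 4))()

# Chord: a group as the header, plus a body that runs once the whole
# group has finished -> total([4, 8]) == 12
chord([add.s(2, 2), add.s(4, 4)])(total.s())

# Chunks: split a long list of argument tuples into pieces of 10, so a
# huge job goes onto the queue in digestible units.
add.chunks(zip(range(100), range(100)), 10)()
```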
Now, tools available in the market. There's a tool known as Jobtastic. It's mainly for web applications that need user actions performed asynchronously. It's built on top of Celery itself, but gives you a lot of extra functionality to help you manage user-facing asynchronous jobs. So if you're a web developer looking for asynchronous processing, Jobtastic is something you should definitely go and look at. A major thing missing in Celery right now is a DAG-style workflow: I cannot define a workflow as a DAG. So if you're specifically looking for DAG definitions, you should go and look at Dagobah. I haven't used it in production; I've only done a simple hello-world program with it, and for at least that purpose it works fine. And if you're into Hadoop, writing a lot of MapReduce programs, and you need a framework like Celery, Luigi is the one. It was open-sourced by Spotify a couple of years back and is a pretty decent tool. It lets you run MapReduce jobs and define task workflows over them, so do give it a try. The first three were from the Python world; the other three are mostly from the Hadoop world, largely on the Java side. They perform similar functions, not in the same depth as Celery, but they help you define workflows in much the same way.

That's all for today. Thank you, and I'm open for questions.

Yeah, thank you very much for the presentation. If you have any questions, please come up to the microphones. Go ahead.

Yeah, hi, thanks for the talk. Is it possible for the publisher of a task to update the task later? Not just workflow updates like suspension, but actually updating the parameters? And conversely, is it possible that the task, while processing, finds something out and gives status updates to the publisher without actually finishing?

OK. So the question is basically about the intermediate state of a task: when I publish a task, maybe I want to kill it in its running state, or if it finds some anomaly or hits some business logic, it needs to perform another set of tasks. The way I'd do it in Celery is that I can call a task within a task. Say I'm running a task that matches the second part of your question, where you want to perform something else while performing the main logic. I'm adding two numbers, but I say: if the sum comes to 10, then perform this particular function. From there, I can redirect control from the main task to a sub-task. But while the function is executing, there's nothing I can do, based on the ID, to change or kill that task at that particular instant.

Thank you for the talk. I've seen in your slide that there's a web interface for adding tasks, and that they can have a schedule and repeat. What do you think are the pros and cons of using this instead of a crontab, for example?

So the question is about using the admin part for scheduling jobs, and why not use crons for that. Crons are good up to some extent, but as I mentioned, when you have to define complex workflows, cron will not suffice; for that purpose, you have to use systems like Celery. And crons, at a given point, will not help you scale to the amount of parallel processing you need. On the pros and cons of the admin interface: I might not use it to submit new jobs, but I surely use it to control my jobs. For example, I want to pause a certain job at the moment, so I go to my admin panel and pause that particular job, or I say: don't execute for the next five days, because I'm running into some problem. That's where the admin panel helps me a lot, rather than going to the console and doing all that there.

Yeah. You've talked about tasks that ack late. Let's say we have such a task, it's in the middle of processing, and the worker dies. When will the task be re-queued?

Yeah. So the question is when a task gets re-queued if the worker dies while processing it. There are two sides to this. One is handling retries: there are exceptions that you explicitly know can cause your task to fail. If you've taken care of those, then all you need to do is add a retry mechanism in that particular exception block.
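That retry-in-the-exception-block pattern looks roughly like this. The five-retries, five-minute numbers come from the talk; the API call itself is an invented example.

```python
import requests
from celery import Celery

app = Celery('tasks', broker='amqp://localhost//')

@app.task(bind=True, max_retries=5, default_retry_delay=300)  # 5 tries, 5 min apart
def call_partner_api(self, url):
    try:
        return requests.get(url, timeout=10).json()
    except requests.Timeout as exc:
        # Known, possibly intermittent failure: hand the task back to
        # Celery to retry later. After max_retries it gives up for good,
        # and that's the point where you'd send an alert.
        raise self.retry(exc=exc)
```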
For scenarios where you do not know when it's going to fail, where the worker dies and that's not supposed to happen, you obviously have to do some manual digging and figure out why the worker died. If you did not explicitly set the acknowledgment to be late, you will lose your message. If you define that the acknowledgment has to come late, the worker only tells the messaging layer that it has processed a message once it really has. Or there's the intermediate logic I was telling you about: whenever a task is being performed, I keep a tab in my Redis saying this task is in progress and whether it has finished or not. If it has not finished, I know something went wrong. But obviously, if your worker dies, you have to go to the machine and see what happened. No new messages will come to that worker, but you'll have to see what happened to all the messages that were on it. So if you're aware of the situations, which exceptions can come up, it's better to have exception handling and put in retries manually. For scenarios where you don't know what's going to happen, you write your own handling there.

Your slides mentioned that you can kill a task. How does that actually work? And can you ask the task to gracefully clean up the resources it was using, and things like that?

Yes. So basically, each task has an ID, and when I'm killing by that ID, I usually do it via the admin panel: I go there, select, and kill my particular task. Each message in Celery is assigned an ID, and with that ID you can always track what is happening to that particular task. Now, deletion means removing the task from the system itself: it will never get executed again. Killing means I'm killing it for this moment, but whenever it's scheduled to happen next, it will happen again. So you have to decide whether you want to delete your task permanently or just kill it for the time being.

So for a task that is running and I kill it, what actually happens? Is the process killed?

It's not the process that's killed. The process only gets killed if you kill the worker. A worker will have, say, n processes; if you kill the worker, all those processes get killed. If you kill a particular task, only that task gets killed; the other tasks still continue to run on your worker machine.

And how is the killed task terminated?

Basically, the kill gets floated on the message queue, and Celery takes care of it: I don't want to execute this message anymore. You might run into a situation, which I'm not aware of, where the kill only takes effect once the whole execution is done. For example, say your task takes 20 seconds to execute; you say, okay, let it finish, and then do not re-execute it again. I haven't played with that kind of task, but surely there should be an option to do that.
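For reference, a sketch of what killing by ID looks like through Celery's control API, rather than the admin panel; the task ID is a placeholder. Whether an already-running task dies depends on asking for termination.

```python
from celery import Celery

app = Celery('tasks', broker='amqp://localhost//')

# A queued task is simply discarded once the workers see the revoke.
app.control.revoke('the-task-id-from-add.delay')

# A task that has already *started* only dies if you also request
# termination, which signals the worker child process running it --
# any cleanup is then up to your own exception handling.
app.control.revoke('the-task-id-from-add.delay', terminate=True)
```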
I think the question also is: what about tasks that have already been started?

Sorry?

Can you kill tasks that have already been started? I think that's the question.

Yeah, so the question was more about what happens when I kill a running task and how I restart it. If the task is scheduled, it's easy to kill; I can do that. But if it's scheduled and already running, my answer is that there can be two ways. One is to wait for the task to finish and then kill it. Killing a task mid-flight, I'm not very sure how Celery would handle that, or what the state of your work would be. Take the example of doing an SCP of a file: that really leaves you in the middle of the problem of what has been transferred and what has not, because you cannot take back the copy from your server. It's going to be in an intermediate state, and that is where you have to take a call on how you perform these kinds of actions: whether you kill it right there and then, or you wait for the copy to finish and then kill it.

Hello, thank you for the talk. Just a clarification, related to the earlier question from the other side. You're using RabbitMQ as your broker, yeah? If you pick up a message and you don't ack it, and then your client dies, RabbitMQ will re-queue that message. So I don't understand why you're using Redis as a locking mechanism so no one else picks it up.

Yes, I was using Redis as a locking mechanism because I wanted to make sure that while I'm executing a task, no one else would, because it is still at that layer. The way RabbitMQ works by default is that if anybody picks up a message, it's no longer on the queue itself, so no one else would even see that ID there.

No: if it's on a queue and something picks it up from the RabbitMQ broker and doesn't ack it, that message is in a state of, okay, I've sent it, but until I get the ack, it's pending.

Acknowledgment is not the default in Celery, right? You have to explicitly state that you want a task to be acknowledged late. Earlier, when we started using Celery, that acknowledgment feature wasn't there, so I was not setting acks_late to true, and I would not wait for every task to finish before replaying the message back onto the queue. What you're saying makes sense: in that scenario, you wait for every task's acknowledgment, and then the other tasks get picked up. But in normal scenarios where you're okay with a few task failures, you'd say: let it flow; if it gets into the exception block, I'll retry it; if it fails, let it drop.

I think the mechanism is the same: if your task dies, if the worker blows up, that's basically a disconnect without an acknowledgment, and the task is re-queued.

If I'm explicitly stating that the acknowledgment is late, yes.

Okay, all right, thank you.

Okay, we have time for two more questions.

Hi. For example, I have a task that does synchronization of content, and this task spawns another task that does the publish. How can I be sure that the publish task is picked up by the same worker that did the synchronization? Otherwise, maybe it gets picked up by another worker, and it will be some mess, because the publish can happen before the sync is done.

Right, so that usually tends to be a problem. When you're doing a lot of tasks, you want to make sure the same worker picks up the related tasks, because you probably have intermediate data or whatever local state. And when you have multiple machines, that becomes more of a problem, because your data is local to that particular machine. Usually in Celery, when you're firing the job, once a worker machine picks up that chain, the whole set of tasks will be performed by that worker machine.
It's not that, in a chain of four tasks, one worker picks up one task and the second task gets picked up by some other worker. Celery will take care of you in that sense: the complete chain has to be performed by one worker and not by multiple workers. So if you're generating some intermediate temporary files and reading from them in the next task, that will be taken care of. Otherwise, for that kind of scenario, you have to bring in a common caching layer, or something that everybody can read from. For example, you say that each task returns a result, I publish that result back onto some message queue, and whenever another task picks up the follow-on work, it reads from that message queue. In that scenario, you have to bring in a common layer across all your workers.

You mentioned RabbitMQ. Do you have any experience with the Redis broker or other brokers?

Yes. Celery is not limited to just one broker; you can have your own choice of brokers. You can use the Redis broker for whatever I'm using RabbitMQ for: you can have your Redis running there and build your architecture on it. Celery also doesn't limit you to a single broker, so you can have multiple brokers as well: a mixture of RabbitMQ, Redis, and other things doing that work for you.

What was your highest throughput through a Celery system? Normally this kind of task queuing looks like, okay, send out emails: quite a few tasks with long running times.

So mainly, what was challenging for us was the long-running job I was mentioning: the four million users we needed to process. That job ran for about three days, so we had to make sure the Celery workers were processing correctly and not dying, because if they died, we'd be left in an intermediate state: what happened, what went wrong? So there was no high throughput involved there, but a long-running task. The other scenario we've seen is wanting to make sure all my tasks run well in the same amount of time. If you're talking about throughput in terms of number of events per second, I don't have a number for that, because we're not reaching that scale in our current scenarios. All our production uses of Celery are more about long-running tasks, or tasks that are very complex in terms of data flows. For events per second, I don't have a number, but yes, there is some benchmarking available that covers that.

Thanks for the presentation. Thanks for the questions.