Hello everybody. Now we will see a presentation about Selinon and distributed computing with Python. My name is Fridolín Pokorný, and as said, this presentation will be about distributed computing with Python. Let's have a look at the agenda. First I will talk about Celery. Celery is a quite popular project; if you don't know it, I will cover some basics. Then I will show some pitfalls you can run into when using Celery, and basically why Selinon was developed. I will show you the key idea behind Selinon and also some experiences I've gathered with it.

So, Celery is a quite popular project. You can find it on celeryproject.org. It was written by Ask Solem, who is the current maintainer as well. Celery is a distributed task queue, so you can run tasks written in Python distributedly. It's quite widely used; for example, it's used with Django as django-celery for background tasks. How does Celery work? As in any distributed system, we have some kind of message that describes what should be computed and with which arguments. We queue this message into a message broker, and there are workers connected to this message broker. Every worker listens on some particular queue, and once there is an available worker and a message queued, that worker picks the message up and starts to execute it. As we are in a distributed system, there can be multiple messages queued, so whenever a worker becomes available, it starts processing the next one. There is also a result backend. This result backend is used for storing the results of tasks and also for storing task state. So once a message is processed, the result is written to the result backend and the worker is available again. The same applies for worker one. This is the key idea behind Celery. Now let's discuss some flaws. In the real world, you have tasks with time or data dependencies between them, so let's model something like that.
We have six tasks with dependencies between them, as shown in the picture. Celery offers primitives that allow you to combine tasks, like a group, where tasks execute in parallel, or a chain, where one task executes after another. In our flow we have these dependencies, and let's say the tasks have various execution times, meaning it takes, for example, one minute to process a given task. If we use Celery primitives and group the tasks into chords, we can run task 1 to task 4 in parallel and then task 5 and task 6 in parallel. As you can see, the first chord takes 30 minutes to process and the second takes 10 minutes, so the results of task 5 and task 6 are available after 40 minutes. And we can do better: by removing these chords, we can get the result of task 6 after 31 minutes, while the result of task 5 is still available after 40 minutes. We are no longer blocked by tasks that take a long time.

There are other downsides when you use Celery this way. You are hard-coding your task logic and the dependencies between tasks into your source code, so adding new tasks is complex; a lot of times you have to reorganize your tasks and the dependencies between them. And what about task failures? If a task fails, we want to somehow recover from that failure. We also want to reuse tasks, and we don't want to be limited to using one storage at a time. That's why Selinon was introduced. Selinon means celery in Greek, and Greek is quite popular for naming distributed systems, such as Kubernetes, so I picked Selinon. Selinon offers you a way to separate flow logic from task logic: you implement a task and then you provide a configuration, a simple YAML file, which states how the tasks should be grouped. It also allows you to model dependencies between these tasks.
It also allows you to use different storages and to do things like recovering from failures. So let's take a look. Here's an example of a Selinon task. You import SelinonTask, derive from it, and provide a method called run that receives the task's input; then you return your result. That's all. Then you provide the YAML configuration file. Let's say we have three tasks. In this YAML configuration file you state, hey, we have these tasks. So you provide a list of tasks: for each one, an import path stating where the implementation of the task sits, the name of the task, and optionally the queue where its messages should be queued. Then you provide flow definitions: you name your flow and you give its edges, so a flow is basically described by edges. In the example, task 1 has no source, so task 1 is run initially; then, after task 1 is done, we run task 2 and task 3.

You might be wondering what these octagons in the picture are. These octagons are conditions. Selinon allows you to state conditions and execute tasks conditionally. For example, let's say we want to run task 2 and task 3 after task 1, and we give a condition like: run if the result of task 1 has a key 'proceed' and this key has the value 'yes', or if there is an environment variable named TESTING. By providing this condition, we have conditional execution. Now, you might be wondering: we provided a condition and we are inspecting the result of a task, but we didn't provide a way to say, hey, I want to store results in some database. Selinon offers you this as well. What do you need to do? You just need to provide a definition of a database adapter.
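A configuration along the lines just described might look like the following sketch. The module path `myapp.tasks`, the task names, and the queue name are placeholders, and the exact schema keys should be verified against the Selinon documentation:

```yaml
tasks:
  - name: 'Task1'
    import: 'myapp.tasks'
  - name: 'Task2'
    import: 'myapp.tasks'
    queue: 'task2_v1'        # optional: route this task's messages to a queue
  - name: 'Task3'
    import: 'myapp.tasks'

flows:
  - 'flow1'

flow-definitions:
  - name: 'flow1'
    edges:
      # No 'from' -- Task1 is scheduled when the flow starts.
      - to: 'Task1'
      # Task2 and Task3 run after Task1, but only if Task1's result has
      # key 'proceed' equal to 'yes', or the TESTING env variable exists.
      - from: 'Task1'
        to: ['Task2', 'Task3']
        condition:
          or:
            - name: 'fieldEqual'
              args:
                key: 'proceed'
                value: 'yes'
            - name: 'envExist'
              args:
                env: 'TESTING'
```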
So, for example, if you want to use Redis, you just define how to connect to the database and how to retrieve or store a result. Then you reference this definition in your YAML configuration file, where you state where the implementation of the database adapter sits and what its name is; optionally, you provide a configuration. Then you assign these storages or databases to your tasks. As visualized, the results are stored in PostgreSQL in this example.

Now, we want some fine-grained control in our flow, something that gives us a way to recover from failures. This can be done using fallback tasks or fallback flows. The configuration is pretty straightforward: in your flow definition you provide failures, and you state, hey, if task 2 and task 3 fail at the same time, I want to run fallback 1. Another failure could be that only task 2 failed, and in that case we want to run fallback 2. If we visualize it, we can see that if task 2 and task 3 fail, we run fallback 1; in case of a task 2 failure alone, we run fallback 2.

In many cases you want to reuse flows, that is, run some flow from another flow. Selinon offers you subflows, and it's pretty straightforward: as a flow is just another node in your dependency graph, you can directly state a flow in your YAML configuration file. In this particular scenario, after the init task is done, we run flow 1, and after that we run flow 2. Now you might be wondering how Selinon works. The key idea behind Selinon is a special task called the dispatcher task. This task is scheduled on some node and periodically checks the state of the flow. There is a dispatcher dedicated to each flow; it checks which tasks failed and which tasks succeeded, and it schedules new tasks as needed.
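A storage adapter along these lines could be sketched as follows. To keep the sketch runnable without Selinon installed, the `DataStorage` base class is stubbed locally and a plain dict plays the role of Redis or PostgreSQL; the method names mirror the connect/retrieve/store interface described above and should be checked against Selinon's actual `DataStorage` API:

```python
class DataStorage:
    """Stand-in for selinon.DataStorage so this sketch runs standalone;
    with Selinon installed you would derive from the real base class."""


class InMemoryStorage(DataStorage):
    """Illustrative adapter: an in-memory dict instead of a real database."""

    def __init__(self):
        self._db = {}
        self._connected = False

    def connect(self):
        # A real adapter would open a connection to Redis/PostgreSQL here.
        self._connected = True

    def disconnect(self):
        self._connected = False

    def is_connected(self):
        return self._connected

    def store(self, node_args, flow_name, task_name, task_id, result):
        # Persist the task's result so conditions and later tasks can read it.
        self._db[task_id] = result

    def retrieve(self, flow_name, task_name, task_id):
        return self._db[task_id]
```

In the YAML configuration you would then point a storage definition at this class and assign it to tasks; again, consult the Selinon docs for the exact keys.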
You just provide a YAML configuration file, and this configuration file is automatically parsed; there are also additional checks that your configuration is correct. There's also a way to visualize flows: you just provide your YAML configuration file, and Selinon will give you graphs like the ones shown on the pictures. There are also other features, like caches. If you don't want to fetch the result of a task every time, you can use Selinon caches; there is also a way to use a cache for retrieving the state of tasks. You can also do task or flow throttling. You can say, for example, hey, I want to run this task no more than two times per minute, and Selinon allows you to do that. You can also do task prioritization: you put a lot of workers listening on some queue, and this way you can prioritize the tasks routed to it. There's also a way to optimize dispatcher scheduling, so you can reschedule the dispatcher, for example, once every two seconds, or once every five minutes. There is also a tracepoint mechanism where the dispatcher tells you what's going on in the system, so you can inspect it, for example by inspecting JSON events. These carry unique keys such as the dispatcher ID or the task that will run, and you can listen for particular events such as scheduling a new flow, scheduling new tasks, retrieving results, or failures.

To summarize: Selinon is built on top of Celery and uses Celery to communicate with the broker. It provides you a way to easily define your tasks and flows just by providing a YAML configuration file. It allows you to separate task logic from flow logic, so you have Python code on one side and the logic of flows and storages on the other. It allows you to conditionally execute tasks and group tasks into flows, and it offers advanced flow handling, so you can recover from flow failures and do system diagnosis via tracepoints.
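A hypothetical sketch of how the throttling and prioritization options could look in the YAML configuration; the key names here are illustrative, so check the Selinon documentation for the exact schema:

```yaml
tasks:
  - name: 'HeavyTask'
    import: 'myapp.tasks'
    # "No more than two times per minute": delay repeated schedules so
    # at most one execution starts every 30 seconds.
    throttling:
      seconds: 30
  - name: 'UrgentTask'
    import: 'myapp.tasks'
    queue: 'priority_v1'   # dedicate extra workers to this queue to prioritize
```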
Selinon is currently used at Red Hat, and here is an example of the flows we have. As you can see, they can be complex and they can be nested one inside another, so things may be quite hard to understand at first. So that's it from me. You can find Selinon on GitHub. If you have any questions, feel free to ask.

So, the question was how to do throttling: we want to execute a lot of tasks in parallel, but we want a throttling mechanism available. There's a trick to do that: you can schedule the dispatcher on one particular node, and this node will take care of how many tasks are being executed.

The next question was regarding fallbacks: if there are dependencies between the task 2 and task 3 failures, when is the fallback run? In this particular scenario, Selinon will take care of it. It waits for all tasks to finish, and after they finish it looks: hey, I have some failure, can I recover from it? So if task 2 fails and task 3 is still running, we wait for task 3 to finish.

The next question was whether we can add tasks dynamically, like when you have your system deployed and want to add a task at runtime. No, you cannot do this; it's not possible.

The next question was about failures, how we monitor whether a task fails. This is reused from Celery: when a task fails, the failure state is recorded in the result backend, and then you have a mechanism to query what the state was. Based on the state, we proceed with the fallback. Yes, we query it.

The last question was about Django. I don't use it with Django; it's a distributed system where we run OpenShift, and there are nodes communicating with Selinon.
No, there's no Django integration. Any other questions? Okay, thank you.