I joined Catalysts a year ago, and this is my first time speaking publicly in a big auditorium, so bear with me. I've also forgotten to bring my teleprompter, so I'm just going to read from my papers. Okay, let's start. How many of you know what a cron job is? Oh, great. Okay, for everyone who doesn't know what a cron job is: in a nutshell, a cron job is a command to an operating system or server for a job that is to be executed at a specified time. In other words, it's a system for scheduling the execution of various scripts. Just about every framework or system has a main cron script, and Moodle does too. How many of you know what Moodle cron does? Okay, great. Let's have a closer look. Cron makes Moodle run. Cron is Moodle's heartbeat and biorhythm. It generates reports, sends forum messages, and does just about everything you can think of that happens under the hood without your direct interaction. Thanks, cron. But look, he doesn't look very happy down there, does he? No, he does not. I was given the task of trying to make the poor guy's life better, and to do this, I first had to find out what was wrong. I began talking to our team to collect as much information about the symptoms as possible. I then looked at the Moodle tracker to see what other people were experiencing. And finally, I started looking at the code to see what causes those symptoms. This is what I found. As seen in the previous slide, Moodle cron is a one-man show. He does one task after the other, serially, meaning every task waits for the one in front of it to finish before it begins. A way to better visualize this is through a metaphorical real-life example. Imagine that instead of double-decker buses, we used a one-person bus that picked people up every minute or so. Inevitably, huge waiting lines would form at the bus stop, and people would experience delays.
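The serial, one-task-after-the-other behaviour described above can be sketched in a few lines of Python. This is a minimal illustration of the principle, not Moodle's actual cron code; the task names and durations are made up.

```python
import time

def run_serially(tasks):
    """Run each task one after the other, like the classic cron described above.

    Each task must wait for the one in front of it to finish,
    so the total run time is the sum of all task durations.
    """
    completed = []
    for name, duration in tasks:
        time.sleep(duration)          # stand-in for the task's real work
        completed.append(name)
    return completed

# Hypothetical task list: (name, seconds of work)
tasks = [("send_forum_emails", 0.01),
         ("generate_reports", 0.02),
         ("update_grades", 0.01)]

start = time.monotonic()
order = run_serially(tasks)
elapsed = time.monotonic() - start
# elapsed is at least 0.01 + 0.02 + 0.01: the durations simply add up,
# which is exactly the "waiting line at the bus stop" problem.
```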
This is generally okay for systems that don't require real-time results, where the timing of a task's execution is not relevant. But that's not the case here. We want those forum emails to be sent as soon as possible, instantly even. We need those reports and those grades done fast. Nobody likes waiting, especially young people. Unfortunately, there's more. Moodle cron relies on starting multiple instances of itself, one every minute, to handle the amount of work. To prevent it from running the same thing multiple times, a locking mechanism was implemented. Meet the lock API. The locking system, to quote the official documentation, is not meant to be fast; it's meant to be always correct, and hence it will never be super quick. So apart from dealing with one thing at a time, cron also locks and unlocks each task, plus locks and unlocks itself each time it does this, trying to ensure consistency. Just imagine the time lost fidgeting with the keys. But that's not the only issue here. Now imagine two instances of cron fighting for the right to use the keys. One usually wins, and the other waits for the first one to finish. But waiting is a time-limited process. If the time runs out, an error is thrown and the task is blocked. Sometimes, usually when something bad happens inside the task, an instance of cron forgets to unlock one of its locks; in other words, it fails to release a lock. Guess what happens? An error is thrown and the task is blocked. And these two cases are very frequent. This double locking and unlocking system is slow, and it isn't very efficient after all. Moving on. As I said before, Moodle cron relies on starting multiple instances of itself, one every minute, to handle the amount of work. If you were to start two instances of cron at the same time, they would see the same amount of work and receive the same tasks to execute. True, they could not work on the same task, because of the locking mechanism.
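The two failure modes just described, a lock wait that times out and a lock that is never released, can be sketched like this. This is an illustrative Python sketch of the general pattern, not Moodle's lock API; the names `run_with_lock` and `TaskLockError` are invented for the example.

```python
import threading

class TaskLockError(Exception):
    """Raised when a lock cannot be acquired before the timeout."""

def run_with_lock(lock, name, timeout):
    # Wait for the lock; if time runs out, an error is thrown
    # and the task stays blocked, as described in the talk.
    if not lock.acquire(timeout=timeout):
        raise TaskLockError(f"{name}: could not get lock, task blocked")
    try:
        return f"{name}: done"
    finally:
        # Forgetting this release is the "fails to release a lock" case.
        lock.release()

lock = threading.Lock()
lock.acquire()  # simulate another cron instance already holding the lock

try:
    run_with_lock(lock, "send_forum_emails", timeout=0.05)
    blocked = None
except TaskLockError as err:
    blocked = str(err)  # the second instance waited, timed out, and failed

lock.release()  # the first instance finally finishes and releases the lock
succeeded = run_with_lock(lock, "send_forum_emails", timeout=0.05)
```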
But the whole situation would be an error-prone mess. There's no clean way of dividing work between two or more instances. Plus, some of the scheduled tasks handle a huge chunk of data each time, and as the number of users and courses increases, those chunks of data increase exponentially. So the more data a task handles, the more time and resources it needs to finish; and the more time spent on a task, the bigger the delay on the next task, and so on. Currently, there's no easy way of dividing the workload, hence no provision for sharding. So this raises a big question. If sharding is not an option, how do we scale? What happens when the number of users increases, when we have thousands and tens of thousands of courses, activities, forum posts and so on? What happens when there are multiple plugins with cron tasks to be run as well? What happens when there's a busy period and a huge number of ad hoc tasks are created? Will cron fail, or become so clogged up that nothing gets done without delay? The way cron is currently built will have serious difficulties under load. But don't worry, there's still hope. We set out to fix all of the above, and we managed to do so. The solution that best fit our requirements was the implementation of a queue and the use of a queue management system. In simple terms, a queue is a waiting line, and a queue management system is a process that dictates how that waiting line is dealt with. So far we have identified two major elements of our solution: the queue, a list of queue items, and the manager, the system that deals with the queue. But in order to work on the queue items, the manager, like any real-life business manager, had to hire workers. Workers are entities that, following a certain job instruction, execute a task in a specified environment and in a particular way. One way of better visualizing this is to imagine the whole solution as a parcel distribution company. The manager hires drivers to deliver the parcels.
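The queue, manager, and workers just introduced can be sketched with Python's standard library. This is a minimal sketch of the concept, not the plugin's actual implementation; the names are illustrative.

```python
import queue
import threading

work_queue = queue.Queue()   # the waiting line of queue items
delivered = []               # record of which parcels got delivered
delivered_lock = threading.Lock()

def worker():
    # Each worker repeatedly takes the next item off the shared queue.
    while True:
        item = work_queue.get()
        if item is None:               # the manager's signal to stop
            break
        with delivered_lock:
            delivered.append(item)     # stand-in for actually doing the work
        work_queue.task_done()

def manager(items, num_workers=3):
    # The manager "hires" the workers and fills the queue with items.
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for item in items:
        work_queue.put(item)
    work_queue.join()                  # wait until every item is processed
    for _ in threads:                  # send everyone on leave
        work_queue.put(None)
    for t in threads:
        t.join()

manager([f"parcel_{i}" for i in range(10)])
```

Because `queue.Queue` hands each item to exactly one consumer, the workers never step on each other's parcels, which is the property the rest of the design builds on.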
Each parcel requires a specific type of car for the delivery, because we have large items, small items, et cetera, and different transportation conditions, because we have fragile items, extra-long items, volatile items, and so on. With this picture in mind, let's look at how this solves cron's problem. Knowing that in the future there will be a huge load of parcels to distribute, serial execution would be the death of our business. The alternative is parallel execution. Instead of one worker delivering one parcel after the other, multiple workers deliver multiple parcels at the same time. Serial execution equals one-way street; parallel execution equals multi-lane highway. Which way would you go? But to do this, we needed something more. Having all the parcels in one room, and making the workers lock the door behind them and lock each parcel in a box to mark it as due to be delivered by them, made no sense. A better way had to be used: unique, single queue items. When adding an item to the system, each parcel is given a unique identity code. The item's code is then kept in the records until it gets delivered, so a new item of that exact type cannot be added to the system until the previous one is delivered. After registration, the item is placed on a conveyor belt and delivered to a loading dock. The first worker to get to the loading dock picks the first item, the second worker the second item, and so on. So each item can only be picked up once, by one worker. A system like this can be very useful and can be extended even further. Imagine a client wants a dedicated parcel delivery line, so only their items get on the conveyor belt and get delivered as they arrive. This means an even shorter waiting line. What if you wanted a cron that only delivers forum emails? Now you can have one. A cron dedicated to generating reports? Sure.
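The unique identity code idea, where an item of a given type cannot be re-added until the previous one has been delivered, can be sketched like this. Again, this is an illustrative sketch, not the plugin's actual data structure; `UniqueQueue` and its methods are invented names.

```python
from collections import deque

class UniqueQueue:
    """Sketch of the 'unique identity code' idea described above: an item
    type can only be queued again after the previous one is delivered."""

    def __init__(self):
        self._items = deque()
        self._pending = set()          # identity codes currently in the system

    def add(self, code):
        if code in self._pending:      # duplicate: refuse until delivered
            return False
        self._pending.add(code)        # register the code in the records
        self._items.append(code)       # place it on the conveyor belt
        return True

    def pick(self):
        # The first worker at the loading dock takes the first item.
        code = self._items.popleft()
        self._pending.discard(code)    # delivered: the same type may be re-added
        return code

q = UniqueQueue()
q.add("send_forum_emails")
dup_accepted = q.add("send_forum_emails")   # refused: already in the queue
first = q.pick()                            # delivered exactly once
readded = q.add("send_forum_emails")        # accepted: previous one delivered
```

The dedicated delivery lines mentioned above would just be separate `UniqueQueue` instances, one per item type, with their own workers.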
No more waiting in line for the reports task to finish before working on the grades. But wait, there's more. Busy period approaching? No problem: hire more workers. Decide how many there should be, and with a simple press of a button, behold, your cron increases in size. Or you just want tasks to be picked up faster? No problem: decrease the waiting time between the workers' visits to the loading dock, aka the queue cycle. Busy period over, or there's just not that much work to be done? No problem: send some workers on leave, or increase the time between queue cycles. All of this is done from the user interface, without stopping the queue or changing anything on the server. I'm sure that by now you realize how all of this benefits the end user. Here's the big one: real-time results. Just try to imagine this in all the specific areas of Moodle that usually take time. Happy students equals happy teachers equals happy Moodle community. But wait, there's more. Having the freedom to innovate, we found more ways to improve the system, so we created a bunch of new features. Moodle has a lot of nice modules and components, but not all of them have the same importance during certain periods. Sometimes we need certain tasks more than others, or we'd like some tasks to be executed before others. In other words, we need to prioritize the way things are done, and these priorities change over time. As I mentioned before, all tasks are different, and sometimes they become more important and thus require more attention. We made it possible for you to customize the way each task is handled: setting its priority, changing the amount of information collected about each execution and what happens next, setting the maximum number of failed attempts to prevent it from causing issues to the system, and choosing the environment it runs in. Each task is executed in an isolated environment, aka a container.
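Two of the per-task settings just mentioned, priority and a maximum number of failed attempts, can be sketched together. This is a hypothetical illustration of the behaviour described, not the plugin's API; all names here are invented.

```python
import heapq

class PriorityQueueWithRetries:
    """Sketch: tasks are picked in priority order, and a task that keeps
    failing is dropped once it hits the configured fail limit."""

    def __init__(self, max_fails=3):
        self._heap = []
        self._counter = 0       # keeps insertion order stable for equal priorities
        self.max_fails = max_fails
        self.fails = {}

    def add(self, task, priority):
        # Lower number = higher priority.
        heapq.heappush(self._heap, (priority, self._counter, task))
        self._counter += 1

    def pick(self):
        return heapq.heappop(self._heap)[2]

    def report_failure(self, task, priority):
        # Re-queue a failed task until it reaches the fail limit, then drop
        # it to stop it from causing issues to the rest of the system.
        self.fails[task] = self.fails.get(task, 0) + 1
        if self.fails[task] < self.max_fails:
            self.add(task, priority)
            return "requeued"
        return "dropped"

q = PriorityQueueWithRetries(max_fails=2)
q.add("generate_reports", priority=5)
q.add("send_forum_emails", priority=1)
first = q.pick()                                 # highest priority comes out first
outcome1 = q.report_failure("update_grades", 5)  # first failure: requeued
outcome2 = q.report_failure("update_grades", 5)  # limit reached: dropped
```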
Each container can have a specific configuration, like different database connections, global variables, et cetera. This allows us to fine-tune the way each task is executed, which ultimately increases the performance of the site. We also made it possible to easily integrate advanced queuing solutions like RabbitMQ, Redis, or Amazon SQS, by following their standards and making sure this does not alter the queue manager's workflow. While designing the system, it became clear to us that the queue can be used for other things, not just for cron. It offers a new way of handling work in real time, faster than was previously possible. It offers a chance to switch from a synchronous to an asynchronous way of doing things. Think of all the functionalities that take a long time to finish, where you can't do anything else but wait. Think of the backup and restore process. Instead of waiting in front of a loading page while the server finishes your restore, why not use that time to do something else, like finishing an assignment, creating a quiz for your students, or even starting multiple restores? Imagine a Moodle where you start a process like backup and restore, then continue working on anything else, and after a short while you get a notification letting you know that your restore has finished or that your report has been generated. With the queue, this is now closer to becoming a reality. Thank you. Questions? [Audience] Where can we get this plug-in? I didn't quite get that. [Audience] Where can we get this plug-in? Currently it's not released; we're still testing it and making sure it runs as we expect. It will be released soon, I'm not entirely sure whether this month or next, and then it will be easily available. We will also have a small demonstration at our stand, if anybody's interested in actually seeing it in action.
[Audience] From my understanding, a use case could be to make, say, backups on a specific schedule. Even with different queues, I cannot see the elimination of the locking mechanisms if they refer, for example, to very similar records. So imagine you have a big load of courses and you want to make backups as soon as possible, and you run them concurrently. You still do not eliminate the errors of the locking mechanisms. Well, basically we have the unique queue items, which have a unique identity code. So only one queue item of that type can be in the queue, be started, and be worked on at a time. So basically we've eliminated the possibility of the same task being run at the same time, or being started at a different stage so that two of them end up running in parallel-ish. That handles the locking problem, because it's based on what you're currently doing. [Audience] Just one question. You mentioned using containers to spin up each of these different workers. Does it work without containers, in the sense of a more traditional Moodle server? Does this plug-in work with that? Well, I use the term container just to make it clearer that the task is run in an isolated environment. They're not exactly containers like you have with Docker or something like that. It's just that the script itself, the task execution, is isolated. So any configuration we have, we can specify for that particular task, and all the other tasks don't get affected by changes to one task. [Audience] Okay, so you could still run multiple instances of cron on a single server? Yes. Yes. [Audience] For this plug-in to be fully functional, does it require you to use Redis or another queue system or message queue, or can you rely only on... No. We have a built-in queue solution that basically uses the database to store queue items and pick them up from there. [Audience] Was there a good reason for doing this as a plug-in?
It strikes me that a lot of these improvements would be better in the Moodle core cron system, so everyone just benefits. Well, basically we went with the idea that this is optional. Not everybody would require this solution, and maybe not everybody has so many users that they require such an advanced or more performant solution. We started it as a plug-in. We might, and we do, consider pushing it to core, but that's at a later stage. Thank you very much. Okay. Thank you.