Welcome to my talk, "3,000 Moodles, One Million Users: Provisioning All the Schools", or, as the title should now read, 1.4 million users as of last month.

First, who am I? I'm Markus Samlensky, tech lead for hosting and systems administration at Iledia. I have seven years of experience in Moodle hosting infrastructure, CI/CD, and systems orchestration.

The situation was as follows. Germany is notoriously slow with digitization in the public sector, and only big challenges might change that. And, as we all know, such a challenge arose.

So what was the mission? We had to rapidly draft and implement a system to provision Moodles for up to 7,000 schools in one of Germany's biggest states, provision each system automatically on the school's request, and build all of this front to back in under two months.

So we had to make some design decisions. The first one: stability was key. There could be no chance of the system crashing on the first day of school, when the load peaked. Therefore we told ourselves: we shall only use proven technologies, nothing too fancy, and first and foremost keep it simple. For us, this meant no microservices, no autoscaling, nothing too fancy. We did use virtualization for the web servers, which means each school got its own machine, but we used bare-metal servers for the DBMS, with high priority on I/O.

The system worked as follows: a principal requests a system via a website and thereby triggers a process that creates the virtual machine via data center API calls. Finally, our orchestration tools provision the new machine and send the login data to the principal within half an hour of the request.

Now, the bet. Here on the left is one of our CEOs, let's call him Andre, saying: "Nobody's going to use this." Our other CEO, on the right, was bolder; let's call him Ralph: "I bet we'll have 100 systems by October."

So how did we do? This is a graph of all the systems provisioned over time, which you can see down there.
The x-axis runs from 10 July 2020 to September 2023, and the y-axis goes from zero to, let's say, three thousand. Here is 1 October, and as you can see, that is more like 1,800 systems provisioned, not 100. So in a way we were lucky that our system and our design worked so well for this problem, because obviously we were very wrong about how many systems were going to be provisioned.

What about the users? I told you about 1.4 million users. These are the users per system on a logarithmic scale: the y-axis shows the number of users from 1 to 100,000, although the maximum, all the way on the right, is actually 25,000 users. As you can see, most systems have only about 10 to 100 users; in the middle, a lot of systems have between 500 and 1,000; and a small number have up to 25,000 users. In the upper right you can see the same data on a linear scale, which looks pretty much like an exponential curve.

So what were the results? We had no performance degradation when the school year started and load peaked, and we were very proud of that. No weekend work was needed for our team at launch: a very big success.

What were our learnings? Well, no estimate survives first contact with the customer: as you saw, we were off by a factor of about 15 to 20. Fighting scope creep was a mandatory skill in this project.
We really had to fight over everything with our customer to work out how we could make this work in the time we had. And for the things we did have to change, we really had to rely on our keep-it-simple standard: by focusing on the things we knew, and knew we could achieve in the time available, we made room for the changes that made the project a better fit for our customers.

Finally, a green field is nice, but using proven tools and pipelines is far more dependable. It is very, very easy to fall for the idea that on a new project you can now use all the new, nice, awesome technologies, but this is pretty much a trap. If you have such a short timescale to deliver something that has to scale and be stable, you should only use technologies you have proven yourself to work at scale.

So how are we going about this in the future? The system does have some drawbacks. It might be robust, but it's not very scalable: as you have seen, we have systems with 10 users and systems with 25,000 users, and right now we only scale these machines horizontally. Also, our naive approach of one virtual machine per school is very wasteful. As you can imagine, all the servers with only 10 to 20 users use the same hardware as the servers with 250 to 500 users. So there is a lot that could be improved so that our system uses far less hardware per school and still achieves the same results. But it is very hard to set this up in a way that stays stable, without performance degradation, when load peaks at the start of the school year or when many courses are running at once. In the end, though, now that there is more time, maybe we will go for Kubernetes, maybe we will do autoscaling and all the fancy things we can use to achieve these performance and scalability gains.
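As a recap, the request-to-login flow described at the start of the talk can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the actual system: the function names, the `moodle-school-<id>` VM naming, and the callables standing in for the data center API, the orchestration tooling, and the mailer are all hypothetical. It only shows the order of the steps (one VM per school, then provisioning, then sending login data to the principal) and checks the half-hour target mentioned in the talk.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable

# Target from the talk: login data reaches the principal within half an hour.
LOGIN_DATA_DEADLINE = timedelta(minutes=30)


@dataclass
class SchoolRequest:
    """A principal's request, as submitted via the request website (fields are hypothetical)."""
    school_id: int
    principal_email: str
    requested_at: datetime


def handle_request(
    req: SchoolRequest,
    create_vm: Callable[[str], str],              # hypothetical data center API wrapper; returns the VM's address
    provision: Callable[[str], None],             # hypothetical orchestration run against that address
    send_login_data: Callable[[str, str], None],  # hypothetical mailer: (principal email, VM address)
    now: Callable[[], datetime] = datetime.now,
) -> bool:
    """Run one request through the pipeline; return True if the deadline was met."""
    vm_name = f"moodle-school-{req.school_id}"  # hypothetical naming: one dedicated VM per school
    address = create_vm(vm_name)
    provision(address)
    send_login_data(req.principal_email, address)
    return now() - req.requested_at <= LOGIN_DATA_DEADLINE


if __name__ == "__main__":
    start = datetime.now()
    met = handle_request(
        SchoolRequest(42, "principal@example.org", start),
        create_vm=lambda name: f"{name}.example.internal",
        provision=lambda address: print("provisioning", address),
        send_login_data=lambda email, address: print("sending login data to", email),
    )
    print("deadline met:", met)
```

Passing the side-effecting steps in as callables keeps the sketch testable without a real data center API; in the system described in the talk, those steps are data center API calls and orchestration runs.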
I couldn't go into too much detail because of time, but I hope we now have room for some questions to go into more detail. Thank you. Does anyone have any questions?

We've got a lovely mic runner who can take a microphone straight to Barbara.

So, how do you manage support queries and that kind of thing in such a distributed system? You know, when something goes wrong, what's the path of escalation? What does your support look like for that?

Well, there are multiple layers of support. Each system has a plugin installed through which users of the system can send support requests directly to our partners, who work with the state's school system. If they are, for example, questions about how to use the system, these kinds of requests don't reach us. Problems with the system at the server level are escalated further up the chain. So, just multi-level support.

Wonderful, thank you.

Yes, I'm interested in how you handle upgrades across so many different VMs. What's your approach to that?

We use Ansible as our automation tool, and we can pretty much just switch the code and do the upgrades for all three thousand systems in two nights: first all the odd systems, then all the even systems. It's pretty much just the automation tool starting at a specific time to change the code and run the upgrade.

We have a similar situation, I'm from Croatia, but we didn't take that approach. We have all our schools, roughly 1,000 to 1,200 of them, on one system with eight virtual machines, three database servers, load balancing, and so on. Last year we moved to OpenShift Kubernetes, and it's much better for us.

Were you able to weather the storm when the school year started and the load peaked? Many failed when this happened.

We have these big systems. We had no problems.

And one question.
How did you manage authentication of the users? Do you have a centralized system, or is every school on its own?

Right now it's every school for itself, but changes are planned so that one IDM will be applied to these schools.

Hi, were you the ones at risk of being shut down by the state government (Landesregierung) in Germany? There was this story where one university built up this kind of infrastructure in a very short time, but the government shut them down because it was not their job to do that. That would be very typical German stuff.

Typical Germans. No.

Okay, thank you.

Who is responsible for Moodle administration on your systems? Is it the school, or some level in between?

Right now it is the school principal, who is assigned the main admin account. There were thoughts of changing the school admin to a manager account, but there were a lot of problems with restricting access for the principals, so right now they are administrators of the system.
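The upgrade approach from the Q&A (Ansible as the automation tool, odd-numbered systems upgraded on the first night and even-numbered systems on the second) can be sketched as a small Python helper that splits systems into two waves and writes them out as Ansible inventory groups. This is an illustration under assumptions, not the team's actual tooling: numeric system IDs, the `moodle-school-<id>` host names, and the `upgrade_night_*` group names are all hypothetical.

```python
from typing import Dict, Iterable, List


def upgrade_waves(system_ids: Iterable[int]) -> Dict[str, List[int]]:
    """Split systems into two nightly upgrade waves by the parity of their ID."""
    ids = sorted(set(system_ids))  # de-duplicate so no system is upgraded twice
    return {
        "upgrade_night_1": [i for i in ids if i % 2 == 1],  # odd systems first
        "upgrade_night_2": [i for i in ids if i % 2 == 0],  # even systems the next night
    }


def render_inventory(waves: Dict[str, List[int]]) -> str:
    """Render the waves as an INI-style Ansible inventory, one group per night."""
    lines: List[str] = []
    for group, ids in waves.items():
        lines.append(f"[{group}]")
        lines.extend(f"moodle-school-{i}" for i in ids)  # hypothetical host naming
        lines.append("")  # blank line between groups
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_inventory(upgrade_waves(range(1, 7))))
```

A scheduled job could then run the same playbook each night restricted to one group, for example `ansible-playbook upgrade.yml --limit upgrade_night_1`, where `upgrade.yml` is a hypothetical playbook. One plausible benefit of a parity split is that each night touches only about half of the schools.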