Hello again. Really happy to be here with everyone. Today I'll be talking about cloud native and high performance computing, and also a bit about the idea Dave was just mentioning earlier, the batch working group that we're forming in TAG Runtime of the TOC. One idea here is to talk about that a bit today as well. So, cloud native and high performance computing. I'm a computing engineer at CERN.

I will start with a cool picture that I took recently. I had the chance to go down to the ATLAS detector at CERN. This is the cavern, 100 meters underground. I think this picture is quite nice because I like it, and also because in the back you see the ATLAS detector. This is where we do the collisions. We have this big particle accelerator, we accelerate beams of protons, we make them collide at precise points like this one, and then we generate a lot of data out of these collisions. Close to this detector, underground, we also have an online farm that takes care of filtering the petabytes per second of data that we generate down to something we can manage, on the order of tens of gigabytes.

This farm is actually a really interesting case, because back when we started our Kubernetes journey in 2016, just one year later this team reached out and asked us: can we actually use Kubernetes to modernize our systems as well? So in 2017 we gave it a go. We built a cluster of a couple of thousand nodes, and we saw a lot of issues trying to do what they wanted. What they want is to have thousands of services running at the same time when there is beam, so that they can do the filtering I mentioned. But when there is no beam, in the periods where we replace the beam with a new one, they wanted to reuse the farm to do simulation, which is pretty much batch workloads. So they need to transition from one to the other very fast, and they need to schedule pods really fast. At the time we couldn't do this, so we reached out to SIG Scalability, which we also heard Jasmine talking about earlier, and they actually fixed these issues. Today we can have clusters of a few thousand nodes and schedule 300 pods a second, as we just heard, which matches their needs. And actually, the next generation of this deployment will be based on Kubernetes. So I thought this was a really nice example to start talking about cloud native and high performance computing.

I'll also start the high performance computing part with an overview of what it means. You'll hear different definitions, but I took this one, which says that high performance computing most generally refers to the practice of aggregating computing power in a way that delivers higher performance than what you would get from a single desktop computer or workstation. And this is pretty much it; these are the two key points. How you do this can be a bit complicated, but the goal is really to solve large problems in science, engineering, and business. I'll take a couple of examples here. On the bottom left you see weather forecasting. This is a problem that has to deal with really large physics models, and it also has to finish quite fast, with a deadline: if you're predicting the weather for today but you only finish by tomorrow, it's not so useful any longer. The other area is modeling and simulation; I take aviation here, but you can also think about the automotive industry, they do similar things.
And then medical research or vaccine development, which is something quite recent in everyone's minds as well; they also use a lot of HPC.

The HPC community loves keeping rankings, and you will find a lot of them. I took an example here, the Top500; you can see the website there. This is the list from November last year, and you can see, for example, Fugaku in Japan, Summit in the US, and the next one as well in the US. You can see they are pretty large: a single supercomputer like this can have almost 8 million cores, and they measure their performance in teraflops, which you see in the two columns after. So they're pretty, pretty big. I put a picture here of one that is in Europe, on the other side, in Germany. The reason I put it is that one of my colleagues, who I've also been at KubeCon with earlier, Lukas, just moved to that center, and he's very keen on pushing cloud native and Kubernetes there as well, to transition their deployments. So I wanted to make a reference to it.

But coming back to high performance computing: why am I talking about this, why are we discussing it? Actually, there are things that are quite common. When you think about very large deployments, you think of large clusters, and this is also what we deal with in cloud native deployments. You think about resilience, failover, redundancy; these are all common themes. But there are some that are not so common for most deployments, things like low latency. If you're deploying a lot of different jobs that need to communicate with each other, you need low latency, and you're probably talking about deployments with InfiniBand; they're kind of special in this respect. Then you're not talking about thousands or tens of thousands of services; you're talking about tens of thousands, hundreds of thousands, millions of jobs being deployed to these clusters, so very, very high throughput is needed, in scheduling as well. Then, if your deployments land on the nodes but you don't get the performance that you need on the nodes themselves, that's not good; so NUMA awareness is also very important, because your jobs need to be very close to the CPU and the memory that they need to access and share. And millions of jobs in your cluster means that thousands or tens of thousands of users might be deploying very different workloads, each of them with different software needs, so even pushing that software to those nodes is an extreme challenge. These are things that have been slowly fixed in different projects in cloud native. Finally, I mentioned advanced scheduling; I will talk about that part in a bit more detail as well.

Now, if you were a colleague of mine at CERN, you would immediately say that most of what we do is not exactly high performance computing. We do what we call high throughput computing, which is very similar, but not quite the same. The description is usually that it's a computing paradigm that focuses on the efficient execution of a large number of loosely coupled tasks. That last bit is the important one: we are no longer talking about the tightly coupled workloads that are typical in HPC, but about workloads that don't have those tight dependencies; they are very independent, sequential jobs that can be individually scheduled on many different computing resources.
What we add here is that we start talking about multiple administrative boundaries, which is not typical in HPC. So we remove one complexity and we add another one, but that one is actually very close to the cloud native definition: if you go to the CNCF website and check it, you will see that it talks about loosely coupled systems as well. Now, the best example I know of high throughput computing is actually what we do back at CERN. This is a system called the grid that we've been running for almost two decades now. It's really 200 different sites across the world, all connected to each other, but our users see it as one giant supercomputer. We deploy the jobs there and they can land on any site as long as the data is available locally, but they don't really talk to each other. If one fails, we can retry it on the other side of the world and everything keeps going. So the principle is quite clear in this area. So HTC versus HPC: we drop low latency, we add crossing administrative boundaries, so new complexities.

The last bit I mentioned was advanced scheduling. We talked about it, and we heard about it also with the creation of the batch working group and the batch system initiative. The reason is that when we think of pods, we're already a bit too far into the deployment. When we think about these things, we think of workloads, and workloads are not just pods: they need to be queued earlier, and they might also have complexity in the number of pods involved. So what we like to think about is really the workloads themselves. These workloads go into queues. These queues can be for different tenants, different parts of your teams, different projects, and they have to have priorities. So the idea of having queues in a cloud native system, in Kubernetes, is something that is really important for us. Then, when you have queues and people have quotas, you want to maximize usage, so you have to introduce the concept of fair share: if someone doesn't have enough workloads to fill up their quota, you allow other people to kind of steal it for a bit, but over time you want to give everyone their fair share. We also talked about sending a lot of jobs that have to talk to each other for HPC, traditionally MPI jobs. What this means is that if I deploy a workload that needs 100 pods and I can only schedule 80, scheduling them immediately wouldn't let them do their job and they would just take up resources. So we want to make sure that when these requirements exist, we only schedule the workload when all the resources it needs are available, and again we maximize usage. And finally, array jobs: jobs that share common characteristics, but where you might want thousands of instances with slight variations, all independent of each other.
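As a small aside that was not part of the talk, here is a minimal sketch of what such an array job can look like natively in Kubernetes, using an Indexed Job. The job name, image and command are placeholders, not anything from the CERN setup:

```python
import json

# Sketch of an "array job" as a Kubernetes Indexed Job: many instances
# of the same pod template, each identified by its own completion index.
job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "simulation-array"},  # placeholder name
    "spec": {
        "completions": 1000,          # total tasks in the array
        "parallelism": 100,           # how many run at the same time
        "completionMode": "Indexed",  # each pod gets a unique index
        "template": {
            "spec": {
                "restartPolicy": "Never",
                "containers": [{
                    "name": "task",
                    "image": "registry.example.org/simulation:latest",  # placeholder image
                    # Each task reads JOB_COMPLETION_INDEX to pick its own
                    # slice of the work (an event range, an input file, ...).
                    "command": ["/bin/sh", "-c",
                                "echo processing chunk $JOB_COMPLETION_INDEX"],
                }],
            }
        },
    },
}

# Write the manifest out so it can be applied with `kubectl create -f`.
with open("simulation-array.json", "w") as f:
    json.dump(job, f, indent=2)
```

Each pod automatically gets its own index, which matches the pattern of thousands of near-identical, independent instances with slight variations.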
When I was thinking about this topic, I actually bumped into an old article in the CERN Courier that talked about a shift in the paradigm we were using. What you see here on the right is actually the same building where we have the data center running today, but it used to be quite different: we had mainframes, and people were actually working inside the building. This was back in the 80s, and the article was written by someone called Ben Segal, who is a legend at CERN. He's retired, but you can still find him in the cafeteria quite often, and it's very interesting to talk to him. He was talking about the change where they were looking at how distributed Unix boxes could replace the old, powerful IBM mainframes. In the article he also mentioned that software development would be a challenge for this transition, and he says: fortunately, some of us had been working at CERN on adding some facilities to Unix that were vital for mainframe computing, things like a batch scheduler and a tape drive reservation system. This kind of resonates with what we are talking about: batch workloads, HPC, all these things as requirements. And they ended up adding these capabilities to Unix. I think the system was actually called SHIFT back in the day, which is really what was happening, a big shift. And I think we are at a similar stage now, trying to do similar things. Ben is also responsible for introducing TCP/IP at CERN, which is pretty amazing.

So if you think from there, a batch scheduler and a tape drive reservation system, which of these systems would we actually still be using today? We're talking about the 80s or early 90s. If your answer is batch, or tape, let's see. If you said batch: this is the same room as it looks today, a large data center, much closer to what we all have these days, and 80% of it is batch. So we've been using batch everywhere pretty much since then. And if you said tape reservation system, we actually also still use tape. I just put it here as a curiosity: we use a lot of tape, we put all the data on tape as backup, and we actually recall it from time to time for things like reprocessing campaigns. To prove it, I actually brought one here: I have one of the tapes that we use for LHC storage. And what I'll do is this: if you bump into me in the corridor and you guess how much storage we can put on a tape like this, you take it home, and it has LHC data inside, so it's a nice prize to take home as well. What you can do with it, I'm not completely sure; reading and understanding it might be a much bigger challenge.

Now, finally, I'll try to do a quick demo. Hopefully something will work. On Tuesday we had a co-located event for batch and HPC, and there were a lot of presentations about schedulers; I talked about advanced scheduling needs and all these things. There's a project called Kueue, which is very interesting; there are other schedulers like Volcano or YuniKorn as well. But their presentation shows very much the needs we have, so I'd like to show it here. The idea is that you have the user queues on top, and here I put the four LHC experiments just as an example, because we are talking a bit about CERN as well. That's where the workloads go. Those queues are linked to what are called cluster queues, and these are the separation between the user domain and the administrator domain. Typically a queue will be associated with one cluster queue, but you can actually have a queue associated with multiple cluster queues, and then you can start getting to things like fair share. In the example of the ATLAS queue, we can see that it has its own quota and goes to its own cluster queue, but it can actually also borrow resources from the CMS queue if needed.
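To make the queue versus cluster queue split a bit more concrete, here is a rough sketch of what the ATLAS side of that picture could look like as Kueue objects. This is not the demo's actual configuration: the names, namespace, flavor and quota values are invented, and the field layout follows the current kueue.x-k8s.io/v1beta1 API, which may differ from the Kueue version used at the time:

```python
import json

# Rough sketch of the ATLAS part of the diagram as Kueue objects.
# Names, namespace and quotas are invented for illustration only.

# The namespaced queue that users submit their workloads to.
local_queue = {
    "apiVersion": "kueue.x-k8s.io/v1beta1",
    "kind": "LocalQueue",
    "metadata": {"name": "atlas", "namespace": "atlas"},
    "spec": {"clusterQueue": "atlas-cq"},
}

# The admin-side cluster queue holding the actual quota. Putting it in
# the "lhc" cohort lets it borrow unused quota from other cluster queues
# in the same cohort (for example CMS), up to the borrowingLimit.
atlas_cluster_queue = {
    "apiVersion": "kueue.x-k8s.io/v1beta1",
    "kind": "ClusterQueue",
    "metadata": {"name": "atlas-cq"},
    "spec": {
        "cohort": "lhc",
        "namespaceSelector": {},
        "resourceGroups": [{
            "coveredResources": ["cpu", "memory"],
            "flavors": [{
                "name": "default-flavor",  # refers to a ResourceFlavor, not shown here
                "resources": [
                    {"name": "cpu", "nominalQuota": 1000, "borrowingLimit": 500},
                    {"name": "memory", "nominalQuota": "4Ti"},
                ],
            }],
        }],
    },
}

# Dump the manifests so they can be applied with `kubectl create -f`.
for obj in (local_queue, atlas_cluster_queue):
    print(json.dumps(obj, indent=2))
```

Because both the ATLAS and CMS cluster queues would sit in the same cohort, ATLAS can borrow whatever CMS leaves unused, up to its borrowing limit, which is the behavior in the demo that follows.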
And finally, if you were an administrator and realized that all the queues are full, everyone is complaining they don't have resources, and everyone's workloads are pending, you could add an additional cluster queue, integrated into the system in a seamless manner for your users, borrow public cloud resources, kind of clean up the queues, and then get back to business. So that's what I will try to very quickly demo now. Thanks a lot to Google for helping me out with the resources as well. I think we can switch. Is it working? Or maybe not, and there will be no demo. I won't take too long. It didn't work, didn't work. All right, but we're not there yet, this is only the start. So here I'll be very quick.

Basically, what you see on this screen is the queues, and you can also see the cluster queues on the bottom. You can see the ATLAS queue actually has a bunch of pending workloads. It has the same quota assigned as the CMS queue, but the CMS queue is basically not doing much to fill its quota. So what we want is to make them share, and you can see that these shares are called cohorts in this tool. And I really have to give a shout out to Aldo and Abdullah, who are the main developers of this tool. You can see that all the experiments belong to the same share, LHC, and then we have this additional one, with Google Cloud resources, that is still not in the share.

So what I will start by doing is editing the ATLAS queue, and you will see that there's something blocking it from going and borrowing the resources from CMS. I'll drop that, which is this borrowing maximum. And if we look here, the number of workloads will be increasing in a bit, hopefully. Yeah, it will pick up the number of workloads that should be deployed. It will take a bit, but basically you get the idea: ATLAS has a number of workloads running, it has a bunch of pending workloads as well, and we start sharing resources between ATLAS and CMS. And actually, there are two other queues borrowing there as well, so that's why the number of workloads is not immediately bumping up. Now we still have like 390 workloads here. So what we want to do, and I'm very much out of time, so I'll be quick: I go to this KubeCon queue, and you can see that it actually has a lot of capacity. I created a queue with 10,000 CPU cores. So what I will do very quickly, as the admin, is just make it join the LHC share, and this should help us start getting a lot of workloads scheduled.

This takes a bit of time, and because I'm out of time I'll just show you this dashboard. Basically, what you can see is that in the beginning we had the yellow queues, which are ATLAS, and CMS is the blue one, so they're actually not filling up their quota. And at some point we started sharing resources between them. And because I don't have time, I will go to another view here and show you an earlier demo. This is still the live system, and you can see that basically the queues had very few workloads running, and then when I shared the Google Cloud resources, ATLAS was able to really spike its usage. They go up; in this case I wasn't using 10,000 cores, I was using 3,500. They could do it for a bit, get all their pending workloads done, and then I would scale down again. And basically, you get your users all happy again without much hassle.
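For completeness, that last step, bringing in extra capacity, roughly corresponds to adding one more cluster queue to the same cohort, backed by the cloud nodes. Again this is only a sketch with invented names, flavor and quota, following the same assumed v1beta1 layout as before:

```python
import json

# Sketch of the extra cluster queue an admin could add to drain the backlog.
# It joins the existing "lhc" cohort, so the other cluster queues can borrow
# its quota; its flavor would point at the cloud-provisioned nodes.
kubecon_cluster_queue = {
    "apiVersion": "kueue.x-k8s.io/v1beta1",
    "kind": "ClusterQueue",
    "metadata": {"name": "kubecon-cq"},  # invented name
    "spec": {
        "cohort": "lhc",          # same cohort, so its quota can be borrowed
        "namespaceSelector": {},
        "resourceGroups": [{
            "coveredResources": ["cpu"],
            "flavors": [{
                "name": "cloud-flavor",  # a ResourceFlavor for the cloud nodes, not shown
                "resources": [{"name": "cpu", "nominalQuota": 10000}],
            }],
        }],
    },
}

print(json.dumps(kubecon_cluster_queue, indent=2))
```

Because it joins the same cohort, the existing queues can start borrowing from it right away, and it can be removed again once the backlog is gone, without users having to change anything.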
So just one last slide. All right, it kind of worked. I would like to summarize by saying that we are almost there; actually, a lot of the problems we had have been solved. Things like the scheduling problems we had have been solved by API Priority and Fairness, and the bits that we are missing, that we discussed today, are being worked on by different groups. I would highlight here again, like Dave mentioned, the Kubernetes Batch Working Group and the CNCF Batch System Initiative, and I would also highlight the CNCF Research User Group, which I co-lead. If you're interested in any of these topics, please join us, and thank you so much.