So thank you all for coming. And thanks to Nate, one of our regular members here. He's going to talk to us about container support in Slurm. So over to you, Nate.

OK, hopefully that'll be quiet now. I have a neighbor with a small wiener dog, and ours really wants to go lick it, so whenever the neighbor walks it, they bark their little heads off. Yeah, I think you can go ahead, Nate. Oh, OK.

Yeah, I don't know if anybody here knows me, but I'm Nate. I work for SchedMD, the folks behind Slurm. Today I'm going to present some of the container support we're adding in the upcoming 23.02 release, which is due in March. Everything I'm presenting here actually exists; I'm not just waving my hands, and if I have time I'll even do a live demo afterwards.

All right, a quick rundown of what I'm going to talk about. I think everybody here knows this: everybody likes Docker and wants to be able to use it. Lots of users really like it. There are lots of problems with Docker, but we're not going to worry about those for this presentation. One of the things I've been asked for a lot is that users just want to be able to use Docker. So this is our first real, good attempt at getting Docker to work natively with Slurm, and by extension Podman, and possibly some love for Kubernetes later.

The first step was adding support for OCI containers, which we did last release. It runs, it works, it uses all the existing controls. It's not very user-friendly; it was never really intended to be, but the support was required. Now, with the upcoming release, we're adding something I like to call an OCI runtime proxy, which I'll explain in a few minutes. More or less, it allows us to bolt Slurm into Docker.

This is the container support that already got added. I don't think it's too relevant for this group; I just have it here for reference. Containers are now first-class citizens in Slurm, and these examples just run against some container bundle that you have sitting in tmpfs.

This is the current way jobs are run in Slurm. The user logs in, usually via SSH, to a login node, and then calls one of the commands to start the job. The job runs on a compute node. This is the way it's been for, oh geez, 20 years, and Slurm's model is the continuation of existing models that go all the way back to the 40s... or, sorry, 50s. Tried and true, works great.

The new model we're adding now is: the user still logs into a login node via SSH or whatever, but instead they interact directly with Podman or Docker. Podman or Docker then calls our new command, which we're ever so hilariously calling scrun, and scrun does all the magic required to get Slurm to work. To the user it'll look like they're working with Docker directly, but in reality their job will be running on a compute node, following the previous model for the job itself. The expectation is there'll be a container registry or artifact store or something like that where they pull their images from and everything they need. And in this case, swapping Kubernetes in for Docker is definitely a possibility, because everything works by standard, which is really nice. It's definitely not perfect; for instance, we're only providing documentation for Docker and Podman. But since this is the research group, you know, we'll also look at Kubernetes. That's definitely not something I'm promising anytime soon, at least not for this release.
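As a reference point for the native OCI support mentioned above (the part added in the previous release), here is a rough sketch of what those "container in tmpfs" examples look like. The bundle path is just an example, and oci.conf has to already be configured on the compute nodes:

```sh
# Assumes /tmp/ubuntu is an unpacked OCI bundle (config.json plus rootfs/)
# that already exists on the compute node.
srun --container=/tmp/ubuntu cat /etc/os-release
sbatch --container=/tmp/ubuntu --wrap="hostname"
```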
So, the OCI runtime proxy. I assume everybody here is familiar with what an OCI runtime is. All of this functionality exists in Slurm one way or another; it's just not very cleanly set up for containers, so it's been a game of a thousand small edits all over the place to get it to work. My entire goal here is to make it completely boring for users. I want a user to be able to call Docker the same way they'd call it on their laptop and have it work the way they'd expect. And like all things in computing, there's a cost to this: the system administrator has to do a little extra work to make sure that container images can be pushed around and that the default config is what they want, maybe having a specific queue that all the container jobs get sent to. In most cases I assume most containers will be, you know, a single core or something simple, but it's definitely configurable. And then you continue to use all the existing stuff that Slurm has.

One of the big gotchas here is that when Docker runs, or Podman for that matter, it sets up a mount namespace where it mounts everything, usually with overlayfs depending on how you configure it. That mount namespace has to be copied out or somehow exported to the compute node, because it only exists in that specific namespace. That's why we have something called staging the image in and out. It's done via a script; I'll talk about it in a minute. The whole idea is that users can just use existing HPC resources with Docker now, or Podman. Podman tends to be a little friendlier on HPC boxes.

So here's the first example. I will note that this is rootless Docker only. Rootful Docker is an absolutely unacceptable security risk on almost every HPC system that isn't just a per-user cloud-hosted system. The first line exports the Docker host variable, just telling the Docker CLI: hey, talk to the rootless daemon instead of trying to talk to the system, rootful Docker. The second part is that Docker, unlike Podman, requires that you pass security settings directly on the command line. You can't just configure them away, or at least not in a way I've been able to find, and I have been searching the source code for it. In this case, all the security features don't make any sense, because you're not actually running the container on the login node, so we just need to turn them all off. No advanced networking stuff is supported yet, so that gets turned off too. AppArmor and SELinux and the security containment, whatever, all of that just needs to get disabled because it doesn't even apply; you're not running on the node you launched from, and it's all handled by Slurm anyway on the compute nodes. So I just make a quick little export here to make it explicit, and you turn all of it off. I really wish there were an easier way to do this, and I might end up sending some patches in, I don't know.

The first command is just calling docker run with all the normal things: hey, I want to run Ubuntu, and just verify the release. The second one is the same thing but with CentOS, just proof that it works. Feel free to ask questions at any time; I didn't mean this to be too formal.

So here are all the processes that actually end up involved in the run. The user logs in via SSH, whatever. RootlessKit is running Docker, which is running containerd, which is running the shim, and the shim calls scrun, which is our OCI runtime proxy.
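The rootless setup and the two example runs described above look roughly like this. The DOCKER_SECURITY variable is only a local shell convenience used here to group the per-command opt-outs; the individual flags are standard docker run options:

```sh
# Talk to the user's rootless Docker daemon, not the rootful system one.
export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock

# None of the host-side confinement applies -- the container actually runs on
# a compute node -- so switch it all off per command.
export DOCKER_SECURITY="--security-opt seccomp=unconfined \
  --security-opt apparmor=unconfined \
  --security-opt label=disable \
  --net=none"

# The two demo runs: Ubuntu and CentOS, each just printing the release.
docker run $DOCKER_SECURITY ubuntu cat /etc/os-release
docker run $DOCKER_SECURITY centos cat /etc/os-release
```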
scrun handles the work of calling out to Slurm, initiating the job, and all the other things. Then slurmd, which is roughly the kubelet equivalent for Slurm, is running on the compute node, and it actually calls crun or runc, whichever OCI runtime you want; that's definitely configurable. So there are a lot of extra processes involved in this, but to the user it shouldn't be visible; it should just work as expected. And most of these processes don't actually do much, so it doesn't slow anything down. At the end of the day, the user is effectively communicating straight through to the job, and containerd doesn't actually have to do much.

Here's a quick example of the config required to activate this. I disable everything I don't want wherever the config lets me, and I even activate no-new-privileges, because why not, the more security the better. The most important part is the new runtime: just call scrun, which is the new binary provided by Slurm, the OCI runtime proxy. Same thing for Podman, although Podman actually has configuration options that disable all the extra security stuff, which is really nice. So in this case it's a normal podman command: I want to run Ubuntu, run CentOS, and then just for verification I have it print the Slurm job ID from the environment, so you know you're actually running as a job each time you call it. For people unfamiliar with Slurm, every time you run a job it gets a new job ID number; in this case we're just verifying that. Podman is a little simpler: podman calls conmon, and conmon calls scrun. Not fundamentally different, just a little less abstraction. The config for Podman is simpler than Docker's. You can just disable all those extra security things, say I just want to use the host for everything, because it's not running on the login node, your job is actually running all the way off on a compute node somewhere far away, and then tell it: hey, use scrun. I assume you guys can see my mouse. Yeah, it looks like you can. Yeah, we can see your mouse just fine.

One of the gotchas is container staging, which I mentioned a little earlier. When scrun starts, it's actually running inside the user namespace and mount namespace, and it's its job to get that image and push it out to the compute node by whatever means is most efficient. Now, since every single HPC system I've ever gotten my hands on or seen is different, file systems are different, storage locations are different, this has to be really customizable. So in Slurm it's done via a plugin that calls a Lua script, so a sysadmin can do whatever they need. I expect this will actually differ between hardware types on certain clusters, especially for the cloud-bursting ones; there may even need to be some calculation of egress and ingress fees, I imagine. So there's just a Lua script. Lua tends to be super friendly to sysadmins, and if you don't like it, you can just exec out to whatever script or code you want; it doesn't really matter. The nice thing is there's JSON support available, so you can look at and edit the image's spec file and all the other fun things, and change them as you please. And then there are callbacks inside the Lua script to tell Slurm what you've done: hey, this image was here, but now it's over here on the shared file system, or on this S3 bucket, whatever's required.
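To make the client-side config he's describing concrete, here is a hedged sketch of the two files for rootless Docker and Podman. The paths, option names, and scrun install location are from memory of the scrun documentation and may differ in your release, so verify them before copying:

```sh
# Rootless Docker: register scrun ("slurm") as the default runtime for this user.
mkdir -p ~/.config/docker
cat > ~/.config/docker/daemon.json <<'EOF'
{
  "default-runtime": "slurm",
  "runtimes": { "slurm": { "path": "/usr/local/bin/scrun" } }
}
EOF

# Podman: same idea, and the host-side isolation knobs can be turned off in the
# config instead of on every command line.
mkdir -p ~/.config/containers
cat > ~/.config/containers/containers.conf <<'EOF'
[containers]
apparmor_profile = "unconfined"
cgroups = "disabled"
label = false
netns = "host"
pidns = "host"
utsns = "host"
userns = "host"

[engine]
runtime = "slurm"

[engine.runtimes]
slurm = ["/usr/local/bin/scrun"]
EOF

# Verification run from the talk: print the job ID to prove it ran as a Slurm job.
podman run ubuntu printenv SLURM_JOB_ID
```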
Here's an incredibly simplified example of that stage-in script. In this case I'm just calling rsync and pushing the image root out to a shared file system, and then I'm telling Slurm: hey, the bundle is now here and the new root path is here, and it modifies the config file too. Very simplified; you can look at our documentation later for the full one, but I hope it gets the idea across. And then, in the case of my example, when the job is done it just deletes the staged copy. For sites that have lots of rules, or reproducibility requirements, they can always send this stuff off to tape or something like that, or use S3 buckets directly, something fun like that, or mount FUSE file systems.

There are lots of limitations involved with this, and I've been glossing over them. For instance, we don't support any kind of network namespaces; it's host networking only for now, and honestly that's probably what it's going to be forever, because RDMA drivers and other things like that don't play nice with network namespaces or cgroups. There has been some effort in the InfiniBand area to make that play nice, but it's not covered for now. Stuff like cgroups, AppArmor, or SELinux support on the login host doesn't actually make any sense, because you're not running the job there, so we just disable all of that. Finding out where things fail can be difficult; I'm still working on that, but there are lots of places where logs can end up. Every single daemon has its own logging, and you have to be able to look at all of those depending on where things fail. I have found that once you get it working, it tends to be pretty rock solid. Minor details: you've got to make sure you compile things in the right order. Slurm has lots of customization for sites, things called CLI filter and SPANK plugins, which let sysadmins insert hooks that modify user requests at certain points, really giving admins lots of power and control, because in a lot of cases users aren't exactly power users. It's definitely configurable, and if the site doesn't do anything, the user can do as they please. Authentication: we currently only support munge auth. There's a lot of work done in the background to make sure that the user namespaces are translated correctly, which is currently only possible via munge, but in a future release JSON web tokens and stuff like OAuth could be set up to work; it just hasn't been done. Anybody have any questions? These are just minor technical limitations I wanted to list in case people have questions.

I mean, I have a couple of questions, but I don't know how much more you have. It's a bit of a question of understanding, if I got that right. So, first of all, I think this is really cool. Oh, I'll ask the question officially. Yeah, sorry. So, the socket, like the Docker socket, is not available on the login nodes, but only on the compute nodes of the Slurm cluster, is that correct? No, actually Docker runs on the login nodes. Yeah, let me go find that. But it mounts the socket from, like, the batch node? No, Docker doesn't have anything to do with it, because Slurm has the native container support; Slurm talks directly to the OCI runtime, so either crun or runc, whatever you want. Where is that little graph I had? Yeah, so Docker will run here, and the Docker socket will be here.
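Circling back to the simplified stage-in example at the top of this answer, here is a minimal sketch of what such a script could look like. The function names (slurm_scrun_stage_in / slurm_scrun_stage_out) and the slurm.set_bundle_path / slurm.set_root_path callbacks are from memory of the scrun documentation, the script location is an assumption, and /srv/containers is just the shared path used later in the demo; check your release's docs before using any of this:

```sh
# Hypothetical sketch -- written as a shell heredoc that drops the script next
# to slurm.conf; verify the file location and callback names against the docs.
cat > /etc/slurm/scrun.lua <<'EOF'
-- Stage in: rsync the OCI bundle out of the login node's mount namespace to a
-- shared filesystem, then tell Slurm where the bundle and rootfs now live.
function slurm_scrun_stage_in(id, bundle)
    local dst = "/srv/containers/" .. id
    os.execute("mkdir -p " .. dst)
    os.execute("rsync -a --numeric-ids " .. bundle .. "/ " .. dst .. "/")
    slurm.set_bundle_path(dst)
    slurm.set_root_path(dst .. "/rootfs")
    return slurm.SUCCESS
end

-- Stage out: the job is finished, so just delete the staged copy.
function slurm_scrun_stage_out(id, bundle, orig_bundle, root_path)
    os.execute("rm -rf /srv/containers/" .. id)
    return slurm.SUCCESS
end
EOF
```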
And to finish the socket question: the job won't have access to that socket, because it's running on some compute node off in the cluster, potentially burst out to the cloud somewhere. But once the job is over there and running, it can talk to Slurm directly, or it can talk to its own container runtime, and that container runtime has a proper interface with Slurm, because Slurm knows its relatively simple config and state. Okay, okay, the principle, yeah, okay. I mean, it doesn't currently have the Kubernetes-style thing where the job can talk back to the API and change itself around outside of the existing Slurm stuff. Slurm has all that functionality, but it's not specific to containers; anybody can change their job around as they please.

Okay, and the main use case would be interactive work with a container? Because if you wanted to run container workloads, you could use the native Slurm container support, right? Like, if you want to have massive batch-parallel jobs. Yes. But this implementation here, the point is really to enable people to work with this interactively. Interactively, yes. And even though my example is really simple, once the image is made, the user can have some kind of option to push it out to a common location and then submit large numbers of batch jobs to do production work. The whole idea is to let Docker or Podman or whatever you want generate your containers and have them run, or be prepared, the way users would expect, the same as on their laptop or workstation. And then you can run the jobs as you please. Yeah, no, that's great.

And maybe one last question: what are the system requirements? We're still running quite a lot of, let's say, Red Hat or CentOS 7 systems. In particular, since you have rootless, does that require CentOS 8, or do you have an idea of what the minimum requirement is? So this is the fun part. On the login node, I would suggest running the latest, because rootless Docker has seen massive amounts of development and improvement, and it just tends to work better on CentOS 9 or whatever it is. That being said, on the compute nodes runc has been rock-solid stable for a few years now, so if you're running CentOS 7 and you want to run the job there on the compute nodes, it'll be fine. And heterogeneous clusters have been working for years, so you could have the login node on a current revision of CentOS and the compute nodes on something older that's stable. You do need user namespace support, so RHEL 6 and CentOS 6 don't work; that's just a kernel limitation, there's nothing we can do about it. But besides that, if user namespace support is in your kernel, it should more or less work. runc is actually really simple, so it works pretty nicely. Okay, yeah, very cool. Maybe I'll stop in case other people have questions. No, I'm here for questions and answers. I guess that's the better part of the talk.

Yeah, I mean, at the end of the day, it's running under crun or runc, whatever runtime you want to use, or even Singularity or Charliecloud. Yeah, on that one I would ask: what is the life cycle of the container, again? So you prepare the container on the login node, if I understand correctly; like, you do a podman pull on the login node, right?
But then the configuration based on what the compute node needs is also done in Slurm? With Podman or Singularity or Sarus, there's specific configuration done on the compute node, like OCI hooks or plugins in Podman, to specialize the container and tweak it to use the right MPI or the right GPU libraries and so on. Is that done on the login node? So, one thing about that is you can actually just activate that stuff. For instance, Sarus calls crun eventually. The new command, scrun, has all the same semantics as salloc, and it has the same set of environment variables that you can set up the same way, so you can modify the job to be on one node, ten nodes, whatever you want; all that's there. The creation of the container, like the docker create, or with Sarus, the life cycle management of the container, is that done on the compute node or on the head node? Because if you configure Sarus on a specific compute node, then when you do a sarus run, the life cycle hooks will be executed on the compute node, and it sounds like you're not using Sarus on the compute node, you're just using crun on the compute nodes. Yes. So, this adds several steps to it. So yes, all the hooks will be run by Docker; Docker will maintain the image as you please, unless you decide to copy it out. And then you can also have the hooks that crun calls. But crun doesn't call any hooks, right? Most of the time. But they actually are there and they do work; for some of them, with the Docker thing, you actually have to disable a few. Ah, so they do get called. Okay. But yeah, they're actually called on both sides. Okay, interesting. Yeah, it'd be interesting to see a live demo; I think that's cool.

Let me go get it going; see if I can actually share my screen in one second. Hopefully I didn't break this between yesterday, when I prepared it, and now. But, you know, that's the way of live demos. Let me get it shared. Okay, can everybody see this? Yep. All right. Now, naturally, I have conveniently removed any of the errors and other stuff from the presentation. So right here, I just called a podman run, and I did break it a little. I have all the debug logging activated here too. So here I'm just a normal user on the cluster. I'm calling podman run ubuntu and just telling it to run uptime. These are the authentication workarounds I have active; that's going to be fixed before March. There's a whole bunch of movement in and out of the namespaces that I have to account for. Slurm was written before all this stuff existed, so it's requiring a good bit of effort to get Slurm to play nice with it. And then there's some more logging. I can explain it later, but the expectation with an OCI runtime is that you have a process running that Docker or Podman communicate against; in this case we make one, and I just call it the anchor. It gets split off and then it starts doing all the work. Eventually srun is called, and it has the container location, so it knows where it is. That's the container location, which is where it's been pushed to by the Lua script, and then the container ID, which is handed to us by Docker. And then the job runs, just uptime. If you want to run a different command, that's cool too. We're just going to ignore these errors for the terminal stuff for the time being; there are definitely some interesting issues with the movement of terminal permissions, just some minor bugs to fix.
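For reference, the demo command itself is roughly the following; with verbose client-side logging turned up, you can watch the staged container path and the Docker/Podman-assigned container ID get handed through to srun, as described above:

```sh
# Run the demo command with debug logging on the Podman side.
podman --log-level=debug run ubuntu uptime
```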
Then it does the stage out, in which case it's just deleting everything. Let's see, os-release, that one goes faster. And of course it runs slow for some reason; probably doing some compiling in the background or something. For reference here, whoops, I killed it off. I'm running this on CentOS, well, AlmaLinux 8.6, and I'm calling Ubuntu. Let me go check and make sure I don't have the cluster full of other jobs or something funny like that. Oh, that's it, okay, it's just being slow for some reason. This is the problem with active development, but I hope it gets the idea across. For the user, they'll just see the normal Podman commands, or Docker ones. Honestly, I would expect most sites to go with Podman just because it's a whole lot less of a pain for them than Docker, or rootless Docker, but they both work. My example here only has Podman set up because it was simpler, because this is actually a Docker cluster. So this is a Docker-based cluster inside of compose, and there are multiple levels of namespaces used here. Doing Docker inside Docker is a pain in the butt because I have to move around a whole bunch of the mounts, so I'm not exposing that right now. Which I assume you know all about, Christian. Yeah, I know all about it. I have made it work, but the Podman one tends to be the easiest. And I don't know why it's hanging. All the normal things work: I just hit Ctrl-C to cancel it, and it sends the signals as expected. Let's see if uptime works. Maybe I did something funky there. Yeah, uptime works, so I must have done something odd. It's definitely not completely ready, but it does work. Might have been the interactive thing; there are some weird issues with TTY controls, but I'm working those out. Most of it involves just having to turn off a whole bunch of the features, because we just don't need them or use them.

I'm curious about one thing. For running the container with a resource request, usually I've seen you create some kind of Slurm script; does this work well with Docker, like with this scrun? Yeah, if you wanted to, you could. Let me see if I have a queue for this thing; I'd have to add the login node as a partition, I can do it pretty quick. But once you do that, you could just do sbatch, something along these lines. So I'd just tell it to run on the login node partition, and it'd be fine like that, if that's what you wanted to do. It's not going to work right now, because I don't have it set up for that, but that's definitely something that should work. And then I guess I wonder what the overhead would be, like the container's requirements for CPU or memory versus what sbatch requests; I don't know if those are quite one-to-one, right? It's going to be a little extra, but if you're just running on the login node, it's the same price as the user calling it directly. There shouldn't be much of anything; most of it is idle the second the job starts. Most of the work done by Docker is done when it's creating the image, mounting it all up, and then staging it out to where the job runs, and after that it's idle until the job's done; it's just moving the IO back and forth. Got it, okay. And what about the logs? All right, well, thank you. The existing stuff, you know, Podman's logging should work; you're just paying the price of having Podman or Docker do the log movement and then store it somewhere. Slurm does the direct writing of the JSON log files.
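Going back to the sbatch route mentioned a moment ago, here is a hedged sketch of wrapping a container run in a batch job. The partition name and resource numbers are made up for illustration, and whichever node the batch job lands on needs rootless Podman or Docker plus scrun configured:

```sh
# Wrap the container run in a normal batch job so it gets scheduled, accounted,
# and resourced like any other Slurm job.
sbatch --partition=containers --cpus-per-task=1 --mem=2G \
       --wrap='podman run ubuntu uptime'
```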
So hopefully the logging won't result in too much overhead. I mean, unless you log a lot, but in that case you can always just tell Slurm to log to a different file and not have it go through Docker.

Maybe I'll ask another question. If you were to do something like podman images to list them, because if you have them on a shared file system, would you see only your own images or everyone's images? I mean, you made some comment about what is run as the user and what isn't. Podman here is running as the user; it's all specific to the user. And yes, that is a possibility, but that's actually outside of Slurm. You can configure Docker or Podman as you please; Slurm only becomes involved the second it calls scrun, and then Slurm pushes the image back and forth to the compute nodes as needed. And yeah, you can do all the fun Docker caching if you want. I'd expect any large site to have their own internal cache that they control, and to allow only certain images and whatnot, although there are more open sites out there that could just let you pull from Docker as you please. That's definitely up to the site to decide; it's nothing we're going to apply limitations on.

And at some point you showed that it's pushing the, I guess, extracted image; like, you have it here, it's srv containers something. So if you submit similar containers, does it do any deduplication? Or is this just a common shared file system for the unpacked images? That doesn't exist yet. Right now it's just a simple push with, you know, plain rsync. Deduplication is something I'd definitely like to look into in the future, but it's not something we do right now. There was also, a few slides back, you had the images, you showed podman images, right? And to create the image, it's unpacked, or loop mounted, at this particular path, and then you pass it to the compute node. But usually what I see is that the Podman images are created, readable by everyone, on a shared file system and then just loop mounted on the compute node, because some compute nodes don't have local storage, so you want to just loop mount those. Yeah, for everybody it's different; that's why I'm giving as much flexibility as possible. I'll show you the script; I leave it to the site to modify. So let's see. It should not be too hard to not copy over the root file system to the compute node, but just loop mount the image on the compute node instead? Yeah, you could do something along those lines. This is the simplest version that I set up, just calling rsync, but you could definitely do mounts. I'm accounting for all these things because I have sites with really fast local file systems, fast Lustre file systems, GPFS, stuff like that, so I want to be able to make it work for them, and then we have sites with the ultra-slow egress and ingress of pushing the images out to the cloud. So I'm giving as many options here as possible. Future work could definitely go into making that faster, and deduplication, stuff along those lines. Yeah, so this Slurm stage-in hook would be the magic, or the place to make the magic work in different ways. Yeah, I'd hope most sites could just use rsync, but it's definitely configurable.
This is the actual thing being called for the demo that I'm doing right now; it's extracting out of the environment who's running it. I have /srv/containers as the shared file system path that's pushed out to everybody, which is just a really simple volume that I have mounted everywhere for this demo. And, like, go on. Again, on the life cycle, like maybe mapping in a GPU: is there also a callback, or a function you can customize, to make it work for different compute nodes, if you have two queues, one with GPUs, one without? So if you want to switch what you're calling, let me go back to Fred, my hilariously named test user, you'd just pass it as environment to the container. Let's see, I don't remember how to do it in Podman, one second. When scrun is called, it gets the environment that exists there at the time, so you can set any environment variable, and Slurm will read it and process it just as normal. So if you do something along the lines of export... let me just go find the command for you. The docs aren't written yet, but all the same environment variables as for salloc will exist for this too, so you could do something like this inside the job. I don't remember what Podman does to export the environment. Yeah, we'll have to document this better. So one of the things we're hitting is that you can't do something as simple as this; this is a limitation of Docker and Podman. What I'd really like is to be able to do this, but the actual calling environment doesn't get passed through by Docker. It might actually get passed by Podman. But the expectation is that we will be getting the environment, so you can export it, same as with salloc. I would expect this to be configured on the host, and then you don't need to pass anything; it just depends on where you are. Oh yeah, yeah. So the thing that I do know absolutely works right now is passing it in the conf file. Give me one second; let's see, the container conf. So if you pass the environment here... I don't remember the format. But that's for where it's actually running, right? So that's the global configuration for all your jobs. This is the one for all of them, but you could definitely do a per-user one if you feel like it. And if you really want to customize it, that's what the SPANK and CLI filter plugins are for, if the site really wants to do something smart. So I'm working on getting all of those to work together. Passing the environment is there, it does work; you can do it here, although I don't remember the format off the top of my head. You just pass it. Don't worry, I'll give it a try myself. I think other people have questions as well. No, no, these are definitely valid concerns; I'm more than happy to take all these questions because it helps me make sure I'm not missing something. Yes, the whole goal is that you can pass all the normal Slurm options: you can say number of nodes, GPUs per node, all that stuff. And even if you want to do multiple nodes and things like that, as long as you make sure the image is pushed out to all those nodes, it'll run; or with a shared file system, you don't even have to worry about it. Yeah. Maybe just to state my use case: I have two queues, one older, one newer. What I would usually do with Sarus, Podman, whatever, is configure the OCI hook on the compute node to pick up the correct MPI libraries to be pushed into the container. Right, so it's a per-compute-node configuration.
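As a quick aside on the environment-variable route demonstrated above, before the MPI-hook use case continues: based on the statement that scrun takes the same input environment variables as salloc, steering the job from the environment might look something like this. The SALLOC_* names are the standard salloc input variables; whether scrun honors each of them should be verified against the documentation:

```sh
# Hypothetical: pick the GPU partition and request one GPU for the container job,
# then run as usual -- the variables are read where scrun is invoked.
export SALLOC_PARTITION=gpu
export SALLOC_GRES=gpu:1
podman run ubuntu uptime
```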
So, continuing that use case, I would need those compute-node hooks to be triggered somehow. I guess this stage-in Lua thing would need to reach out to the host and stage the container on the compute node instead of the login node, so that the hooks run there. Then it's independent of the login-node configuration; it's just reaching out to the compute node, making sure all the hooks for that compute node are run, and then the container may already be present on the compute node. So I would stage the container on the compute node, is what I'm saying, I guess. Yes. My thought is that users would run Docker or Podman there, but you can definitely run it on the compute node instead by using sbatch; that's definitely an open option. And your use case of making sure the MPI hooks are called, I'm going to go double-check that, but it's definitely something I want to get working, if it doesn't work already. So yeah, thanks for pointing that one out to me; I've got to make sure it works. The fun of development. Any other questions? I'm very open to them.

Oh yeah, docker build does work. Well, in this case podman build will work, but yeah, there's no reason that won't work. And it'll actually build on the compute node, so if you want to do, you know, GCC with march=native, you can have that done, as long as your nodes are homogeneous, or you're just careful to make sure you compile on the right node.

I had a question based on your first diagram, similar to what Clemens was asking, I think. Just a couple of things. For a basic docker run, I think I understand: your login node is effectively instructing the compute node to do the business. What's the flow if you want to run many jobs? How does that actually work? You just hit it many times. I would expect, if you want to run a ton of jobs, you could just pass the environment request to give it an array: so, use this image, but do an array of 10,000 jobs, something like that, so you don't have to pay the cost of the stage in and out each time. But hitting it many times is definitely a possibility if you want to. And the actual layers of the image and so forth, that flows through the login node on your diagram? When stage out runs, or yeah, when stage in runs, sorry, stage out is when it pulls it back. So, stage in: this is just the log for it. When stage in runs, it's running inside the namespace provided by Podman or Docker. In this case I just do a really simple rsync to push the final mounts out there. More advanced image slicing is definitely something I want to look into, but for now it's not there. And how does that scale if you're running lots of things in parallel, is that okay? So it depends on what your file system is. If you have a shared file system, it's one rsync command to push it out to the file system, and then the file system handles the dirty details. You've got thundering herd problems, but that's nothing new. And then, in the case of the job, you just tell it, hey, I want X number of nodes for this, and Slurm will run it just like any other Slurm job. So I want a thousand nodes running this, I point it at the one image pushed out to the shared file system, and it'll run just like any other Slurm job. Right, I see. If the user wants to run a thousand different jobs, you have to pay the price of pushing each out to the file system; that's definitely something we've got to work on, but baby steps. Makes sense. Cool. Okay. Well, that's great.
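Pulling together the build and many-jobs discussion above, here is a sketch of the build-then-reuse flow. The image tag and the run_case script are hypothetical; the point is that the build itself executes on a compute node because scrun is the runtime underneath podman build, and repeated runs of one image avoid rebuilding each time:

```sh
# Build once -- the compile actually happens on a compute node via scrun.
podman build -t myapp:latest .

# Reuse the image for many runs ("you just hit it many times"); each run becomes
# a separate Slurm job, and a shared filesystem keeps the staging cost down.
for i in $(seq 1 10); do
    podman run myapp:latest ./run_case "$i"
done
```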
We've got 10 minutes left; does anyone have any final questions? Nope. And I'll point out here that when the staging is called, it's inside the namespace, and I just have it logging where the config is right now. So in this case, that's where Podman has the overlayfs, and then there's the config file. Anybody who wants to go into the advanced case of parsing the config file could go grab each layer and only push those out as needed. That's definitely a possibility, but it's not something I'm implementing right now. Sorry, did we get the question in the chat already? "Does docker build work?" was asked by Timothy. Yeah, it works. If it doesn't work, it'd be a bug, but it will build on the compute node; that's the important part. So you have to make sure you understand that, especially if you're calling a compiler, GCC or something like that, with march=native or something along those lines. You just need to make sure that where you build matches where you execute, so you don't have incompatibilities, which in most cases I would just handle by setting up a queue shared between the two. Because we're taking over as the OCI runtime, that's where Slurm gets introduced, so docker build calls that anyway; the build just gets pushed out the same way.

Great. Okay, well, thanks very much, Nate. Appreciate you taking the time. Really interesting stuff. I realized I skipped over something at the beginning: we normally make sure that everyone's put their names in the attendee list, and I went straight into your presentation instead. Sorry about that. Wonderful. And if there are any new joiners or new people who haven't been before, just stick your hands up and say hi. I can see a couple of names. Ricardo's here with no microphone. Yeah, we have one more session before KubeCon, in two weeks' time, which will be the 19th of October, just before KubeCon. I think Ricardo is doing the Gateway API for the next session; we'll confirm that and stick out some notifications on Slack and email. Other than that, we'll also work on putting together a list of agenda items following KubeCon. Yeah. I'll be at KubeCon in Detroit; anybody's welcome to talk to me, especially on Monday at Batch Day. Yeah, I will be there too, so see you there. Cool. Is anyone else coming, actually, out of this group? Yeah, I'll be there. This is Kevin. Kevin, cool. And I'm attending, that's my plan, if I can get a hotel in downtown Detroit; it's really hard. Holy cow, it was so hard for me to get a hotel. I had to go check hotels.com every hour until one just popped up because somebody canceled. Yeah, that's what I'm doing right now. My colleague ended up in an Airbnb in midtown; he's going to take a tram every day, he reckons, or something. The thing that sucks is that if you could just go to the other side of the water in Canada, they have so many hotels available, but you'd have to have a car to pass through customs every day. I was looking at it: can I pull this off? Our corporate booking system recommends hotels just based on radius, and some of the places it suggested were in Windsor. I was like, sure, they're close. And it's only one-third of a mile, so you can totally walk. No, I actually checked that: the Ambassador Bridge, they do not allow you to walk across. It sucks, because I'm like, I could walk this easily every day. Yeah. It was a bit of a speed bump. Just a little bit. Cool. I think that's it then.
Thank you everyone for your time, and see you next time; see some of you at KubeCon as well. Feel free to hit me up with questions or comments on the Slack channel. I'm definitely open to hearing about stuff like Christian's case with MPI; I definitely have to make sure that works. And the intent of this is not to replace Sarus, but to get the container stuff properly supported as part of Slurm, and Sarus can be bolted on top to do its extra magic. Good stuff. Thank you everyone. See you later.