Hello. I think we'll get started here. I hope everybody had a good break. I see the crowd thinned out, which I'm happy about. So I'm Tim Randles, and I'm here to talk about a bit of Slurm and Glance integration that we've done at Los Alamos National Laboratory. Reid Priedhorsky couldn't be here today, but he's a major contributor to this effort.

When I talk to people back at the lab about why we're doing this, the normal reaction is just questions. People want to know why we want to muck about with our HPC. Our HPC is solid; we've had it for decades. And I'm trying to evangelize somewhat newer ways of doing things, providing more flexibility to users — just trying to drag our process into the 21st century. So we'll talk about some motivation and background, some early prototyping we did along this path that led us to where we are today, where we think we're at with our current direction, and how we think that's going to evolve into the future. Then we'll talk a little bit about the actual plugins I wrote for Slurm. I'm stupid or brave enough to try a live demo, so we'll see how that goes. And we'll discuss a little bit of future work.

So we've heard it twice today already in this room, we've heard it all over the conference so far this week, and we've heard it in the scientific working groups: users are always asking for more flexibility in their HPC platforms. HPC platforms are traditionally large monolithic beasts. I'm a sysadmin, and to keep me sane, we want everything to look identical all the time. Any time there's a variance in anything, we drain nodes, we stop running jobs, and we've got to figure out why something isn't exactly the way we want it to be. But users are always fighting this with us. They have complex dependencies, and the software stacks we have installed on our clusters don't always include all the dependencies they need. They have build-time requirements such as internet access. There are so many build environments today — like Ant or Maven, if you're working in the Java world — that just expect to have unfettered internet access, that want to run out, download dependencies, build something, and then execute it. We can't do that on our HPC systems.

We have users with validated or vendor-supplied software stacks, or platforms at least. At Los Alamos we run a Red Hat-based customized operating system called TOSS. The first time we call up a vendor and say, well, we're running it on TOSS, they just hang up the phone. They don't know what TOSS is. We can say it's Red Hat based, it looks like all the Red Hat installs you do support, but it's a constant fight with the vendors to get the support we'd expect if we were running what they call a supported platform.

And we have legacy code — decades and decades of legacy code at Los Alamos. That code has been vetted time and again, validated against past results and experimental results, and our code teams trust that code. They want to just forklift that code onto the new platform, but the world around them hasn't stopped moving. They may have glibc requirements, and newer glibcs aren't always backwards compatible. They may depend on old Fortran libraries that don't exist anymore in the modern world. They need a way to run that old code — partly to rerun the old simulations they did, but also to help develop new simulations, so they can use the old code and the old results to validate the new code.
There are also new computing paradigms out there — data-intensive computing. HPC is a mature field, and then these Hadoop guys swept in. Google wrote the MapReduce paper, and suddenly, with our data-intensive workloads, everybody thought: well, we'll just plop those right on your HPC system. Hadoop, by the way, needs this HDFS thing installed. We know everything you have is diskless, but we'll just create folders in your Lustre file system and call those backing stores for HDFS — and Lustre crashes to the ground and no one's happy. So we have to be able to support these newer computing paradigms. Hadoop has YARN as a resource manager. I'm not going to run YARN on my HPC cluster, no matter how much they say it'll run an MPI job. If you've ever tried it, YARN doesn't run an MPI job very well. I was at Hadoop Summit a couple of years ago, and the gentleman responsible for this claim said he had only tested it to four nodes — but it worked great at four nodes. And everybody doing HPC knows that your MPI jobs at four nodes, I mean, that's your debugging, that's your fast queue; it doesn't get useful work done.

And users also want an environment that they understand and control. So we run TOSS. Users at the lab can download TOSS and put it on their workstation. No one's going to. They're running Ubuntu, they're running Debian Jessie, they're running something far ahead of where we are today, and our current TOSS distribution is based on Red Hat 6. Red Hat 6 by now looks ancient to our users — it's old versions of everything. One of the number-one requests I hear is: why are we still running Python 2.6? Well, it's because we're on Red Hat 6.

So we did some early prototyping to meet these user needs. Back in 2013 I built a cluster called Glom. The idea was: users want data-intensive tools, so let's give them data-intensive tools. But we didn't want to take our existing HPC infrastructure and try to support those tools, so we thought, let's just build a Hadoop cluster. Hadoop 2 had just come out, YARN was just out, everybody was excited about it, so we built it. We put disks in all of our nodes — we still diskless-booted the nodes, but we had a local backing store for HDFS — and we ran Hadoop on it. And users tried it, and they didn't like it. The number-one complaint was: this is not productive for me, I don't want to learn Java. Like it says there, my favorite user in the whole world is the guy who came back after a long weekend and said, "I've been really trying to work with you, Tim. I really wanted to do this MapReduce stuff. I rewrote it in MPI and C, and boy is it fast — so much faster than Hadoop. And I've been running it on all your other clusters, so you can get rid of Glom." You don't just stand up a 96-node cluster overnight. We had a couple of months of work in this — learning a new resource manager, a new (for us) way of operating a machine at a scale larger than a stack of workstations in the corner — and the users just threw their hands up and said: we're sorry, we thought we wanted it, but now that we've actually seen it, we don't really want it.

We also looked at another way of providing some flexibility, with a machine named Kugel. This was our first attempt at user-defined software stacks. Docker and containers made us nervous, so we thought: let's go with a more isolated environment that we think we can secure, that we think we can get approval to run on our production clusters. Let's go with virtual machines.
So we went with QEMU/KVM-based virtual machines. We had a couple of users who thought, this is great. Then they had to build a virtual machine. Then they had to somehow get it into our cluster. Then they had to figure out how to execute the thing. It's a very heavyweight solution with a lot of overhead for the users — and a lot of overhead for us. Because if we're going to run a virtual machine that a user created, well, users aren't always kind. The low-hanging fruit on why this freaked us out so much: the user does something as simple as requesting two nodes, crashes a node, takes the IP off the crashed node, throws it in his virtual machine, and mounts the file systems you don't want him to have. So we had to work really hard with bridging and routing, some switch rules, some ACLs and such, to try to protect our other infrastructure from a malicious user or insider threat running a virtual machine. Kugel didn't last very long either — about a year of playing with it. What we really realized is that we had ended up with toys. We built toys that we thought would meet a user need, but from our perspective it was always: the user needs this, but how are we going to support it?

So now we're working on a machine called Woodchuck, and we're thinking of Woodchuck as HPC-plus: an HPC cluster plus a little bit extra. You'll see we have 12 gigabytes of RAM per core, which is around three times our standard at the lab — we're usually at three or four gigs per core. We actually put local disk in every node of this machine, so we have six terabytes of local disk space. It's just spinning platters — not SSD, nothing fancy. We also purchased six file servers, each with a large JBOD, providing an aggregate three petabytes of cluster-local storage. This isn't shared outside of Woodchuck — it's not shared with another cluster — but it gives our big-data users a place to put a lot of data that we can then make available to them on the cluster. But we don't know yet how to configure that storage. Some users want an object store, some don't. We didn't know what the performance requirements for that storage would be, so we've left it largely unformatted. Right now, if you want a hunk of scratch space that's local to the cluster, we'll give you an NFS-served ZFS volume or pool. That's all we've done with it so far.

And then, instead of going with InfiniBand or OPA, Omni-Path, we went with 10-gigabit Ethernet. This was — and is still proving to be — a controversial decision among our existing HPC users. They want to do RDMA. Because of cost on this iteration of the cluster — it's 192 nodes — we couldn't afford RoCE (RDMA over Converged Ethernet) devices everywhere in the cluster, not at 10 gigabits, so we just went with standard 10 gigabit. But what this does give us is software-defined networking. It gives us the ability to let the user run whatever image they want to run, and then VXLAN their allocation on the cluster into its own sandbox — to help isolate their hopefully well-meaning but poorly executed networking plans from everybody else on the cluster. It also gives us the ability to do some unique things that we have a little bit of demand for, but don't quite know how to provide yet. If a user has a database that contains data that's protected in some way, with SDN we can plop that database on our back-end network and then, at runtime, VLAN in the compute nodes that want access to it.
So if there isn't a job running by that specific user, no one can see this database, except maybe a management machine that has access to it.

And we also replaced our user-defined software stack tooling with what we call Charliecloud — long story, but we won't get into it. Charliecloud does unprivileged containers using the Docker tools. We like the Docker tool chain. We like Dockerfiles. We like the reproducibility. We like how much documentation there is, how much support there is in the community. It allows our users to build a container that, if they choose to go run somewhere else, they can. But we don't use the Docker daemon to launch it. Yesterday a gentleman from Red Hat gave a talk on container security, and right away he said: only give the ability to run the docker command to those people you trust. And we don't trust anybody at Los Alamos. Every user is out to get you — I'm sure you feel the same way. We have other reasons for it, but every user is out to get you. So we have Charliecloud to do unprivileged containers. In a nutshell, all it does is unshare a user namespace and then pivot_root into the container image. We don't do anything yet with the network, UTS, mount, or any of the other available namespaces — simply the user namespace. (I'll put a rough sketch of that mechanism just below.)

Image management was one of the major problems we had with Charliecloud 1.0, and with Kugel and its virtual machines, and that's because of our environment at Los Alamos. The user was expected to create the image on their workstation. Once they had it created — once everything was configured the way they wanted it for the cluster — they had to somehow get it into the cluster. For us, that meant putting it on a parallel file system. We have gateways all over the place between networks at the lab, so they were scp-ing and rsync-ing between multiple hops to get this heavyweight virtual machine, which could be gigabytes in size, to land on a parallel file system. But once it was there, we had Charliecloud's glue on the cluster to go ahead and launch their image inside an allocation from a batch job.

So what have we changed now by using Glance? Well, we can stick our Glance server on the network in an enclave where users can actually reach it with a web browser from their desktops. So they're able to go ahead and build their image — build their Docker container, or build a virtual machine; we still support virtual machines — and then upload it over HTTPS to our Glance server. Once it's there, we need to get it out of Glance, put it on their allocation, and execute it. And that's what the plugins are for.

But Glance provided us a lot more than just ease of access to the image. We had been telling people to put their finely crafted, lovingly created virtual machines on scratch. Scratch has attributes that aren't amenable to this. We have purge policies. The user goes away for a month, comes back, and says: where'd my virtual machine go? I spent weeks putting this together and you've just trashed it. Well, we did. It was on scratch, and scratch is scratch — no matter how much a user wants to believe scratch is immutable and forever, it isn't. Scratch is also a fast parallel file system; in our case, it's Lustre. It's not hard in an audience like this to say: put up your hand if you have a Lustre horror story. Yeah, exactly. Lustre is fast. It is parallel. It can be fragile.
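Here's that promised sketch of the Charliecloud-style mechanism, before we go on. To be clear, this is not Charliecloud's actual code — just a minimal, hedged illustration of the technique as described (unshare a user namespace, plus the mount namespace pivot_root needs, then pivot into the unpacked image). The image path and the single-ID mappings are placeholders.

```c
/* udss_enter.c -- minimal sketch: unprivileged container entry via a user
 * namespace and pivot_root.  Illustrative only, not Charliecloud's code.
 * Build: cc -o udss_enter udss_enter.c   (needs unprivileged user
 * namespaces enabled in the kernel) */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

static void write_str(const char *path, const char *s)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0 || write(fd, s, strlen(s)) < 0) { perror(path); exit(1); }
    close(fd);
}

int main(int argc, char **argv)
{
    const char *img = argc > 1 ? argv[1] : "/tmp/udss/image"; /* unpacked rootfs */
    char map[64];
    uid_t uid = getuid();
    gid_t gid = getgid();

    /* New user + mount namespaces; no root required. */
    if (unshare(CLONE_NEWUSER | CLONE_NEWNS)) { perror("unshare"); exit(1); }

    /* Map our real UID/GID to the same IDs inside the namespace. */
    snprintf(map, sizeof map, "%u %u 1", (unsigned)uid, (unsigned)uid);
    write_str("/proc/self/uid_map", map);
    write_str("/proc/self/setgroups", "deny");   /* required before gid_map */
    snprintf(map, sizeof map, "%u %u 1", (unsigned)gid, (unsigned)gid);
    write_str("/proc/self/gid_map", map);

    /* pivot_root wants a private mount tree and a mount-point new root. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) ||
        mount(img, img, NULL, MS_BIND, NULL) ||
        chdir(img) ||
        syscall(SYS_pivot_root, ".", ".") ||   /* stack old root on new root */
        umount2(".", MNT_DETACH) ||            /* detach the old root */
        chdir("/")) { perror("pivot_root dance"); exit(1); }

    /* Hand the user a shell inside their image. */
    execl("/bin/sh", "sh", (char *)NULL);
    perror("execl");
    return 1;
}
```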
Back to Lustre: certain workloads from well-meaning users can just bring the metadata server down. OSTs go down. You lose backing store. Things go away. And scratch isn't backed up. So not only did we deliberately throw away the user's image — or put it on something we knew might just tip over when the next job runs — we didn't back it up for you either. So we're really sorry; you're going to have to go through your multi-step process to scp it, rsh it, or rsync it back up to the cluster. And we promise we might not do this again this week, but all bets are off a month from now.

With Glance as a service, we can actually create a server, size it appropriately, back it up, and protect it. We put it in its own enclave and allow users direct access to it. Glance also gives multiple interfaces. So if a user is more of a power user and has a continuous-integration or continuous-deployment service running in his work group, they can integrate an upload to Glance as part of their process. Or the casual user who doesn't know how to do a multi-hop rsync through SSH doesn't have to — he just fires up his web browser, navigates to his file, and says upload.

And this also gives us better ease of sharing of these images. If we're just going to put something in a shared file system, we're stuck with POSIX permissions. We're stuck with owners and groups, chmodding and chowning stuff all over the place. And if a user in one group wants to share with a user in another group and there's no intersection of Unix groups for them, they're putting in trouble tickets to have us create brand-new Unix groups in our infrastructure for no reason other than we need to put two people in one so that they can both read the same file. With Glance, we create projects, we put users in the projects, and sharing of images is defined between projects. That's how we've solved that problem. It's moved some of the support burden off of us and put it out to the users to decide how they're going to share their images.

So then, why did we create Slurm plugins? A user could just use a curl command and download the image as part of his job script. But batch processing is high latency. Maybe there's a project coming in right now with a critical timeframe to meet, so we give them a massive reservation on the system and really plug the queues up — so things aren't moving very fast. And the user needs a Keystone token to actually interact with Glance. Well, your token is going to expire before your batch job runs. If we had a mechanism to query the user and say, hey, your batch job needs to execute, log in here, provide your credentials, and we'll generate you a new token so your script can download your image — their phone's going to go off at 2:30 in the morning and they're not going to be happy with us. So the plugins have to act as a proxy service here. The plugin has to be able to interact with Keystone and Glance on behalf of the user: can the user have the image? Is the image still there? Is it still shared with them? And then, at execution time, set up the environment by actually downloading the image and making it available to their job.

So we have two plugins. I don't know how many of you have ever developed a Slurm plugin or gotten into the guts of Slurm — there are a lot of different contexts that things run in. First, we have a job-submit-time plugin, and it runs in slurmctld on the master.
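To give a rough idea of the shape of that job-submit plugin, here's a hedged sketch against the 15.08-era plugin API — not our actual code. The "udss" keyword matches the demo later; glance_image_exists(), user_may_use_image(), and parse_udss_image() are hypothetical stand-ins for the plugin's libcurl calls and GRES parsing.

```c
/* Sketch of the job-submit side (15.08-era API).  The extern helpers are
 * hypothetical stand-ins for the real code that talks to Glance/Keystone. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <slurm/slurm.h>
#include <slurm/slurm_errno.h>

const char plugin_name[] = "OpenStack image validation (sketch)";
const char plugin_type[] = "job_submit/openstack";
const uint32_t plugin_version = SLURM_VERSION_NUMBER;

extern bool glance_image_exists(const char *image);             /* hypothetical */
extern bool user_may_use_image(uint32_t uid, const char *img);  /* hypothetical */
extern int  parse_udss_image(const char *gres, char *buf, size_t len);

extern int job_submit(struct job_descriptor *job_desc, uint32_t submit_uid,
                      char **err_msg)
{
    char image[256];

    /* Every job routes through this plugin, so bail out fast when no
     * user-defined software stack (UDSS) was requested. */
    if (!job_desc->gres || !strstr(job_desc->gres, "udss"))
        return SLURM_SUCCESS;

    /* Fail at submit time, not two weeks later in the queue. */
    if (parse_udss_image(job_desc->gres, image, sizeof(image)) < 0)
        return ESLURM_INVALID_GRES;
    if (!glance_image_exists(image) ||
        !user_may_use_image(submit_uid, image))
        return ESLURM_ACCESS_DENIED;

    return SLURM_SUCCESS;
}

/* The plugin interface also expects job_modify(), which would re-run the
 * same checks; omitted from this sketch. */
```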
That job-submit plugin validates the image request the user has in their job script — and I'll show you how they request that image in the brave-but-foolish demo. Then there are several ways you could do a job-runtime plugin. I chose to use Spank. It's a funny acronym — people giggle — it's the Slurm Plug-in Architecture for Node and job (K)control. The Spank plugin has to go back through at runtime, which may be days or even weeks after submit time, and revalidate the user. Maybe it was an abusive user and we decided partway through that we don't want him to have access to images anymore. We don't want to run around and kill his jobs everywhere; we'll just disable him in OpenStack. We need to make sure, at execution time, that he's still allowed to do what we told him was okay when he submitted his job. So it revalidates the user and revalidates the image parameters. It downloads the image, drops it somewhere on the node, and then does a little housekeeping: making sure the user can actually read the image, and giving the user ownership of the image so they can execute it. And caching — we do a little caching. (There's a skeletal sketch of the Spank side below.)

So, our plugin names are ugly: the job-submit and Spank OpenStack plugins. Everybody says, run off and give it a clever name. So we Googled our candidate names, to make sure we weren't going to pick something a company had copyrighted or whatever, and Urban Dictionary jumped to the rescue. I wasn't going to share these, but these Urban Dictionary definitions actually describe our project fairly well. The first one is what everybody thinks my work is about — we do HPC, we know how to do HPC, why are you doing this to us? You're that glurm guy. Glurm's even more unfortunate: this was my reaction to writing the plugin many, many times, and I work for a national lab, so I had to redact something. Humoricide.

What are some features of the plugin? Well, the plugin has a detailed configuration. It only relies on Keystone and Glance right now, but it doesn't make assumptions that you've got a monolithic OpenStack infrastructure. You may have a Keystone server over here and your Glance server over there, or a different Glance server for this cluster than for that cluster. So we let you separate those out by IPs and port numbers. Maybe your network doesn't allow you to run a service like this on a privileged port, say. So we don't assume the defaults, but we do fall back to the defaults if you don't configure things. We have a username and password for the proxy user, which also lets us use role-based access control to fairly finely control what this proxy user is allowed to do. You don't want a malicious user to be able to re-upload an image that they've modified and overwrite the existing one, for instance. So the proxy user is read-only; it can't modify anything; it has no permissions in Glance or Keystone beyond the bare minimum required by the plugin.

Then we have image caching. I wanted to call it caching because it sounds neat — people think caching, you've worked pretty hard. But how hard do you really have to work to leave something there? The caching doesn't care where you put things. You configure the plugin to say: here's where I want you to drop the images, and we drop the images there. It could be on the compute node — we have those six-terabyte drives on Woodchuck, so we create a little temp space and drop your UDSS in there. It could be a RAM disk.
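And here's that promised sketch of the Spank side's shape. The SPANK_PLUGIN macro and the hook signatures are the real Spank API from <slurm/spank.h>, but which hooks our production plugin actually uses, and the two udss_* helpers, are my hypothetical arrangement for illustration.

```c
/* Sketch of the Spank side.  The udss_* helpers are hypothetical
 * stand-ins for the download/verify and cleanup work described above. */
#include <slurm/spank.h>

SPANK_PLUGIN(openstack, 1);

extern int udss_fetch_and_verify(spank_t sp); /* download, md5 check, chown/chmod */
extern int udss_cleanup(spank_t sp);          /* delete, or chown back to root */

/* Runs in the remote (slurmstepd) context on each node of the job, before
 * the user's tasks start: revalidate user and image, then stage the image
 * into the configured drop directory (local disk, RAM disk, burst buffer). */
int slurm_spank_user_init(spank_t sp, int ac, char **av)
{
    if (!spank_remote(sp))
        return ESPANK_SUCCESS;          /* nothing to do in local context */
    return udss_fetch_and_verify(sp) ? ESPANK_ERROR : ESPANK_SUCCESS;
}

/* Called as the job tears down: delete the image, or, with caching
 * enabled, chown it back to root and chmod 400 until the next job. */
int slurm_spank_exit(spank_t sp, int ac, char **av)
{
    if (!spank_remote(sp))
        return ESPANK_SUCCESS;
    return udss_cleanup(sp) ? ESPANK_ERROR : ESPANK_SUCCESS;
}
```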
If you wanted to run this on an existing cluster, there's enough RAM that you could carve off a few gigabytes for that RAM disk and limit the size of image you run. We don't care; the plugin is agnostic to that. It could also be a burst buffer. There are burst-buffer APIs, and we have a little bit of scripting in testing to actually drop the image into a burst buffer — so it's not on the node, but it's a lot nearer to the node than our traditional scratch.

So what does the workflow look like for these plugins? Job submit: the plugin goes and looks at the batch script and asks, did the user even request an image? No? Well, we're done — this is easy. In Slurm, when you have a job-submit plugin, every job gets routed through your plugin, so it's important to have a quick, fast shortcut: let's get out of here, we don't need to do anything. But if a user does request something, the plugin goes off and gets the image data out of Glance. Does the image even exist? In testing, it's easy to make a typo. We don't want a user to submit their job and only realize there's a typo in the image name after the job's been queued for two weeks and it just fails. Not very friendly to the user — we're already mean enough. So if the image doesn't exist, we're done: we return an error to the user and quit. If the image does exist, we grab the user data out of Keystone, and then we try to decide whether the user is actually allowed to use this resource they've just requested. If they're not, we're done again; they're not going to get their job queued. If they are, success — the job queues.

But we've glossed over a lot of detail here. "Is the user allowed the image" — what does that really mean? You can configure whether you want user checks to be enabled at all. If you don't care what's in your Glance — if any user can have anything — just let them run, and we won't even check the user. If it's a public image, we just give them the image: whoever uploaded it says it's public, I don't care who you are, you get the image. If the image isn't public, we have to work a little harder. Well, if the user doesn't exist, no. Is the user in the project that owns the image? Yes? We're done. No? Now we've got to go off and get the image members — the terminology Glance uses for the other projects an image is shared with. Are there members? No? Then you're done. If there are, we just iterate through until we either find the user in a project the image is shared with, or we don't — and then we exit, either with success or failure.

The Spank plugin workflow looks almost identical to the job-submit plugin, because 90% of what it does is the same. We just go back through our same checks: is the image still there? Can the user still have it? The only difference is we don't just return; we have to actually do something now that we've said everything's okay. We have to retrieve the image. And we have caching, so the first thing we do is look: is the image already on the system? No? Then we download the image out of Glance, and then we check the image checksums. If you look at Glance's metadata for an image, it has an MD5 sum — just for sanity, and for obvious reasons. So we take the checksum of what we downloaded and the checksum out of Glance. Do they match? No? We download it again and check again. Still no match? Then we're done.
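That verify step is small enough to sketch. Assuming OpenSSL's venerable MD5_* API (link with -lcrypto), and hedging that this is illustrative rather than the plugin's real code:

```c
/* Stream the downloaded image through MD5 and compare against the hex
 * digest from the image's Glance metadata.  Illustrative only; the
 * caller retries the download once on mismatch, then gives up. */
#include <openssl/md5.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Returns 0 on match, 1 on mismatch, -1 on I/O error. */
int udss_md5_check(const char *path, const char *glance_hex)
{
    unsigned char buf[1 << 16], digest[MD5_DIGEST_LENGTH];
    char hex[2 * MD5_DIGEST_LENGTH + 1];
    MD5_CTX ctx;
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (!fp)
        return -1;
    MD5_Init(&ctx);
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        MD5_Update(&ctx, buf, n);
    MD5_Final(digest, &ctx);
    fclose(fp);

    /* Hex-encode and compare case-insensitively with Glance's digest. */
    for (int i = 0; i < MD5_DIGEST_LENGTH; i++)
        sprintf(hex + 2 * i, "%02x", digest[i]);
    return strcasecmp(hex, glance_hex) ? 1 : 0;
}
```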
At that point we're not going to play this game anymore; we return an error. Hopefully an admin notices, or a user complains, because you've got a problem somewhere — you're downloading corrupt data. If the image is already on the system, or the image has been downloaded and the checksums check out, we chown the image to the user and chmod it to 400 so that only they have access to it, and we're done. The job runs.

And we have caching again, so what's the workflow on cleanup? The Spank exit hook gets called when the user's job exits. We check whether caching is enabled. If it's not, we delete the image; we're done. If it is, we chown the image back to root and chmod it back to 400. The user owned the image — he could have made the permissions whatever he wanted — so we make sure no one else can get at the image until another job comes along that's allowed to, and we're done.

So I want to try a demo. I've practiced this talk a half dozen times and the demo's always worked — I promise it has. So we'll try this real quick. It's not too much of an eye chart, is it? It's kind of readable; I can jack the font up if you need to see it. So we have an OpenStack setup — these are virtual machines running on my laptop. We have an OpenStack master. We have an HPC master running slurmctld; we don't run a separate front end, because I don't have enough RAM, so the user logs into the master to submit jobs. And here we have a compute node — this is just compute1.

I'll make the assumption that, since you came to an HPC-track talk, you know what a batch job looks like. Here we have a simple batch job that does nothing more than write some stuff to standard out, and that's it. So we'll run this. Our job-submit plugin is enabled. The job submits; we look — there's the output, because it was a fast job — and voilà: the date and time is roughly correct, and it's the right host, so we actually ran on a compute node. Just laying a little groundwork.

So, on my OpenStack master — I should have scripted this instead of typing — image list. We have three types of images: a public image, a private image, and a shared image. The word wrap's not nice, but okay. So let's ask for the public image. I can do a show — or do you trust me? It really is public; I'm not just faking this. We'll look and see visibility is public. We'll also see up there that there's a checksum at the top — a hex MD5, starts EE, ends C6. I don't know about the rest of you, but I don't always read and match every single character. It begins and ends the same, right? Must be the same image.

So we've got a demo job here that's going to request this public image. What does a user have to do at submit time to actually ask for one of these things? We're using the generic resource (GRES) interface in Slurm to say: I want a UDSS. That keyword, UDSS, is configurable in the config file — the GRES name is udss — and, if you know Slurm, it also has to match the GRES configuration in slurm.conf and the node's gres.conf. So you can call it whatever you want; we call it UDSS. And here we have caching disabled and user checking enabled, and we're going to drop everything in /tmp/udss. So let's put a watch on /tmp/udss. Empty, nothing there. So let's run this batch job. Oh — this is a live demo, and this is a bug. I'm using libcurl, and for some reason, after a certain amount of time, libcurl can't talk to OpenStack.
I think the connection is just too slow to come up, and the curl times out and we move on. So we restarted Slurm. Let's run the job. There we go — we have the image. The watch kicked off right when the image was downloaded. It's owned by root at first, but there you'll see it chowned to the user. This job isn't going to do anything but check that the image is there and calculate an MD5 sum. If caching were enabled, it would go back to root ownership — oh, caching was disabled. Sorry. It's gone. slurm-7.out: there's your checksum, and it more or less matches — well, it does match. So the file was there when the job executed, when the thing kicked off.

Real quickly, let's enable caching just to watch that functionality. The Spank plugin is actually re-instantiated every time a job executes, so you can make a config change on the fly and it'll reread the config file. So we'll run that same job again, put my watch back, and we'll see the demo image — there, that's doing something. We also have a private request; once again, it's just the image name with "-private" on it. There we go. Caching's enabled, so the image still lives on the compute node, owned by root; the next user on the machine can't read it. So we'll run another demo job real quick, asking for the private image as a user who isn't allowed it — and it tells you right away, at submit time: you can't have that image, go away.

How are we doing on time? We have a shared job. Nope — you're not in any of the projects that are allowed to see this image. So let's look at the shared image real quick — the member list for the image. Trust me, the ID there is actually the image ID for the shared image. It has a member ID, which is a project. The Glance member stuff in Liberty — the openstack command didn't handle it; you couldn't do anything. Now, with the openstack command instead of the glance command — and I like the openstack command better; they got rid of the UUIDs all over the place, so you actually see human-readable names for stuff — they've started to put the member functionality in. You can do a member add and a member remove, but you can't do a show, not that I know of. If anybody can correct me on that, I'd love to hear it.

So let's add the user to the demo project, which is the same ID as the member on the shared image. And let's run it again. Now the user is allowed to have it, so we'll see the shared image show up. Voilà.

It's a live demo, but my brain's shorted — there's one more thing I wanted to show you. Oh: let's simulate latency here. We're going to pause my one compute node to simulate something bad happening, and we'll have the user just execute the public image. So the job submitted; it's there. We do an squeue; our job is pending — it wrapped ugly, but PD. So let's go up here and disable the user. The demo user has been disabled. We resume our node... nothing. The user is obviously not going to get a message at this point, but they have an output: there was no image made available for them. Their job finished, but the md5sum found nothing, the ls found nothing. The Spank plugin rechecked everything and didn't let the user have access to the image. This needs a little work — the user would like to know. Right now we tell users: do a check, and if your image isn't there, ask us why. We'd rather have an automated way for Spank to let the user know.

So if we go back to the presentation... that's interesting. Anybody know how to fix that? I was worried about this. Aha — there's Colette. Future work. So right now, this was about a two-week project.
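Before going further down the future-work list, one piece of plumbing underneath all of that demo is worth showing: every Glance call the proxy makes starts with a Keystone token. Here's a hedged libcurl sketch of the v3 password authentication the proxy user does today — the endpoint, user, and project names are placeholders, and this isn't the plugin's actual code. Keystone hands the token back in the X-Subject-Token response header. This is also exactly the spot where I'd like to swap in X.509, as you'll hear in a moment.

```c
/* keystone_token.c -- hedged sketch: obtain a Keystone v3 token with
 * libcurl.  The real plugin reads endpoint and credentials from its
 * config file.  Build: cc keystone_token.c -lcurl */
#include <curl/curl.h>
#include <stdio.h>
#include <string.h>

/* Keystone returns the token in the X-Subject-Token response header. */
static size_t grab_token(char *buf, size_t sz, size_t n, void *out)
{
    char line[512] = "";
    size_t len = sz * n < sizeof line - 1 ? sz * n : sizeof line - 1;
    memcpy(line, buf, len);                 /* header data isn't NUL-terminated */
    sscanf(line, "X-Subject-Token: %255s", (char *)out);
    return sz * n;
}

int main(void)
{
    /* v3 password auth, scoped to a project; names are placeholders. */
    static const char body[] =
        "{\"auth\":{\"identity\":{\"methods\":[\"password\"],"
        "\"password\":{\"user\":{\"name\":\"udss-proxy\","
        "\"domain\":{\"id\":\"default\"},\"password\":\"CHANGEME\"}}},"
        "\"scope\":{\"project\":{\"name\":\"demo\","
        "\"domain\":{\"id\":\"default\"}}}}}";
    char token[256] = "";
    CURL *c = curl_easy_init();
    struct curl_slist *hdrs =
        curl_slist_append(NULL, "Content-Type: application/json");

    curl_easy_setopt(c, CURLOPT_URL,
                     "http://keystone.example:5000/v3/auth/tokens");
    curl_easy_setopt(c, CURLOPT_HTTPHEADER, hdrs);
    curl_easy_setopt(c, CURLOPT_POSTFIELDS, body);
    curl_easy_setopt(c, CURLOPT_HEADERFUNCTION, grab_token);
    curl_easy_setopt(c, CURLOPT_HEADERDATA, token);

    if (curl_easy_perform(c) == CURLE_OK && token[0])
        printf("X-Auth-Token: %s\n", token); /* passed on Glance requests */

    curl_slist_free_all(hdrs);
    curl_easy_cleanup(c);
    return 0;
}
```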
We're using it on a system that's not heavily utilized yet; we don't have this available to a lot of users. So: I'm not thread-safe in my job-submit plugin. If you know anything about the Slurm model, they warn you right up front — you've got to thread everything. If your plugin is going to take more than three or four hundred milliseconds, you'd better spawn a thread to do the work, because Slurm is going to move on; it's just going to fail you. So it assumes a multi-threaded environment, and right now my job-submit plugin is not multi-threaded. I don't have that problem on the Spank side; Spank just runs inline, it's not a multi-threaded bit of execution.

I need a more robust caching model. Right now, if you tell me to save images, I'll just keep putting images there until I fill the disk up and something crashes — or an image only downloads partially, or something bad happens. So we need to come up with more configurable limits on our caching. Whether it's age-based, or we're filling a disk and we've got a size limit on our cache pool, so we do a FIFO or something like it and free up space until we have room to drop another image — that hasn't been implemented yet. It's very simple right now.

All the jobs I executed were sbatch jobs. salloc, for an interactive allocation, is a strange beast: when a user types salloc and asks for some resources, he's left on the node where he typed it. All it does is set some environment variables and give you an allocation on some compute nodes, and the user is left to type env, grep out his node list, and then interactively log in and start using it. We don't support this model yet. When this happens, Spank doesn't behave the way I would like it to behave, so I've had trouble getting the image to actually download in an salloc context.

And we're also starting to look at — you've heard at other talks that people are talking about job workflows. We're not just talking about isolated jobs anymore, and we're not just talking about parametric studies; we're talking about complex pipelines in HPC now, where maybe a guy wants to run a simulation out of one container, but he also wants to run some analysis on that simulation, because he's going to parametrically adjust his next job submission. So we want to support multiple images in one allocation. Then we have to come up with a way to report this back to the user's environment somehow, so that the user's script can intelligently make use of the allocation — the nodes, the images, where they've landed in the cluster.

And there's one thing that's been driving me nuts, and I'm hoping someone solves it. I know that Keystone supports X.509 certificates for authentication; I haven't gotten it to work yet. I don't want to store a password for the proxy user on my system. I'd much rather protect a certificate somewhere. I can revoke a certificate if I think a user's gotten hold of it; I don't have to run around and change passwords all over the place. I really want to reimplement the authentication side of the plugins' proxy service to use certificates instead.

And a few acknowledgments. Like I said, Reid Priedhorsky is the Charliecloud author. He's a great guy, he's incredibly patient, and I waste a lot of his time. Steve Senator is our resource-management expert. If you ever get in a room with Steve and you come up with an interesting problem, especially around batch management and job control, you're done for the day.
You and Steve will be sitting down with coffee, and that'll be it — that'll be hours. And you don't hear this enough at a conference like this, but the OpenStack and Slurm developers are wonderful. If you've ever written anything to talk to, say, the Wikipedia API — you're done, your life is over. The OpenStack developer documentation is fantastic. They give you curl examples of how to build your JSON structures to send something off to the service and get data back. And likewise, the Slurm developers have done a wonderful job documenting things. You know the attitude — it's open source, right? You've got the code, why should I document? The Slurm guys documented enough to get you started, and their code is actually very well documented. So you start digging into their helper functions: since they're so heavily threaded, they've written wrappers around all kinds of high-latency functions to just handle it for you. You don't have to get down into the dirt too much developing a Slurm plugin.

So, any questions? Would you mind using the microphone, please? Thank you.

"Do you have this plugin available on GitHub?" Not yet. I want to do some clean-up work first — the threadedness will be an immediate problem until I fix that. Once I have it fixed, and once I get through the laboratory's tech-transfer and licensing folks, I hope to submit it upstream to SchedMD and the Slurm developers. Thank you.

Yeah, this was written on the current stable version of Slurm, so 15.08.8 — oh wait, maybe 15.08.9 — but I don't see why it wouldn't work on an older version too.

So the question is about image versioning in the environment. I'm not doing anything there. I let the users control the image entirely, so they can add a version number to a name. I haven't integrated anything beyond what you've seen — there are attributes, a sort of metadata, on images in Glance, but I haven't gotten there yet. This is largely a prototype for the workflow that we want. It's functional, so we can put users on it. We've spent too much time building toys and watching users play with them for a week, then walk away, throw them out of the pram, and pitch a fit. That's not really fair to my users — they don't pitch fits. But we've done enough to get it to the prototype stage. We'll put friendly users on it and see what they think. They may come back and say: you know, you've done it again. You've created this thing, and you've given me flexible stacks, but I really want an OpenStack cloud — I know you're trying to bolt on bits of OpenStack to improve the flexibility of an HPC cluster; stop it. We'll find out. Thank you.