Welcome to another edition of RCE. I'm your host, Brock Palen. I have with me my co-host, Jeff Squyres from Cisco and the Open MPI project. Hey, good afternoon, Brock. How's it going? Hey, Jeff, having a good time. We have with us today two guys from the Slurm project, which I'm not exactly sure if it's a resource manager or a scheduler or both, but we're going to find out. We have Moe Jette. Hi. You want a little bit of background here. So I started at Livermore in 1980 doing operating systems development on the Cray-1 computer with CTSS. That monster had eight megabytes of RAM and an 80 megahertz clock. After working on that for about five years, I worked in mass storage for a few years and then went back to operating systems work and scheduling, including gang scheduling, some work on Globus, distributed computing, and I've been working on Slurm since 2001. OK, cool. Thanks. And we have with us your co-worker, Danny Auble. That's right, yeah. I've been working here at the lab since '01. At first I started working with networking stuff, and around the '04 or '05 time range I moved into the Slurm project, primarily for the Blue Gene port. And I haven't let go of him since. Yeah, that's what we've been doing. Well, don't turn this into a performance review or something like that right away. So you guys are both at Livermore right now? Yeah. OK, so rolling into this, what exactly does Slurm stand for? Slurm started out as, and still stands for, Simple Linux Utility for Resource Management. But as most people know, it's not just for Linux anymore. It also works on AIX and other flavors of Unix. And it has since gone from simple to the point where we think about replacing that with scalable. But that's really what it is: a resource manager that just recently, with the 2.0 release, has been getting some scheduling logic. OK, so up till 2.0, it was just a resource manager? It had a really minor backfill or FIFO scheduler.
But for the majority of the people out there, they usually ran it with some sort of scheduler on top of it like Moab or Maui or some homegrown scheduler. OK, and how did Slurm get started? What's a little bit of the background and history of this? Well, it started back in 2001 when we were getting away from proprietary operating systems and software into the world of Linux. And we looked at where there were shortcomings in the Linux environment, what needed some work. And we found two areas that we thought were important, to address scalability issues mostly. One was a global file system, and we got involved in the Lustre effort for that. And the other was in resource management, and from that we started the Slurm effort with Linux NetworX. OK, so let me ask you this, taking this in a historical context here. Why did the world need another resource manager at that time? Why did you do it? Did you do it mostly just for the sake of having an open source one? Was it to get away from proprietary? Or were there specific features that were not addressed by the schedulers at the time? I'm sorry, the resource managers at the time? And how did that come about? What was the decision-making process to embark on this project? Well, there were several issues. One issue was having something open source. We were using Quadrics RMS at the time as a resource manager on most of our systems, and LoadLeveler on some of the IBM systems, both of which are proprietary. The other issue was scalability. We were looking to scale not to tens or hundreds of nodes; those of you who know Mark Seager can hear him saying, that's not nearly enough. We need to scale to tens of thousands of nodes, which meant basically rethinking everything in terms of resource management. A lot of the experts at the time felt that you couldn't go above hundreds of nodes and thousands of processors, and here was Mark talking about something a couple of orders of magnitude larger than that.
And of course, that's the way things have gone, and the way they're going right now anyway. So who is Mark Seager? Mark is deputy director of Livermore Computing, and he's been involved in procuring our latest and greatest machines all along the line, the ASCI series and Linux machines at very large scale. The other driving factor was we wanted something that would be highly portable, not just for Quadrics or a Federation switch on the IBMs, but for any sort of switch, any sort of architecture. And in order to do that, we felt it was important to structure the code very carefully and make extensive use of plugins, which we do. So if you want to use a new network or a new topology or whatever, it's largely a matter of developing a new plugin for Slurm and leaving the kernel alone, which has proven to be a very, very flexible design. I'm a big fan of plugins myself, since Open MPI is all about plugins too. Can you tell us a little bit about what kinds of plugins you have? I mean, how did you separate out the functionality into different types of plugins, and what can you do with plugins in Slurm? Well, a lot of plugins came about by, I guess, necessity. So for instance, almost everything that's required for a Blue Gene system is inside of a plugin, and that's called the select plugin. What it does is select the correct nodes, and so there's, like, a select linear for regular systems. And just recently in 2.0, there's been a topology plugin added that will handle things like three-dimensional topology, or other types of topology that come down the road. And that wasn't needed until just now, when we're dealing with something like a Cray constellation system or something like that. Sun Constellation. Oh, yeah, sorry, Sun, Sun Constellation. But it's also usable on a similar architecture like a Cray. Well, you mean Oracle Constellation, right? Sorry, I couldn't resist, yeah. Yeah, Jeff, you're hilarious. Yeah.
But yeah, so there are plugins for things like which type of database you're going to use, so there's a MySQL plugin or a Postgres plugin. And all these things are called from the main code in different parts of the code. But if there wasn't a need for them, then they just wouldn't exist until there was a need. And the way Slurm is made, you can create a new type of plugin and just start calling it. Like, at first the topology handling was just part of the Slurm kernel, and then all the parts that were just the linear topology, the basic one, were moved into a basic plugin. And then, just by calling the plugin hooks, you're able to branch out and do different things based on your system. And one of the recent plugins that Danny developed does scheduling, prioritizing the workload. We'll get into that a little bit later if you'd like, but basically we've been taking the kernel and pushing as much as we can out to the plugins, because that seems like the most flexible design. And when somebody's building and installing Slurm on their system, it allows them to install a very simple environment if they want to in just a couple, three minutes. In fact, there was an article in Linux Magazine recently about building and installing a cluster in, I don't know, 23 minutes or something like that, and Slurm took two or three minutes of that. And if you want something that's a very sophisticated scheduler, you can just pick and choose the right building blocks and make a fairly sophisticated scheduler using Slurm. Okay, now are you guys looking at doing plugins mostly in-house, or are you trying to encourage an open-source community around creating plugins, or what are your plans there? Well, we obviously are doing coding in-house right now, and we don't ever turn down people wanting to help. Of course.
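For a flavor of how that picking and choosing looks in practice, a few plugin-selection lines in slurm.conf might read like this. The parameter names come from the Slurm documentation; the particular combination of values is just illustrative, not a recommended setup:

```conf
# slurm.conf excerpts: each subsystem is chosen by naming a plugin.
SelectType=select/linear                          # whole-node allocation; select/bluegene on Blue Gene
TopologyType=topology/3d_torus                    # the topology plugin discussed above
AccountingStorageType=accounting_storage/mysql    # or accounting_storage/pgsql
PriorityType=priority/multifactor                 # the new 2.0 priority plugin
```

Swapping one of these lines is how a site changes behavior without touching the Slurm kernel.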
Because right now it's primarily Moe and I that do the coding, and there are obviously other people out there that help, all over the world actually. But we're the primary gatekeepers, I guess you would say, of the Slurm code, and we definitely are developing things that suit our needs, but that doesn't mean we don't look at other people's needs out there. Oh yeah, we get a lot of help from computer centers and universities all over, and in fact, the world of open source software makes for some pretty strange bedfellows. We've got among our major contributors the National University of Defense Technology in China. We've got a nuclear weapons lab in Russia. We've got CEA in France, their atomic energy agency. I would imagine that makes for some pretty interesting paperwork for you to fill out, working there in a United States DOE lab. Yeah, it does make for some interesting paperwork, and I had a polygraph exam a few months ago that was especially interesting. Well, who knew, kids: when you sign up to be a computer scientist, these kinds of things might happen to you as well. All right, so let me ask a derived question here. What's your software license for Slurm, and how does that affect both internal and external software development, say, for example, in the plug-in realm? The license that we use is the GPL version 2 license. So anything that you contribute will probably need to be compliant with that. Most people don't have problems with that. Linking to our code would also mean that you have to comply with that license, which is a problem for commercial schedulers like Moab, so we have a plug-in just to handle the communication with schedulers like that, the wiki plug-in. But in terms of other development, HP and Bull, to name two companies, have been major Slurm developers. They've wanted Slurm capabilities expanded in certain ways for their customer base, and their needs are definitely different from ours.
And they've got more manpower they can potentially apply. So because of the GPL license, when they develop a new Slurm plug-in or other Slurm enhancements, they have to feed it back to us. And in many instances, the work that they've done has eventually helped Livermore and other people. So when people develop software and contribute it to Slurm, almost always it ends up in the main code base. So stepping back a little bit, you mentioned how you can't link against it, but you can use something called the wiki plug-in. That lets you control Slurm over, like, a socket or something really simple, right? You don't have to actually use your library. So that's how Moab controls it. This wiki interface actually lets, say, someone doing research on scheduling algorithms very easily write something that can control Slurm, right? Right, in the case of Moab and Maui, they use what Danny mentioned, a wiki interface that's basically just some XML going over a socket. And for people who are developing their own scheduling software, their best bet is probably to develop a Slurm plug-in. And because of the terms of the GPL license, if it's just something they want to use in-house, they can do that. But a lot of people who develop stuff in-house or for research projects will feed it back to us. And if it's something that we feel is of general use, we'll pick that up and incorporate it into the main code base. So, you've mentioned Bull and stuff from a lot of schools; what fraction of the existing code would you say actually comes from sources outside Lawrence Livermore? I'm gonna guess somewhere around 30% comes from outside and 70% inside. It's actually quite a lot. Yeah, it's actually quite a lot. So that's pretty impressive. So you've got quite a few people who are actually getting their hands dirty. And is it mostly all in the plug-ins? Where are you seeing most of the external fingerprints? Most of it's in plug-ins.
And because of how that's structured, it lets people do development work without having to learn the kernel of the Slurm code, and all the plug-ins are well documented, and basically all somebody needs to do is develop a library with a handful of functions. What's the weirdest plug-in, like the most unexpected plug-in you've ever seen somebody come up with? Well, actually one of the most surprising ones, and I won't say it's weird: originally Slurm, again going with the S there, the simple aspect of it, just allocated whole nodes to jobs. And HP, for their own customers, needed to be able to allocate individual cores or sockets on clusters. So they did all that work within a plug-in and made very, very few changes to the kernel of the code base. And that was 4,500 lines of code that you'd think needed to be in the kernel, but they realized all that in a plug-in. From other people, there's been work in scheduling plug-ins and accounting, record-keeping-type plug-ins. Okay, so again, another derivative question. I keep doing this. Dig down a little deeper here. So you have all these contributors. Do you run this as an open-source project, or do people mainly just submit stuff to Moe and say, here you go, we'd love to see you put this in? Or do they see themselves as full peers, and you guys make decisions together and release decisions together and testing together and things like that? I mean, how do you run this as a project? I'd love to hear about how other projects run themselves, since I'm involved in an open-source project, but everybody's got a different workflow and a different way of doing things. So I'm just curious to hear what yours is. Well, I'd be interested in hearing what yours is for Open MPI sometime, but basically, most of our contributions come from the corporate world, HP and Bull, mostly.
So when we're making plans for a new major release, we'll contact them, find out what their requirements are in terms of timing, in terms of capabilities they're interested in adding, and try to work out something with them in terms of how it happens. Sometimes somebody will say, well, I want to work on this, and we're able to hook them up with somebody else who's interested in working in the same area. And the more people that are involved, typically the better the end product is. And then people will go off and work on designs, iterate with us on those designs and potentially other people, and then send in a patch. After we get the patch and get it integrated, hopefully we get some testing done as part of that. We've got a pretty extensive automated test suite, so it's really nice to get tests as part of the code patch. Cool, all right, well, I have to put in a plug for an earlier RCE cast. If you want to hear how the Open MPI project runs, listen to RCE number one, I believe; I think we talked about that a bit in there. Little self-serving plug, had to do it. Well, they're all there for download, if anybody wants them, so they're all still there. That's right. Moving on to some of the gear this is actually running on: what are some of the larger systems, or the largest public systems, I know you guys were mentioning a lot of bomb labs there, but what are some of the larger systems you actually know Slurm is running on? Well, we know it's running on our big Blue Gene here. And how big is that? According to the Top 500 there, it's got around 212,000 cores. We have a few Blue Gene systems here; we have a couple of P systems, well, we have just one P system, I suppose, and we have a couple of Ls that are quite large, I guess, by most people's standards.
And I think that it's hard to guesstimate, but we figure that about 40% of the top 500 computers in the world, that are on the list anyway, are running Slurm. In terms of a conventional Linux-type cluster, I think the biggest system it's running on is Thunderbird at Sandia, and that's, what, about 4,400 nodes? Yeah, about 4,500 actually. Okay, and do you guys do anything unique in Slurm about the way you actually start up a job? We had the Torque guys on, and they talked about this join model, where all the PBS moms came together. I was thinking you've gotta do something a little bit different than that to try to get to these larger scales. Yeah, so Slurm works off of a tree fan-out type thing, and the tree is built dynamically; there are no static links. So, to give you a little hierarchical view of what Slurm is: we have a controller that's on one node, and then each node has what's called a slurmd, a Slurm daemon, that sits there and looks for information from either the controller or from a job launch. And so when the controller starts up a job or wants to communicate with anybody, it fans out a tree based on a configurable width, based on how big your system is, and so you can dial that to be whatever you want to get the fastest communication. And then they send their answers back up through the tree, and then the tree gets torn down behind itself. And the same thing happens with job launches: whenever a job launch comes through, it fans out, and so that reduces quite a bit of traffic on your network, and it speeds things up a heck of a lot. It's also fault-tolerant, so if a node drops out, it'll work around it, it'll self-heal, so that definitely is a big boost to scalability for Slurm. Okay, and then actually, maybe Jeff would be a better one to answer this, but do you guys know, does Slurm have an interface for helping MPI start up? Like, similar to the TM interface in PBS?
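To get a feel for why that fan-out matters at scale, here's a back-of-the-envelope sketch. This is not Slurm code, just the scaling argument; the fan-out width corresponds to Slurm's configurable tree-width setting:

```python
# Sketch of the tree fan-out idea: with N nodes and a configurable fan-out
# width, the number of sequential message-passing levels grows like
# log_width(N) instead of N.  Illustration only, not Slurm's implementation.
import math

def tree_depth(num_nodes, width):
    """Levels needed to reach num_nodes daemons, contacting `width` at a time."""
    if num_nodes <= 1:
        return 0
    return math.ceil(math.log(num_nodes, width))

# A 4,500-node cluster (roughly Thunderbird's size) with a fan-out of 50
# needs only 3 levels of messages; a flat launch would mean 4,500 of them.
print(tree_depth(4500, 50))   # 3
print(tree_depth(4500, 2))    # 13 with a binary tree
```

Dialing the width up trades fewer levels against more simultaneous connections per daemon, which is why Slurm leaves it configurable per system.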
Slurm supports quite a few different flavors of MPI, and how those MPIs start really depends on what flavor of MPI you've got. We support PMI for MPICH2 and MVAPICH2. For Open MPI, something I've just recently added is some support for port reservations, allocating ports on nodes so that we'll be able to start up all those tasks immediately, and they'll have port and host information right out of the chute and be able to open up communications to each other immediately. So, I mean, we're starting to talk about some of the specific features of Slurm here. What makes Slurm great? I'm a little biased, because I run Slurm on my Cisco development cluster for all my MPI testing, so I know the things that I like about Slurm and why I chose Slurm. But for someone who's never used it and may be looking at getting a cluster of their own, whether it's small or mid-sized or large or whatever, why would they choose Slurm over something else? What is cool about Slurm? What are the great features that are genuinely useful to people? Well, both to people and to administrators. Well, I would say, referencing that article that Moe was talking about just a few minutes ago: it's easy, it's not the hardest part of setting your cluster up. It really is pretty simple to set your cluster up. We've heard it from multiple people. Sometimes we don't even realize that sites are using Slurm until months after they've already set it up and standardized on it across their whole cluster set. And then they're like, oh, by the way, we have this question about this random spot in Slurm. We're like, oh, when did you guys start using that, type of thing? And so that in and of itself, the whole simple aspect of it, I mean, it can easily get complex if you want it to. But if you're just at home and you had, like, three or four boxes you wanted to throw together, it's really simple to set the whole thing up. That's just one aspect. Then there's the whole scalability issue that we just talked about before.
I mean, I don't foresee a system in the near future, or close to the near future, that Slurm won't scale to. And given my experience with computers, the sort of state-of-the-art machines we deal with here at the labs are the type of machines that people will see in their homes in a couple of decades. So when your grandkid asks for the PlayStation 18 or whatever that has 100,000 cores, it might very well be running Slurm. That would be pretty funny. Sony Slurm, you know. Well, you've gotta manage those cores somehow, you know. That's true. They don't run themselves. So still, could you break it down a little bit better? I mean, what's unique about Slurm? What are some of the nice features that people like to use? I mean, why do people choose Slurm other than just the simplicity? Simplicity, scalability, the plugin model. In the event that you want to do development on your own or tweak things, it's very easy to do that. The commands are very flexible too. For example, if you want to get information about jobs or the nodes in the system, you can format the output any way you want, report any fields that you're interested in, sort any way you want to. We've got Perl APIs available, so if you want to roll your own tools, it's easy to do that. There's actually a contribution directory that we have that contains such things as Torque wrappers. So you can, in theory, switch your system from Torque to Slurm, put in these Perl wrappers that use the Slurm APIs, and your users don't even have to know that you've switched. Cool. Could you do that with other schedulers, other resource managers, do you think? Yeah, true. Slowly switching the world one cluster at a time.
So from an admin's point of view, if you did want to go to Slurm and you're like, oh, I don't want to have any heartburn with my users, you could easily write these things so that they act like the ones that already exist, and your users might not even need to know that you switched, and there you go. Huh, yeah. So the plugins, what are those written in? You mentioned Perl in there. Can I actually write, like, if I just need to do a little thing and I'm more versed in Perl than I am in C or something — what are the plugins normally written in, and what can they be written in? Well, the code base is ANSI C, and so that's what almost all of the Slurm proper code is. Now, we talked about the Perl APIs; those were actually a gift from our friends in China. They interface directly with Slurm through a .pm that is based off of a .xs, which links directly to the .so from Slurm. Look at me and all my dots. Yeah, it's real alphabet soup. So the Perl APIs are extremely handy if you want to do one little thing or another and it's really small; you can interface directly with Slurm, get structures back in the form of Perl, and not have to worry about parsing output from an existing Slurm command. You can just get the raw data yourself. That's really handy, because every now and then I wish I could just do a little tweak. Yeah, so you would be able to do that. Yeah, I can't say I go that far, but on my own cluster I have a couple of aliases that use some of the very flexible commands that you mentioned earlier. I want to see the queue output, but I want to see it in a very specific way, right? Or I want a list ordered on a specific field or something like that, and it's nice. It's very definitely nice to have the power to be able to do that without touching any source code at all. I can just tweak the command line.
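As an illustration of the kind of thing Jeff is describing, a couple of shell-rc aliases built on squeue's output-format option might look like this. The %-field codes come from the squeue man page; the particular format strings here are made up for the example:

```shell
# Illustrative ~/.bashrc entries using squeue's -o/--format option.
alias myq='squeue -u $USER -o "%.8i %.9P %.20j %.2t %.10M"'  # id, partition, name, state, runtime
alias qage='squeue --sort=V -o "%.8i %.20V %.20j"'           # ordered by submission time
# Or set a default format once via squeue's environment variable:
export SQUEUE_FORMAT="%.8i %.9P %.20j %.8u %.2t %.10M %.6D"
```

Everything here lives in user configuration; no Slurm source is touched, which is exactly the point the guests are making.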
I've known about the Perl abilities there for a little while, but I haven't really explored anything with it, because I haven't really needed to, simply because even just the command line itself was flexible enough for what I needed to do, and that was cool. And there are environment variables that can control the output and formatting of most of the commands too. So if you just want to always look at it in a certain way, rather than an alias, you could just set up an environment variable that specifies which fields you want to see and in what order or whatever. Cool. Well, I'm a C coder by definition, which means I change my mind 10 times a day. So I've got 20 different ways that I like to look at the queue, so I've got about that many aliases. Oh, that works fine too. Yeah. So do you guys do any type of Kerberos ticket passing for, say, AFS, which has been a problem of mine for years? Which is as long as I've been doing this. The people at CEA are working on that right now. It's not available to the public, but hopefully it will be soon. And there's... I heard they actually have it working, correct? They have it working currently, and there's an entry on SourceForge for it. I believe it's called AUKS, but I don't know if the code is out there today. And it's gonna take the form of a Slurm plugin. Okay. So we already mentioned MPI. Are there any other tools out there? Like, is there a grid system or, say, one of these cloud services using Slurm? What other types of tools hook directly into Slurm that you've seen out there? Well, LSF and Moab and Maui that we've mentioned, and Danny mentioned the Torque or PBS command wrappers, and as far as I know, that will work for Globus. So if you've got some tool that works with Torque or PBS today, you could put Slurm in place of PBS or Torque, put in the command wrappers, and you should be good to go. Let me dive a little deeper on that.
So I think what Brock was getting at there was: you support a bunch of different job startups for MPI, and various MPIs hook into Slurm in different ways, either natively or they fork/exec srun directly or whatever. Are there any other tools besides schedulers and besides MPI, say, system administrator tools or resource administration tools or something that a human would actually interact with, where under the covers it's just talking to Slurm directly? Well, we've got a number of local tools that interface to it, but not much in terms of generally available things. I'm guessing that's the same everywhere, though. I mean, you talk to other places and they've developed a lot of things that are based off of something they've already used in the past, and they've just ported it to Slurm, so to speak. So it's hard to say one way or the other about commercial software out there that does what you're talking about. I don't know. But you can have monitors that, you know, check for temperatures or voltages or whatever, and they can easily drain nodes if there are problems anticipated. That's one of the areas that we're starting to get into some more: watching for signs of impending failures and being able to drain nodes and take other actions if we think something might be going bad. Oh, so you can actually have Slurm monitor the health of a node? Well, not Slurm directly, but other tools monitoring it, and they can talk to Slurm if they see something abnormal happening. And when a node is, say, drained in Slurm, you can include in that a description of what happened. So you can say node drained at such-and-such a time because I'm seeing the temperature rise or whatever. And Slurm also has a trigger mechanism, so that when that event occurs it can trigger an email to be sent to, say, a system administrator to investigate when he comes in in the morning. Okay, so actually that's a pretty good lead-in to what's new in Slurm 2.0.
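The monitoring setup Moe describes maps onto a couple of configuration hooks. A sketch, with parameter names from the Slurm documentation and the script path invented for the example:

```conf
# slurm.conf excerpts: have each slurmd run a site-supplied health script.
HealthCheckProgram=/usr/sbin/node_health.sh   # hypothetical site script
HealthCheckInterval=300                       # run it every five minutes
```

An external monitor (or the script itself) can then drain a node with a reason attached, along the lines of `scontrol update NodeName=tux042 State=DRAIN Reason="temperature rising"`, and a standing trigger registered with the strigger command can run a notification program when a node-down event fires.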
So you mentioned 2.0. I think as of this recording we're still on a release candidate, but by the time this goes out... well, what's your timeline for actually releasing 2.0, and what's new in it? Well, the timeline is probably going to be about two weeks from right now; that's what we're looking at. That's obviously a shot-in-the-dark guess, but those are our plans right now. So you're aiming towards the end of May is what you're saying? Yeah, let's say May 20th or so. Yeah, probably right before your Memorial Day weekend, so, you know, you can load it up on your... Perfect, release it Friday and then disappear for three days. Yeah, see you later. Yeah. So those are our plans right now. The whole release-candidate thing actually brings up something else: we're changing our release format. In the past, if anyone's kept track of Slurm, you could get quite a bit different Slurm in the micro releases; like, 1.3.1 to 1.3.2 could have quite a bit more functionality in it. We're actually changing that model so that the micro releases will only contain bug fixes. Just some background on that: we were really forced into that model because our environment here was changing so quickly that we needed to make major changes fairly quickly, and our management here wouldn't let us roll out major releases. So basically all the APIs needed to stay the same, although potentially some significant changes were happening under the covers in terms of functionality. Yeah, when I said minor, I actually meant micro. So yeah, sorry about that. I mean micro. But yeah, so starting with 2.0, the only releases in 2.0 will be bug fixes, and feature enhancements will be in the next release, which will be 2.1, which, I think right now, we're targeting for fall, fall of '09. Yeah.
But as for the change between 1.3 and 2.0, the main difference is the scheduling logic which we talked about earlier: the ability to schedule on multiple factors, which is called the multi-factor scheduler. You can do it on things like age, job size, fair share. Most of the guts of it are based off of the database integration that we started in 1.3 with the accounting, and the accounting is done off of a fair-share hierarchy, so you can have multiple levels of depth in the fair share, which is pretty unique and nice to have inside of Slurm. So that's one thing. So just to give a little example: you can have different projects and divide up the resources at each level, so chemistry gets 30% of your resources and physics gets 40% or whatever. And an important factor in that is we have coordinators at each level. We've got individuals that are identified that can actually control resource allocations of the users or groups that are under their control, so you don't have to call up a hotline saying I want to change Joe's allocation from 5% to 10% of our resources; the coordinator can just do it himself. Or if they have a trouble user in their group that just won't listen to them, they can just dial down their fair-share allocation so they just don't get the cycles everyone else in the group does. So that's one of the major reasons that we went from 1.3 to 2.0 as opposed to just 1.4. Let me ask you a clarification question on that. Are you gonna be preserving the simple side as well, for users like me? I mean, I essentially have a single-user cluster. I mean, it's me. I have no need for fair share. A trivial backfill really is all that I need. Is that stuff still gonna be there? For you, it turns out that there's a plugin, because all this was done in a plugin, surprise: a plugin called basic. Oh no, we're gonna name one Jeff, aren't we? Yeah, yeah, just Jeff. Yeah. For everyone else. Make sure it only works about 50% of the time. Yeah. Yeah. Sweet.
But yeah, all of the scheduling logic for this is called inside of a plugin called the priority plugin. And so once again, what's in the basic plugin, which is the basic priority plugin, was once in the Slurm kernel and then moved into this new plugin. And the new plugin that does all this magic about fair share and mucks around with the priority is called the multi-factor plugin. And the default is basic, so you'd have to turn this on. So you, Jeff, running a simple cluster, wouldn't even need to know about this stuff. Yeah, we definitely wanted to keep it simple for somebody who just wants basic job scheduling, resource management in a simple cluster. Yeah. Yeah. So you won't see any difference. Yeah, you can just switch over, and your old slurm.conf file will suit you just fine. Yeah, excellent. So have you guys looked at doing anything like, there's a lot of talk about powering down nodes and VMs and stuff like that. Are you guys looking at doing anything like that? A subject very near and dear to my own heart. Well, you're in luck, because that actually is in this release also. Yeah, we've just added logic to power down nodes, and I know Jeff's been kicking the tires on that and has filed a couple of problem reports for me to look at, but unfortunately, given the size of the code and the number of people involved and other things happening, I haven't gotten around to it quite yet, Jeff, but it's on the list. Yeah.
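The multi-factor idea described above boils down to a weighted sum of normalized factors. Here's a toy sketch; the factor names follow the discussion, but the weights, the normalization, and the numbers are invented for illustration and are not Slurm's internals:

```python
# Toy sketch of multi-factor priority: a weighted sum of per-job factors,
# each normalized to [0.0, 1.0].  Illustration only, not Slurm code.

def multifactor_priority(factors, weights):
    """factors: name -> value in [0.0, 1.0]; weights: name -> relative weight."""
    return sum(weights[name] * value for name, value in factors.items())

weights = {"age": 1000, "fairshare": 10000, "jobsize": 100}

# Job A has waited a long time, but its account is well over its fair share.
job_a = {"age": 0.75, "fairshare": 0.125, "jobsize": 0.5}
# Job B is newer, but comes from an under-served account.
job_b = {"age": 0.25, "fairshare": 0.875, "jobsize": 0.5}

print(multifactor_priority(job_a, weights))  # 2050.0
print(multifactor_priority(job_b, weights))  # 9050.0 -- fair share dominates
```

With a heavy fair-share weight, the under-served account's newer job jumps the queue, which is exactly the behavior the coordinators mentioned above are tuning when they adjust someone's allocation.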
That's okay, because, I mean, we're getting pressure internally here at Cisco, like, oh, we need to be green, turn off resources when you're not using them, and it's also just the right thing to do. Because, like I said, I have a fairly unique single-user cluster, and most of the time it's running stuff, but there are times when I'm just not doing anything. And to be honest, I'm too busy and I'll never remember to turn off the nodes by myself, but if my resource manager can turn them off and then power them back on on demand, boy, that would just be awesome on so many levels. So I think it's fantastic that this capability is even partially there right now, and, well, I'm Mr. Open Source Guy, so I'm very happy to help debug this stuff so that it becomes a useful utility for everybody, because I just love it. We do appreciate your help there. And you talked about a small cluster; here, when we turn on a new cluster it can dim all the lights between here and Hoover Dam, so the greener we can be, powering down nodes on the weekends if they're idle, it could make a really big difference, truly. Yeah, huge, especially if your power bill is like millions of dollars a month or something. Yeah, we actually take a little bit different approach to that: we try to make the scheduling policies so flexible that, you know, basically everything's in use always instead of just sitting there idling. We've kind of gone the other direction, and we just try to minimize the amount of hardware we have. If you've got enough work, I mean, it's rare even on the weekends that we go idle, so. Oh, you're so lucky. It depends on your environment. I'd be happy. Yeah, and it can happen for me too. You know, my cluster can go idle for a variety of different reasons.
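The power-down logic discussed above is driven from slurm.conf. A hypothetical fragment, with parameter names taken from the Slurm power-save documentation but with example values and made-up script paths (the suspend/resume scripts themselves are site-supplied):

```
# Power down any node that has been idle this many seconds:
SuspendTime=1800
# Site-supplied scripts that actually power nodes off and back on:
SuspendProgram=/etc/slurm/suspend.sh
ResumeProgram=/etc/slurm/resume.sh
# Throttle how many nodes per minute change power state:
SuspendRate=10
ResumeRate=10
# Nodes that should never be powered down (e.g. login/head nodes):
SuspendExcNodes=tux[0-1]
```

When a job arrives for a powered-down node, slurmctld runs the resume script and holds the job until the node reports back in.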
Like, you know, my cluster is mainly used for regression testing of OpenMPI, but sometimes, you know, somebody will commit a compile error to the tree and it just won't build that night, and therefore I've got 23 and a half hours to do nothing because there's nothing to test. And so, you know, sometimes the work will just dry up. Yeah, there's a variety of reasons why I can go idle. So, you know, I think this will be great functionality, like I said. Yeah, so, once again, we appreciate anyone and everyone testing our code, and Jeff, you've definitely helped us out on it quite a bit. Well, do me a favor, and I know I don't wanna touch on a religious topic here, but please don't go to GPLv3, because that would preclude me from looking at or touching the code at all. We've actually had that discussion, and I don't think that that's the way we're going right now. On a personal note, I think that's great. Everybody's got their own opinion, and I certainly don't wanna step on anybody's opinion, but that is most useful to me. Good, good. We'll definitely write that one down. Well, on something similar to this, in the 2.0 release, we actually can boot different operating systems on the nodes. Say some scientist wants, oh, I want this version of Linux, or I want this kind of OS running on the nodes. We can set it up so that Slurm will down the nodes and then boot that kernel or whatever. It's leveraging a lot of that same logic that we do for powering down nodes. So do you do that just by messing around with the bootloader, or are you actually reloading the machine? A system administrator can define a script for Slurm to run based upon what the user requests, and at this point it's pretty simple: if you wanna limit certain operating systems to certain users, or limit the capabilities of certain people, it's all pushed out to user land at this point.
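As a minimal sketch of what such a site-defined boot hook might look like: Slurm itself only runs whatever script the administrator configures, so the allowed-image list, the image paths, and the function name here are all invented for illustration.

```shell
# Hypothetical site boot hook: validate the OS image a user requested,
# then report the kernel that would be booted. A real deployment would
# invoke the site's own provisioning tool (rewrite the bootloader entry,
# reboot the node, etc.) instead of just echoing a path.
boot_os() {
    allowed="rhel5 sles10 chaos4"   # site policy: images users may request
    req="$1"
    for os in $allowed; do
        if [ "$os" = "$req" ]; then
            echo "/tftpboot/images/$req/vmlinuz"
            return 0
        fi
    done
    # Anything not on the list is rejected, which Slurm sees as a failure.
    echo "error: image '$req' not permitted" >&2
    return 1
}
```

The policy decisions (who may boot what) live entirely in this user-land script, which matches the "just give me a hook" philosophy discussed below.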
It's just a matter of running a script that either boots a new operating system or returns an error, for example. Okay, so basically the site could do it any way they wanted to. Yeah. Flexible. We might add more infrastructure in Slurm down the road, but for right now that seems to satisfy customer requirements, and we'll just see how that goes. Yeah, no, I mean, that's really flexible: whether somebody's using, like, an xCAT diskless system, or, there are many different ways that a cluster can be built that just end up booting Linux in the end. So Slurm can handle that. No one site is pigeonholed into, you have to boot a machine this way. So that's quite nice. That's flexible. Yeah, I have to say that my experience with the administrator type is, they don't wanna be pigeonholed. They just say, give me a script and then I can have that script do whatever I wanna do. So just give me a hook, and with that hook I will move my own mountains, but you don't care or know how I want to move my mountain. Yeah, I am kind of that way. Yeah, I'll admit that I fall into that camp. So, go ahead. Just to ask, what's beyond 2.0? What's coming up in, you know, Slurm 10? Well, I actually wanted to touch on a couple of other things that are in 2.0. Oh, okay, go ahead. Support for checkpoint/restart using the Berkeley Lab Checkpoint/Restart (BLCR) package, and also advanced reservations. So you can say, you know, reserve 16 nodes for user Bob every day at noon or whatever. One cool thing about the advanced reservations is you can set flags for things like maintenance. And so when a node is downed, in the accounting land, one of the requirements from other labs was that you can say this was a planned downtime or this was unplanned downtime.
So in the Slurm accounting, you can say, okay, well, I have a reservation here for maintenance, and then all the downed nodes that happened during that time in that reservation won't be counted against your system utilization as, like, oh, something happened to the system. This was a known quantity that was gonna happen. Right, and these reservations are all tied into the accounting, like Danny said. So if you reserve 16 nodes for user Bob at noon and he's out to lunch and forgets to use it, we still charge him for it, or at least account for it. Yeah, and you can get reports on this through the sreport tool about what the utilization of your reservations was, or your cluster, or whatever. It's good to know that the tax man exists for your cluster as well. That's right, we get our share. Yeah. So getting back to your original question there about what's coming down the pike, I mean, given our limited manpower, at Livermore we're really focused on what our internal needs are, and a couple of those needs are, we've got a procurement going on called Sequoia, it's gonna be a 20-petaflop system, and we're looking to get a Blue Gene/Q for that. So there'll be some fairly significant changes required for that. Hopefully we can put all that in a plugin, which will make it easier. We're doing some work on failure anticipation, as I already mentioned. The other thing is Linux containers, so that we can better constrain jobs to certain resources, especially in terms of memory use. And we're also looking, now that we've gotten into scheduling, to be able to support an enterprise-wide environment. I mean, you can run accounting and see what jobs were run throughout the environment, but right now, using Slurm, you can't submit a job on one machine to run on another machine. So we're going to start looking at being able to execute some of these Slurm commands across the environment on the different clusters and move jobs from cluster to cluster.
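Going back to the advanced reservations described above, they are created through scontrol and reported on with sreport. A sketch, with the reservation names, node list, and times made up for illustration (the DAILY and MAINT flags and the general field names come from the scontrol documentation, but exact syntax may vary by release):

```
# Reserve 16 nodes for user bob every day at noon for an hour:
scontrol create reservation ReservationName=bob_noon users=bob \
    starttime=12:00:00 duration=60 nodecnt=16 flags=DAILY

# A maintenance window; nodes downed inside it are counted as planned
# downtime rather than against system utilization:
scontrol create reservation ReservationName=maint users=root \
    starttime=2009-06-06T08:00:00 duration=480 nodes=ALL flags=MAINT

# Report how the reservations were actually used:
sreport reservation utilization
```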
I don't see us being able to take on full-blown enterprise-wide scheduling, à la LSF, in the immediate future; they've got 500 people and we've got just a few. There's definitely more work happening in terms of scheduling and accounting from outside of Livermore, for people who want to take an open source route in that area. And, you know, Slurm is very widespread. I'm not sure what other large machines are coming down the pike at other centers like Sandia and Los Alamos, but support for hybrid machines, and who knows what down the road. Okay. As a developer, as I mentioned earlier, it's always interesting for me to hear how other projects, open source in particular, run themselves and whatnot, and another question I always have to ask is, what do you guys use for version control? What's the source control tree for the Slurm code base? We use Subversion. Our main repository is actually running 1.4 right now, but I believe that's going to change to 1.5 pretty soon. Okay, any particular reason you guys went with Subversion? We actually started with CVS and switched over to Subversion. I'm not really sure of the reasoning behind it, but I personally like Subversion a lot better than CVS. Yeah, this is a common story we hear quite a bit: people started with CVS. The same is true for OpenMPI, we started with CVS, and once Subversion hit 1.0, we switched and have not looked back, until actually very recently. But yeah, we were thrilled to move from CVS to Subversion. That was really good. So, guys, thanks a lot for taking some time out and speaking with us. This show will be up soon and you can find it off the website, www.rce-cast.com. There's an RSS feed, iTunes, subscribe. Also, before we go, where can we find Slurm, and where can people download it? Slurm is available from SourceForge, and our website is at Livermore, at https://computing.llnl.gov/linux/slurm.
Or just Google Slurm and you shouldn't have any trouble finding it, although it might get a little confused with the Futurama beverage, by the way. If you Google for Slurm and Linux, it'll probably come up. Yeah. If you actually type Slurm into the URL bar in Firefox, it usually brings our site up. That's the case on my system, anyway. Cool. Always good to be the number one Google search result. But if you type it into Google, it's easily in the top there someplace. At least for me. Okay, cool. I mean, do you guys have a BOF every year at SC? Yeah, we had one last year and the year before, and hopefully this coming year we'll be giving a tutorial, but we'll have to see the results of our proposal. Okay. Okay, yeah, because those BOFs are always a good place to kind of pick up and learn some new stuff, so I definitely enjoy those every year. Well, there's so much that's changed in Slurm 2.0 that I think a tutorial should be very useful. Okay, okay, well, cool. Thanks a lot, guys, for taking some time out to speak with us. Thanks a lot for taking some time. Appreciate your time today, thanks, guys. Thank you. Thank you. Thank you. Bye.