Welcome to another edition of RCE. I am your host, Brock Palen, and I have with me my co-host, Jeff Squyres from Cisco.

Greetings. How's it going, Brock?

All right. Today we are speaking about the Torque resource manager. We have with us Josh Butikofer from Cluster Resources — Josh?

Hi.

And we have Åke Sandgren from HPC2N.

Hi.

So I have to ask before we even start here — this is kind of a traditional thing on the podcast — I'm absolutely positive that we're pronouncing your name wrong. Could you tell us how to pronounce your name properly?

"Åke Sandgren" — that is the correct one.

Yeah? All right. Well, we'll try, but we are ugly Americans, so we'll probably get it wrong, and you'll have to forgive us for that.

Oh, yeah.

Okay, guys. So, first off, could one of you give us a quick rundown of exactly what Torque is, for people who have maybe heard of it but aren't sure exactly what it does?

I guess I can talk about that. Torque, as you said, is a resource manager. In the HPC industry, resource managers are middleware: they sit above the operating system. They allow jobs to be submitted, usually into some sort of queues or a batch system, and those jobs are then migrated out to compute hosts, or compute nodes, which are usually also running a daemon that's part of the resource manager. The jobs are then executed; the daemon monitors the success of the job, or its progress, and when a job completes or fails it reports back, usually to a head node. Then users, commands, or the scheduler can interface with the resource manager to find out whether the job succeeded. The resource manager is also, as the name implies, in charge of the resources: it monitors the nodes — their health, their status, resource usage, things like that.
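The lifecycle Josh describes — jobs queued on a head node, dispatched to per-node daemons, results reported back — can be sketched as a toy loop. All class and function names here are invented for illustration; none of this is Torque's actual code.

```python
# Toy sketch of the resource-manager job lifecycle: jobs enter a queue,
# get dispatched to a node daemon, the daemon runs them and reports the
# result back to the head node. Illustration only; not torque APIs.
from collections import deque

class NodeDaemon:
    """Stands in for a pbs_mom-style daemon on a compute node."""
    def __init__(self, name):
        self.name = name

    def run(self, job):
        # A real daemon would fork the job and watch its resources;
        # here we just pretend everything succeeds.
        return {"job": job, "node": self.name, "state": "completed"}

class HeadNode:
    """Stands in for the server process that owns the queue."""
    def __init__(self, nodes):
        self.queue = deque()
        self.nodes = nodes
        self.log = []          # completion reports from the daemons

    def submit(self, job):
        self.queue.append(job)

    def dispatch_all(self):
        while self.queue:
            job = self.queue.popleft()
            node = self.nodes[len(self.log) % len(self.nodes)]
            self.log.append(node.run(job))   # daemon reports back

head = HeadNode([NodeDaemon("node01"), NodeDaemon("node02")])
head.submit("job.1")
head.submit("job.2")
head.dispatch_all()
print([r["state"] for r in head.log])   # -> ['completed', 'completed']
```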
So Torque is one of many resource managers, and it is based on the OpenPBS code base, which has been in use for a decade or more now.

Okay. You mentioned OpenPBS there — I wonder if you could give us a little bit of history, because there is a bit of a tangled history of where all these resource managers come from, who inherited code and ideas from what, and then there was a bit of a split within this code base itself. So could you give us the history of Torque?

Yeah, I'll do my best. I won't go into all the details as I've heard them, because it's quite lengthy and probably boring. But Torque, as I mentioned, came from OpenPBS, and OpenPBS was designed and created several years ago by NASA Ames and a few other organizations, like Lawrence Livermore National Laboratory. Veridian was also involved; they were a contractor that worked on the OpenPBS source code, helping those organizations create OpenPBS. Veridian was then acquired by Altair, so Altair has the commercial rights to distributing OpenPBS, and therefore they have the PBS Pro product, which is a commercial, closed-source version of PBS, the Portable Batch System.

But Torque comes from OpenPBS, and the license on OpenPBS was liberal enough that it essentially allowed people to redistribute it. Torque came about because, over the years, lots of organizations and lots of users had created patches against OpenPBS — to make it more scalable, to fix problems, to add new features, things like that. And there were so many patches out there that had really built up — Dave Jackson, who I work with, said it was hundreds of patches. So there were websites you could go to that would explain how to get OpenPBS up to the latest and greatest, but with very convoluted directions: go here and install these five patches, then go and install this patch, but don't install this patch
next unless you, you know, want to work out conflicts. So it was really, really complicated. So Cluster Resources — probably supercluster.org at the time — started to gather all these patches together and apply them to OpenPBS, and we then started to release the product that we eventually named Torque.

So Torque is actually a free product right now? Like, you can just go to the Cluster Resources website, or supercluster.org, and download Torque and build it and use it without any licensing fees — is that correct?

That's right. It's open source, and there are really no distribution limitations, so it's totally free as in beer and free as in speech.

Okay — just to clarify, because I do work for corporate overlords here: there are licensing restrictions, it's just that they're very liberal, right?

That's correct. Yeah, there is a license on it, and it is very liberal.

It's not public domain, in other words.

Okay. And so, what are some of the differences? You both came from kind of a common code base, PBS Pro and Torque — what are the differences between them today, other than the fact that one is a paid commercial support model? Are there significant feature differences between the two?

Between PBS Pro and Torque, you said? Yes, there are. I'm not an expert on PBS Pro, but from what I understand, as time has gone on, more and more features have been put into PBS Pro that are not in Torque, and vice versa. For example, PBS Pro has a new way of managing resources called virtual nodes, and it allows you more flexibility in controlling some of the resources. They also have other features that allow you to — what's the word? — better define the resources that are available, like generic resources. Kind of like in Maui or Moab, which are schedulers that work with Torque and other resource managers.
You have this concept of generic resources, which represent entities that are not processors or memory that you can schedule against, and PBS Pro has support for that. Torque has kind of diverged, or has features that PBS Pro doesn't have, in the sense that it has high-availability features now. It has a lot of features that the community has asked for — for example, a similar generic-resource feature set — and we've also been improving the scalability, changing the way the communication works, things like that.

So, a quick question then: does Torque itself come with a scheduler, or do you always have to hook it up to an external scheduler?

No, it actually does come with a scheduler — pbs_sched is what we call it. It's a very simple scheduler; by default it's FIFO, and there are other plugins that come with the Torque source code that you can use to alter the scheduling behavior.

Do many people actually use that scheduler?

No. I think that some people do, but the vast majority of the people we interact with — and maybe we're biased, because Cluster Resources has a scheduler product — most of the people we come in contact with do not use pbs_sched at all.

Okay, so this is actually a great lead-in. Can you give me a crystal-clear distinction — because this is something that is frequently lost on most users — what exactly is the distinction between the scheduler and the resource manager? You mentioned Maui and Moab earlier; I wonder if you could classify that, in your definition.

Yeah. So the resource manager — this is the way I explain it to people — the resource manager is the hands and the eyes, and I guess you could say the feet, of a cluster, at the software layer. Like I said, it watches nodes: it understands the availability of nodes — their status, whether they're up or down — their resource usage, things like that. It also actually starts the jobs and monitors the jobs.
It will cancel jobs. So it handles all those processes that are started or required by either serial or parallel jobs. It does all of the lower-level work that requires interacting directly with operating systems and environments to manage jobs and resources.

The scheduler, on the other hand, is at a higher level. It communicates only with the resource manager, usually, and it tells the resource manager where to start the jobs and when to start the jobs. It prioritizes the workload — the jobs that people submit — and decides, based on policies that have been configured, who should get what share of a cluster or supercomputer.

So, going along with the analogy — the resource manager is the feet, the eyes, the hands — the scheduler would then be the brains that orchestrates everything? Okay, so if Bob and Sally and Sue all submit jobs, the scheduler is the one who actually decides who gets to run right now, and where?

Right, exactly.

Okay. And so could you describe what Maui and Moab are, and what the differences are between those?

Yeah. Maui and Moab are both schedulers and policy engines. When I say that, I mean they do more than just schedule: they also enforce a whole bunch of different policies to meet business goals, or goals that you may have on your cluster. So Maui is an open-source version of — well, it's an open-source product,
I should say. It supports major scheduling features such as advance reservations, backfill, priority, fair share, things like that, and it's very popular when people want a pure open-source or cheaper solution: they'll install Torque and then they'll install Maui, and they'll be able to manage most small- or modest-needs HPC resources.

Moab, on the other hand, is a commercial product that Cluster Resources sells. It was very similar to Maui at the beginning, but it is much more advanced than Maui now. It's closed source and, like I said, it is a commercial product. It has the same features and capabilities as Maui, but it goes above and beyond that. So it's a traditional HPC scheduler, and it does also have that policy engine I mentioned earlier, but it can also do grids — that's a cluster of clusters, essentially. It can also do a lot of data center and workflow management, so managing complex dependencies, things like that. I won't go into all the details, but hopefully that's a good enough introduction to the two products I mentioned.

No, that was very good. Okay. How do you, as an admin, personally use Torque at your site?

Yeah, well, we have a bunch of clusters that we run Torque on. Basically, as Josh said, it's just for managing the resources; it's pretty straightforward. We have everything from, at this time, fairly small one- and two-hundred-node clusters up to a 600-node cluster. It's just used as the resource manager. We also use Maui as the scheduler, of course.

I find it fascinating that you say fairly small is 100 to 200 nodes, whereas a couple of years ago that would have been: holy crimini, that's a really large cluster.

Yeah, yeah, I've been around for a while doing this.
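The scheduler-versus-resource-manager split discussed above — the scheduler decides who runs now, and where, based on configured policy, while the resource manager only carries out those decisions — can be sketched roughly like this. The node names and priority numbers are made up; a real policy engine such as Maui or Moab folds in fair share, reservations, backfill, and much more.

```python
# Toy sketch of the "who runs now, and where" decision. The resource
# manager's contribution is just the node-state report; the scheduler
# ranks waiting jobs by a policy and picks placements. All names here
# are invented for illustration.

nodes = {"node01": "free", "node02": "busy", "node03": "free"}

# (user, priority) pairs; a stand-in for a configured priority policy.
waiting = [("bob", 10), ("sally", 50), ("sue", 30)]

def schedule(waiting, nodes):
    decisions = []
    free = [n for n, state in sorted(nodes.items()) if state == "free"]
    # Highest priority first -- the "who runs now" decision.
    for user, prio in sorted(waiting, key=lambda j: -j[1]):
        if not free:
            break
        decisions.append((user, free.pop(0)))   # "...and where"
    return decisions

print(schedule(waiting, nodes))
# -> [('sally', 'node01'), ('sue', 'node03')]
```

Bob stays queued because no free node is left; on the next scheduling pass he would be placed first.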
It just speaks to the state of the tools and the industry, that the state of the art has actually advanced to where it's pretty common to have one-, two-, three-hundred-node clusters these days. Sorry, just a little editorial remark there.

It seems to me that part of the work that's gone into Torque, since it's diverged from OpenPBS, is making it work better on these larger systems?

Oh, yeah, definitely. In the beginning, when we started in around 2002, it was fairly common that Torque crashed and jobs failed to start, but today it's almost flawless in that respect.

So, a question for you guys: what are your roles in the project? What do you both do in the Torque project?

Well, for me — this is Josh — I'm a principal developer, or software engineer, at Cluster Resources. With Torque I've actually been involved throughout my tenure here, but in the last year and a half I've taken a more active role, kind of overseeing the development. I do do some development work, so I'll fix bugs and add features, and they're usually customer driven: if we have a customer that has a particular need or bug, then I will, you know, help solve that problem in the source code. But we also have two full-time developers here at Cluster Resources and a handful of community developers that I kind of help oversee and coordinate between, to make sure the project's going well. So I guess my role is more of a manager, or overseer, of the developers here at Cluster Resources.

So, Josh, a quick question: does Cluster Resources provide support contracts, if somebody should want one for Torque?

Yes, they do. Usually — it's kind of odd — if you want a support contract for Torque, you essentially buy Moab, because you get Torque support free when buying Moab. And if someone says, well, I don't want Moab, I just want to buy Torque support,
We say: oh, well, it's the same price, so you might as well get Moab too. And a lot of people — not a lot, I should say, a handful of people — actually just pay for Maui and Torque support and, for whatever reasons, don't even want to use our commercial products, Moab and the others.

Okay, so jumping back just a second here — I don't think we heard your answer. What is your role in the Torque project?

Well, officially nothing, but since I've been around for a while I have made a couple of patches over the years, mainly memory-related stuff. We will hopefully some time get down to doing some of the Kerberos pieces that are in progress.

Okay, so you're a random open-source developer that periodically does a feature that's useful to you.

Yeah.

Okay, that's the good open-source way, right?

Yeah.

Okay, so here's another random question: what are some of the more famous sites — famous clusters, HPC installations and whatnot — that are using Torque?

Well, the ones that I'm aware of — there are tons, and it's interesting, because we're not even aware of everyone who's using Torque: it's open source, we don't require people to register to download Torque, and it can be redistributed, so it's hard to pin down everyone who's using it. But we do know some of the more famous sites. Kind of the pride and joy right now is Los Alamos National Laboratory. They have the fastest computer in the world, Roadrunner, the petaflop machine, and it uses Torque. I should say that they are getting Torque to work on it — they just barely got control of the machine again, and so we're working with them to get Torque running on it. They have been using Torque to launch a subset of jobs; they haven't yet launched a job across the entire machine, but we're working on that with them.

Gotcha.
Yeah. As an MPI developer — I'm the Open MPI guy, and I'm involved with Roadrunner as well — I know that that is a lot harder than it sounds. It sounds like: you launch a job, what's the problem?

Well, it's actually pretty darn complex, because it touches just about every aspect of the machine. We're getting there, though; we're getting close. And that's the plan, for them to use Torque as the resource manager. It's also worth mentioning, I think, Oak Ridge National Laboratory, which has the second-largest machine, and they're running Torque on that as well. We also have people, you know, not just in HPC — we also have commercial people using Torque. So ExxonMobil; we have pharmaceutical companies; pretty much every major university has one or two clusters using Torque; a lot of data centers. For example Yahoo — they use Torque extensively internally, both via Hadoop on Demand and also just in the traditional, you know, cluster and data center setups they have there. And that's an interesting trend, actually: Torque is being used more and more for data centers, and not just for the traditional HPC cluster that you usually think of.

So that's really cool. We actually talked to the Hadoop guys a couple of weeks ago on this podcast, and I did not realize that Hadoop could be used with Torque. So you actually can move the computation around to meet where the data is — I mean, that's kind of their whole philosophy, right? So I didn't realize that Torque could be used in that kind of context.

Yeah, so Hadoop on Demand uses Torque. Just plain Hadoop — which is the file system and, you know, the MapReduce portion of it.
It doesn't use Torque. But what Hadoop on Demand does is use Torque to basically schedule, or allocate, a set of nodes. There's a scheduler on top of it — in this case Hadoop's, or pbs_sched — that will allocate a set of nodes, and then basically a mini Hadoop cluster is created with those nodes. So Torque helps, you know, allocate those nodes, start the Hadoop processes on them, stuff like that.

Okay. So Hadoop is still a very compute-intensive kind of thing — still some form of HPC. Beyond the use of Torque for traditional data centers, like you said, are you seeing it used for anything very outside the traditional compute?

No, not very outside. I mean, at the end of the day there's still a compute job that Torque is starting, and it's usually compute intensive — either, you know, I/O bound or CPU bound — and it monitors the job, reports back the status, things like that. There has been some work and research done with Torque starting virtual machines, where the compute job itself actually runs inside the virtual machine. But there again, at the end of the day, it's still a compute-intensive job. So Torque isn't doing anything too exotic — it still does basically what it was designed to do — but it's being done by people that you wouldn't traditionally think of as HPC folks. That's my point of view, from what I've seen.

Okay, so back to the really large jobs. You mentioned with Roadrunner that it was really hard to start that many processes at once. Is that the job of Torque, or is that the job of the MPI launcher, or do those two integrate together?

Well, it's kind of the job of both of them.
So Torque's job is — well, the scheduler will determine what nodes Torque should try to grab, or allocate, to start a job. So Torque will receive instructions from the scheduler — for example, in the case of Roadrunner, it will say: get a list of 2,880 nodes, which is the largest job to date that we've started on Roadrunner. Torque will then go out and talk to all of the daemons running on those 2,880 compute nodes, and we'll try to get them all to answer back to the main compute node and say: okay, we're all ready to start, we're all together, we're all in a group. Once that has happened — we call that a join, where they all come together — then the head compute node, which is called the mother superior, starts the script that was submitted by the user, and usually —

I've got to interrupt you here. You've got to explain the term "mother superior" there.

Yeah, I thought you were going to ask about that. So the daemons that run out on the compute nodes are called pbs_mom, and "mom" stands for machine oriented mini-server. I don't know why it's called that, but that's traditionally what it's been called.
Somewhere along the line, someone decided to use Catholic nunneries as the naming nomenclature for the different parts of pbs_mom. So you have your mother superior, which is in charge of a group of sisters, or a convent: the main compute node is called the mother superior, and the other compute nodes in that group, for that job, are called sisters. I don't know what the historical reason for that is, but that's the way it is, and we continue to use that nomenclature now.

So, yeah, the mother superior is the head compute node. It gathers all the sisters together into the sister group, and once they've all joined and are ready to participate in the job, the mother superior runs the script submitted by the user. That script usually has an mpirun or an mpiexec in it, and at that point Torque will pass the host list to MPI, and MPI will then go do its wire-up and actually start the job that does the actual computation.

Interesting stuff. So that does the actual computation. Yeah — and this kind of veers straight into my land as well, where you have Open MPI, or a couple of others, or even some third-party add-ons like the mpiexec project out of the Ohio Supercomputing Center: they actually use a Torque API to start processes on the other nodes, as opposed to, say, rsh or ssh. For a variety of uninteresting reasons, things work a little better that way — when you use the resource manager to start all the individual MPI processes, rather than rsh or ssh.

Yeah, exactly, and that API is called the TM interface, or the task manager interface. As Jeff mentioned, Open MPI has native support for that when you compile it correctly, if I understand right, and this mpiexec is a wrapper for many popular MPI tools that the Ohio Supercomputing Center created. It's a very valuable tool.
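The launch sequence just described — the mother superior gathers acknowledgements from every sister mom, and only once the join completes does it run the user's script with the assembled host list — has roughly this shape. This is an illustrative sketch under invented names, not Torque's actual wire protocol.

```python
# Toy sketch of the two-phase job launch: (1) every sister answers the
# join, (2) the mother superior runs the user's script, which can hand
# the node list to an mpirun. Illustration only; not torque code.

def join_phase(hosts):
    """Each sister acknowledges; in real life this is network traffic."""
    acks = {h: True for h in hosts}        # pretend every mom answered
    if not all(acks.values()):
        raise RuntimeError("join failed; job cannot start")
    return sorted(acks)

def mother_superior_start(hosts, user_script):
    joined = join_phase(hosts)             # phase 1: gather the group
    hostlist = ",".join(joined)            # phase 2: hand MPI the nodes
    return user_script(hostlist)

# The user's script, reduced to the mpirun line it typically contains.
script = lambda hostlist: f"mpirun --hosts {hostlist} ./a.out"
cmd = mother_superior_start(["node01", "node02", "node03"], script)
print(cmd)   # -> mpirun --hosts node01,node02,node03 ./a.out
```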
So, yeah, it's unfortunately named, because that word, mpiexec, is overloaded in several different cases. You say, oh, we'll use mpiexec — well, which mpiexec do you mean?

Yeah. Well, do you know, Jeff, what came first — was it OSC's mpiexec, or mpiexec from the MPI standard?

mpiexec from the MPI standard. That's why the Ohio Supercomputing Center chose that name.

I see.

But then it becomes confusing, because several MPI implementations, including Open MPI, include our own mpiexec, which is completely and wholly different from the Ohio Supercomputing Center's mpiexec wrapper for TM. A little bit of MPI history there for you.

Yeah, a little extra for you.

So let me ask you this — perhaps a little bit of a leading question here — about this whole startup dance with MPI and TM and so on. The comparison that, as an MPI guy, I hear sometimes is: well, jeez, I can TM-launch /bin/true in, you know, half a second across a billion nodes — why does it take MPI so much longer to start up? So can I throw that leading question over to you? Do you know?

It's because MPI guys aren't as good as the Torque guys.

There you go, that's the perfect answer.

No, no, I'm just kidding. Well, it's got to do with — MPI is much more advanced in its communication mesh, from what I understand. So I would assume that the wire-up for MPI is much more involved, because you've got to get the wire-up just right; once the wire-up is done just right, the communication is going to be much, much faster than it would be trying to do a linear communication using, you know, the Torque compute daemons — the pbs_moms.

That was a bit of a quiz question, but yeah, that's right.

Yeah, so that's my understanding. And also with MPI you have to deal with network interfaces — you have InfiniBand, you've got all kinds of things to worry about.
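The linear-versus-logarithmic point can be made concrete with a quick count of communication rounds: a root sending to each of N-1 peers in turn takes N-1 rounds, while a tree in which every rank that already has the data forwards it takes about log2(N) rounds. (Round counts only; no real MPI calls here.)

```python
# Why the careful wire-up pays off: round counts for a broadcast to N
# ranks, linear versus tree-based. Sketch for illustration only.
import math

def linear_rounds(n):
    return n - 1                  # root sends one message per peer

def tree_rounds(n):
    rounds = 0
    have = 1                      # ranks holding the data so far
    while have < n:
        have *= 2                 # everyone who has it forwards it
        rounds += 1
    return rounds                 # == ceil(log2(n)) for n > 0

for n in (4, 1024, 2880):
    print(n, linear_rounds(n), tree_rounds(n))
# At Roadrunner's 2880-node job size: 2879 linear rounds vs 12 tree rounds.
```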
It's a much more complex problem. Yeah, we basically have to exchange all this metadata — it's usually during MPI_Init — so that, you know, when you do an MPI send or whatever, we can open the queue pair, we can open the socket, we can open the shared memory, whatever is necessary. So there's a whole pile of additional information that has to be exchanged during MPI_Init, after the process has already started, and just that volume of data takes time to transmit across the network.

Yeah. But like I said, after the wire-up is done, you're able to be much more efficient, so if we're doing a broadcast, you know, it's going to use a logarithmic fan-out rather than a linear one.

Yeah. And, little-known fact — for most MPI implementations this is true as well — the first time you do an MPI send, it's a little bit slow, because we usually have to establish the connection; after that, the connection is established and things go much faster. I'm sorry, I didn't mean to drag us off into the weeds of MPI things, but since MPI frequently interacts with Torque, this is a common question that we get asked: why is MPI startup different from process startup?

So there are a couple of other advantages to starting up processes under TM. Notably, I tend to use Torque sometimes to monitor resource usage even by a parallel job, whereas if it's started by ssh or rsh, I actually lose track of the resources consumed by the different MPI ranks out there. If it's started under TM, Torque kind of keeps track of all that for me, right?
Yeah, exactly, and the reason it can is because the job is an actual child of the pbs_mom daemon — a child process. So it's able to monitor it — it knows the PID, first of all, so it's able to monitor its resources through the operating system using that PID. But it can also know exactly when the job completes, by calling waitpid(), and it can also get resource information directly from the operating system, because it's the parent. So, yeah, there are benefits, and that's why MPI implementations will usually spawn their processes under tm_spawn.

You can also kill off the ranks out there a lot easier if they run over wall clock or something, instead of having zombie jobs kind of floating out there.

Yeah, exactly. Definitely an advantage. With rsh and ssh you can end up with these zombie jobs, because there is no firm control, but something like Torque can exert positive and complete control: oh, your MPI job crashed? Well, we'll make sure to clean up all the processes for you, so that those nodes are clean and ready for a new job.

Okay, let's move on to how Torque is developed. You said it's open — how can other people get involved with working on Torque? Or if they have a patch they've come up with in house and they want to submit it back, how can they do that? Well, what's been your experience, Åke?

Yeah, well, it's fairly easy: create the patch, make sure it works, send it upstream, and Garrick will usually take care of it.

So who's Garrick?

I keep forgetting his full name — Garrick Staples is his full name. He's an open-source — well, I mean, a community — developer. He's been very involved in Torque since it began being called Torque.
He is out of the University of Southern California, and he's one of their admins there. In order to make his job easier, he started hacking on Torque to make it do what he wanted it to do, to improve it, and, you know, he's been involved in the community. Because of that, I hear that his cluster runs really well now. He's always very helpful to the community, and like Åke said, if you have a patch, Garrick will probably look at it, either reject it or accept it, and then check it in.

Okay. Since I'm an open-source developer myself, it's always very interesting to me to hear how other projects run. Do you guys have kind of a governance panel, or is it true, you know, distributed open source — email, send a patch, and that's it? Or is there an upper level — someone who sets roadmaps and what kind of features are going to go in in the future, things like that? How do you guys run yourselves?

Well, there's no governing panel. Cluster Resources does sponsor Torque, so oftentimes we will gather together requirements, either from our customers or from the community, and we try to present them to the community through the mailing list, which is the main method of communication between us and the community, users, and so forth. We try to come up with, you know, common goals that we all want to achieve, and then some people in the open-source community — in the Torque community, I should say — will go out and develop some of those, CRI will tackle a few of them, and we try to achieve those goals.
So it's pretty loosey-goosey. There's no real governance panel, no voting, no president, nothing like that yet. We also have Supercomputing and other conferences where we meet face to face; we traditionally have a birds-of-a-feather, a BOF, at the Supercomputing conference here in the US, which usually fills up an entire room with people interested in Torque who have comments or suggestions, or just want to hear what's happening in the next year.

As far as getting patches submitted: you usually submit a patch to the mailing list, and one of the committers will pick up on it and evaluate it. If it's good — you know, there might be more discussion about it, more changes might need to be made — but if it's really good, it will usually just be checked in, either to the stable branch, if it's safe enough, or to the development branch.

So, on the community model there: what do you guys do for both QA and for licensing? You mentioned committers specifically, which means that you must have a small set of designated super-committer kinds of people, and that you guys have done the licensing stuff all properly. And then how do you do the QA? Do you do, you know, distributed testing — is everything tested, or is it kind of: it works for me, and it doesn't fail for other people, so that's good? What do you guys do there?

Well, traditionally it's been more of a distributed testing model: here's a patch, try it out; some people will try it; they all report back that it worked well, so then we'll check it in. But we're trying to be more formalized about that now, because more people are getting involved — both our developers and community developers. Quality assurance is, of course, a big deal.
So we will We will usually do quote-unquote beta Testing so we will announce we want to release a version of torque and we'll encourage people to download the latest snapshots To run it through their clusters and a lot of people are actually pretty open to doing that and are willing to help out in these testing Sometimes it's just out of the goodness of their heart or out of curiosity Other times it's because we've embedded a carrot in torque such as a feature that they really want to try out and so they will Try out the software let us know how it goes and we also have a couple at CRI a couple friendly sites that are quite large That we are that are willing to install early builds for us and you know help us get that scalability testing in there as well and The community of course does its own testing for features that they think are important or for their own branches that they're working on So that's kind of how quality assurance goes One final question on community stuff. I always have to ask what version control software are you guys using? Right now we're using subversion So you're able to not only check out but of course check ins are restricted right now to keep some sanity to the project Yeah, we're using subversion 1.3. I think Okay, so community supported features and other features. What are some of the features that have been added to torque? 
You mentioned a couple, the generic resources and stuff. What are some of the more recent features added to Torque that someone who's been using it for a while, but not keeping up on it, might find useful?

Well, recently — well, this was about a year ago, but it's still pretty recent, I think — we added a high-availability feature, which allows you to run two pbs_servers, one of them active and the other inactive. When the active one goes down — it either crashes, or is killed, or the machine it's running on shuts down — the inactive one is able to start up and take over the workload and resource management. So it's, I guess, a first step in helping make Torque more highly available, or giving it a failover feature. The reason I say a first step is because it's not perfect: right now it's based on NFS file locking, so there are some limitations.

Some of the other features we've added are minor things, such as log management: Torque really didn't have a way of rolling or cleaning up logs by itself. Of course, the community's advice is to use something like logrotate, but for some of our customers that isn't an option. Other work we have done has really been to increase the scalability and stability. Like Åke mentioned, Torque in the early days used to be — well, not horrible, but it was not as good as it is now: jobs would often fail for no reason, and jobs would get stuck in states like Running or Exiting for indefinite amounts of time. We've really tightened up some of those protocol issues, or communication problems, that would lead to those sorts of bad states, or to jobs failing. So we've really been trying to focus on that — just making it really tight and working well. And the last thing, I guess, that we've really focused on over the last year is performance: how many jobs it can start per second, how fast you can run commands, how fast you can submit jobs into Torque, things like that.
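The NFS-file-locking failover scheme described above has roughly this shape: two would-be servers race for an exclusive lock on a shared file; the winner is active, and the standby can take over only once the lock is released. This sketch uses local flock() locks for illustration; the fragility Josh mentions comes precisely from doing this kind of locking over NFS, and none of this is Torque's actual implementation.

```python
# Toy sketch of lock-file failover: whoever holds the exclusive lock
# is the active server; a second contender stays standby until the
# lock drops. Illustration only (POSIX-only: uses fcntl.flock).
import fcntl, os, tempfile

lock_path = os.path.join(tempfile.mkdtemp(), "server.lock")

def try_become_active(path):
    f = open(path, "w")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return f                    # we hold the lock: active server
    except OSError:
        f.close()
        return None                 # lock held elsewhere: stay standby

primary = try_become_active(lock_path)
standby = try_become_active(lock_path)
print(primary is not None, standby is None)   # -> True True

primary.close()                     # active "server" dies; lock drops
standby = try_become_active(lock_path)
print(standby is not None)          # -> True: standby takes over
```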
One of the features that came out recently that we've gotten a lot of mileage out of, and I know it's not complete yet, I know Glenn still has quite a bit he wanted to do with it, is job arrays. That's actually been very useful for a number of our users.

Yeah, and I didn't mention it because Glenn, he's really good, he wants to make sure that people know it's still a work in progress and not to bet the farm on it.

Well, it's working great. I mean, if Glenn's going to listen to this: the thing's working great so far. A lot of people use it, and it's one of our performance-enhancing things; we're able to get a thousand to two thousand jobs into the queue very quickly, as opposed to waiting a few minutes.

Future features, then. You mentioned Kerberos; what exactly is the state of that?

I'm not sure; the last thing I saw about it was more than half a year ago, I think. Josh?

Well, at CRI, Cluster Resources, our developers haven't been directly involved in the Kerberos development; it's been more community driven. I think the last thing I saw was also somewhere around six months ago.

What's really needed here, for us AFS users, is to get AFS tokens into the MOMs.

Oh, this is an AFS question. Yes, I remember struggling with AFS support many years ago when I was a graduate student at Notre Dame using OpenPBS. There was a third-party tool written by Dale Souther, and I'm forgetting the name of it, but he had a way to securely pass the tokens in and out. What was the name of that tool? I don't remember. So what kind of approach are you guys going to take?
You know, I'm not sure. I was intending to look into this last year, but other things got in between. To do this safely you really need to rewrite parts of the communication layer in TORQUE, so it's not that easy to do in its current state.

Okay, and so this is mainly an AFS issue, not necessarily plain-vanilla Kerberos?

Well, if you do it right, it will solve all the Kerberos-related problems. Okay, I hope.

Yeah, so things like NFSv4 that use Kerberos, or Kerberized WAN-style Lustre systems, all these things saying they're going to be secured using Kerberos: if the resource manager could pass tickets for the user, being able to use those file systems would be a lot simpler.

Yeah, precisely. I do know that, like Åke mentioned, there has been work on this in the past in the community, and I'm trying to remember someone's name, I think at MIT; yes, Umeå University is also involved. There's basically a branch in TORQUE right now, I don't know how well it's being maintained, that was created to try to Kerberize TORQUE. Like I said, it's community driven, so I don't know how things are going at the moment; I haven't heard a lot of activity on it recently. But I know that at one point it was possible to pass these tokens around an AFS file system, and things were working, at least with very crude initial tests.

Okay, so what else is on tap for TORQUE? What makes TORQUE go to 11?
Well, from a Cluster Resources perspective, we got a lot of requests at last Supercomputing for better support for cpusets, and also scheduling of cores: actually supporting the core-affinity features in Linux and other operating systems and pinning jobs to cores, since cores are really big in clusters now. So that's something some of the community is working on, and here at CRI we also hope to get in some core pinning, core affinity, or processor affinity for that matter.

Great, because I have some very definite opinions about that as an MPI guy. Good to hear.

Also, we are really interested in scalability. As we mentioned earlier with Roadrunner, it's been a learning experience getting TORQUE to scale up that large. It has gone well; most of the problems thus far have actually been network configuration issues on Roadrunner, from what I understand. But we do recognize that once you have, you know, a million-node cluster out there, TORQUE is not going to be able to scale to that with its current communication model; right now it's quite linear. So we hope to have an alternate communication model in TORQUE that's tree based, very similar to how Open MPI does things. We've worked with Ralph Castain, who's also an Open MPI developer, to come up with some ideas and to figure out how Open MPI has tackled some of these problems. We want to use that tree communication both for the resource-manager part, so nodes reporting back their status, and also for the wiring up of jobs: not the wiring up of MPI per se, but the wiring up of the communication between the sister MOMs. Also, if we're lucky, and this is pretty far out there so we'll see how it goes, Ralph is interested in working with us and Open MPI,
I would assume, to actually have TORQUE participate in the wire-up of MPI.

Yes, actually, Ralph has talked to me quite a bit about that, and that sounds great. We would love this kind of stuff; we've wanted this kind of thing from resource managers for a long, long time, and it would be great for it to finally materialize.

Yeah, I agree. It's still kind of far out, but it's something we're actually fairly excited about in Open MPI.

And other future things for TORQUE: I know I personally would like to see the TORQUE source code cleaned up more. It has made a lot of good progress; a lot more comments are in the code now. It was kind of designed by committee, that's the feeling I get looking back: you can see there are multiple authors with multiple styles of coding, and old pre-ANSI-C-style functions and things like that. Bringing it more modern, more fresh, more up to date will help make it easier to maintain and also weed out some of the bugs that have been lurking in there for years.

Weed all the COBOL out of the code base, and the Lisp.

Yeah, well, I don't know if there's any COBOL or Lisp, but there's definitely maybe that style of programming in there. So, you know, consistency, and like I said, just tightening down the communication and the state machine to make it rock solid. It really is much more solid than it was, and some people do say it's rock solid and never fails for them; kind of like Åke said, it's flawless, usually. But there are the occasional clusters that just beat the snot out of TORQUE and will uncover some flaws.
So we want to fix those completely.

Yeah, from an admin perspective, on managing resources: we've been wanting to use TORQUE on an SGI Altix shared-memory machine for a while and to get the full cpuset integration, and now that cpusets are a Linux 2.6 thing and less of an Altix-only thing, admins of just pizza-box-style multicore systems can take more advantage of that too. Also, a different type of resource is becoming more common, these compute peripherals: GPUs, we talked to a GPU project on this show early on, and things like ASICs and so forth. Is there any outlook for those? Is that a job for TORQUE, or a job for the scheduler?

Well, that's a good question. I don't know. Okay, do you have any insight on that?

Yeah, well, it's a resource, and resources should be managed by the resource manager, so it depends on your point of view, but you need to handle those resources just like CPUs and memory. So TORQUE needs to handle it, I think.

Yeah, I think it's a two-fold thing here. First of all, from a scheduling perspective, you can actually probably schedule and manage these accessories just fine. For example, if you're using Maui or Moab, you can create a generic resource which would define those GPUs. Say you have a cluster and each node has two or three GPUs installed on it; then each node has three GPU generic resources, or widgets, that you can schedule. When someone launches a job they can request one of those GPUs; then Moab or Maui will tell TORQUE to start the job on that node, because it knows it has a free GPU, and when the job actually starts it will contact the GPU and do its computation there. But the second part is that TORQUE may need to be aware of the GPU and its resource utilization, its usage and things like that, to get meaningful statistics and also to avoid the scheduler getting confused, maybe about overcommitting, or about
someone actually submitting a job outside of the batch system and running it on the GPU, things like that. Those are some of the issues we've seen, at least with our customers and users that are using GPUs in the HPC environment.

Yeah, right now the way we do it, we do a lot of stuff in the epilogue and prologue and kind of mess with the environment, so that users know which GPU they've actually been assigned. It works pretty well; it would just be nicer to see it integrated, out of the box.

Yeah, and I totally agree. It seems to me that GPUs and these other accelerators, I mean, they're not going away; they're going to become more and more dominant, so it would behoove us to learn more about them and to integrate better with them.

Okay, well, thanks a lot, guys. This was a lot of fun. Again, we had Josh Butikofer from Cluster Resources and Åke Sandgren from HPC2N. Thanks a lot for taking some time out for us, guys.

Thanks for your time, guys. Yeah, it was a pleasure. Thank you. Yeah, thank you.
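(The prologue/epilogue GPU-assignment trick described above could be sketched roughly as below. Everything here is an assumed site convention, not a TORQUE feature or anything stated on the show: the two-GPU count, the lock-file scheme, and the lock directory, which lives under /tmp only so the sketch runs anywhere; a real prologue would use the TORQUE spool area.)

```shell
#!/bin/sh
# Rough sketch of assigning GPUs to jobs from prologue/epilogue scripts.
LOCKDIR="${TMPDIR:-/tmp}/torque-gpu-locks"
rm -rf "$LOCKDIR" && mkdir -p "$LOCKDIR"

# Prologue side: grab a free GPU for a job.  TORQUE passes the job id
# to the prologue script as $1; here it is a function argument.
assign_gpu() {
    jobid="$1"
    for gpu in 0 1; do
        # mkdir is atomic, so it doubles as a per-GPU lock
        if mkdir "$LOCKDIR/gpu$gpu" 2>/dev/null; then
            echo "$gpu" > "$LOCKDIR/job-$jobid"
            return 0
        fi
    done
    return 1   # no free GPU; a real prologue would fail the job here
}

# Epilogue side: release whatever GPU the job held.
release_gpu() {
    jobid="$1"
    gpu=$(cat "$LOCKDIR/job-$jobid" 2>/dev/null) || return 0
    rm -rf "$LOCKDIR/gpu$gpu" "$LOCKDIR/job-$jobid"
}

# Demo: two jobs take the two GPUs; releasing one frees it for a third.
assign_gpu 101 && echo "job 101 got GPU $(cat "$LOCKDIR/job-101")"
assign_gpu 102 && echo "job 102 got GPU $(cat "$LOCKDIR/job-102")"
release_gpu 101
assign_gpu 103 && echo "job 103 got GPU $(cat "$LOCKDIR/job-103")"
```

A job script would then read its assignment file to set CUDA_VISIBLE_DEVICES (keyed on the real `PBS_JOBID` environment variable); that wiring, again, is a site convention rather than built-in TORQUE behavior.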