Welcome to another edition of RCE. This is Brock Palen. You can find all the old shows, and a link to subscribe in your favorite podcatcher or iTunes, at www.rce-cast.com. I also have here Jeff Squyres, one of the authors of the newly released Open MPI 1.6.

Yeah, we just released Open MPI 1.6. Very excited about that; we finally took that one to the stable series, and we're encouraging everybody to upgrade, so go look at openmpi.org, in the announcements, and Freecode, and all those kinds of things. The other thing I'm kind of excited about is that I just got a book chapter published about the internals of Open MPI, in a book called The Architecture of Open Source Applications, Volume 2. All the proceeds go to charity; they go to Amnesty International, so I'm not getting a dime for it. It was just kind of an open-sourcey thing to do, to talk about how your software works and release that out into the world. So go Google that and you'll find it; it's good stuff. There are a lot of other well-known open source apps in there, like GHC and Git and the Hadoop file system, things like that. It's a great learning resource just to go see how other successful open source projects have architected their software. It's pretty cool stuff.

That is pretty neat. So we're actually doing something a little bit different: we're doing more of a panel discussion. Today we're going to be talking about HPC resource allocation, and we're going to stick to compute nodes. I want to talk to three different organizations about the different ways that they allocate out their compute cycles, be it condos or allocations or something in the middle. With us today, representing the University of Michigan, we have Andy Caird, who I should also point out is my boss. We have Brian Guilfoos from the Ohio Supercomputer Center, and then we have Preston Smith from Purdue.
So Andy, why don't you go ahead and introduce yourself?

Sure. Hi, everybody. My name is Andy Caird; as Brock mentioned, I'm at the University of Michigan, and I am the director of high-performance computing for the College of Engineering here. I've been doing this stuff with Brock and a few other folks for about eight years now. In the past couple of years we've moved from a pretty explicit condo model to something closer to an allocation model, although it's not very pure in the sense of what I think Brian will talk about. So that's kind of what we're doing, and we're looking forward to hearing what everybody else has to say.

Brian?

Yeah, my name is Brian Guilfoos. I work at the Ohio Supercomputer Center, where I manage our client services. I've been doing this job for about eight or ten months now, and I've been at the center for almost seven years. Before that I was a defense contractor at Wright-Patterson Air Force Base doing human interface research for UAV control, so that was a little bit different than the HPC stuff. We do primarily allocations-based scheduling, and we're exploring some other opportunities, so I'm interested in seeing what everybody else is doing and what works for them.

Okay, and last but not least, Preston?

Yeah, my name is Preston Smith. I'm the manager of the research support team at Purdue University in the Rosen Center for Advanced Computing. The Rosen Center is a unit of what we call ITAP, Information Technology at Purdue, the central IT organization at Purdue, and we build and allocate our HPC resources through a condo, or community cluster, model.

Okay, so how about each of you go into a little bit of depth on the kind of allocation you've mentioned, and give a description of your resources?

Sure, I'll just go in order.
I guess, again, we have a couple of things here in Michigan, but what I'm going to talk about mostly is what we call Flux. The name was to indicate sort of the lines of flux attracting people to the system; I'm not sure if that was accurate in the end, but that's what we have.

The allocation model was intended to take people who are very used to a condo model, who have this idea of "I have this many boxes and they are mine," and move them over with that same kind of restriction. Right? If you buy a computer from HP or Dell or IBM or whomever, you get, say, 12 cores, and that's all you get; you cannot later get more in that same box. So our idea was we would give them that constraint at sort of the top end, but we'd let them pick arbitrary integers, and we would make allocations with that one limit of cores. The other limit, of course, is time. So we have this two-dimensional thing of cores and time, and that's what they get charged for, whether they use it or not. In a sense it's more like a hardware purchase in that way: if you buy a computer from Dell and you don't ever use it, you don't get money back from Dell, of course. We wanted to hew to that as best we could, at least at the beginning, and we still consider ourselves, after two years, to be at the beginning.

So our model pretty much tries to look like hardware. That said, from the operator standpoint, we can oversubscribe it, which we do, and as such the price everybody pays comes down, so hopefully we're more competitive in that way because of the oversubscription. In our condo model, over a decade or so of operating it, we noticed the utilization was fairly low, 50% or so. So we thought, boy, we could oversubscribe by a factor of two and it would work out, and people would pay half as much. In reality that doesn't actually work out, but we do oversubscribe by a bit, 1.5-ish or so. It depends.
We haven't really figured that out yet, but in any case it's more than a factor of one, which is better than a piece of hardware, and hopefully we're reflecting that in the cost to everybody who uses it, without too much violence being done to their view of what a computer is for them. That's kind of what we do here.

So, just curious: why doesn't an oversubscription factor of two work out? Are there bursty issues that you have to deal with, or something else?

It is the bursty issues, pretty much. I think if we had more computers... right, so we've only ever reached that point once, when we had 2,000 cores. We've been expanding; we're now at 6,000 cores, and I think we'll get to 8,000 pretty soon, this summer, maybe even 10,000 this summer. And I think with more cores we'd have more jobs starting and ending at any given time, and then we could increase the oversubscription and people would see less of a delay. Again, we're trying to keep it feeling like your own hardware, so we want it to be responsive, you know, within at least minutes or maybe small numbers of hours, but certainly not large numbers of hours or days for a job to start. But yeah, we tried it up to two and it didn't work out; nobody was very happy with that. We do think that, you know, 1.7 or 1.8 is probably going to work; we just haven't gotten back up there yet with the additional hardware. So right now everybody's pretty happy with it.

Okay, Brian, could you describe what you guys do?

Yeah, we have a fairly homogeneous setup here. Actually, just earlier this year,
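As an aside, the arithmetic behind oversubscription pricing is simple enough to sketch. This is a hypothetical illustration, not Michigan's actual rate code: the dollar figures and function name are invented, and the only idea taken from the conversation is that selling each physical core more than once divides the price by that factor.

```python
def core_month_price(monthly_node_cost, cores_per_node, oversub):
    """Price of one core-month when each physical core is sold
    'oversub' times over (hypothetical figures, not actual rates)."""
    return monthly_node_cost / (cores_per_node * oversub)

# A made-up $480/month node with 12 cores:
base = core_month_price(480.0, 12, 1.0)    # no oversubscription
cheap = core_month_price(480.0, 12, 1.5)   # 1.5x oversubscription, one-third cheaper
```

The catch, as discussed above, is that the discount only holds if enough jobs start and end to keep the oversold capacity actually busy.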
we brought on our newest cluster: 106 teraflops, you know, 832 nodes, almost 10,000 cores. We basically hand out what we call resource units; one resource unit is currently 10 CPU-hours. You get 5,000 just kind of right off the bat for asking, "Hey, I want to use the system." If you need more than that, we actually require grant proposals to be submitted, and those are peer reviewed by an allocation committee that is made up of members of our user community. Basically, they'll review those for scientific merit: is the RU request reasonable? Are you asking for a reasonable amount of resources to do the work that you want to do? And then they dole those out. Scheduling is fairly straightforward: you just burn through your RUs. So it's all very straightforward in terms of resource allocation. It becomes a little bit more difficult when we start dealing with special cases, especially large users, or issues like that.

Okay, that's a good overview; let's go with that for the moment. Preston, how about you?
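The RU bookkeeping Brian describes is easy to sketch. A minimal illustration, assuming only what's stated above (one resource unit equals 10 CPU-hours, and a 5,000 RU starting grant); the function name and the example job are invented, not OSC's actual accounting code:

```python
def ru_charge(cores, walltime_hours, cpu_hours_per_ru=10.0):
    """Resource units a job consumes: core-hours divided by the
    CPU-hours each RU represents (10, per the conversation)."""
    return cores * walltime_hours / cpu_hours_per_ru

balance = 5000.0                 # the no-questions-asked starting allocation
balance -= ru_charge(128, 24.0)  # a 128-core, 24-hour job burns 307.2 RUs
```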
So at Purdue we operate in a condo model; we use "community" instead of "condo," but it's essentially the same meaning of the word. Every year, usually about springtime, we'll put out a call to our faculty that we're building another cluster, and we'll work with a number of different vendors to get bids on the cluster. We do this every year so the faculty can plan: no matter when their grants may be coming in, or when somebody may be starting with the university, they can plan that, with either their startup money or their grant funds, there's always going to be someplace to invest in computing. We centrally provide all the system administration, software installation, consultant support, and all that sort of stuff, using central data center space, central networking, and storage. So all the faculty have to do is pay money in for the cost of the node. They know that if I buy one 16-core node, I get that many cores to choose from for my research group. The slightly odd part is that we're kind of a hybrid these days: rather than physically buying nodes for the faculty, they're buying what appears to be a service. They couldn't walk into the machine room, look at the racks, and say "that is my node right there," but they know they at least have that much capacity that they've invested in.

Okay, so let me get that straight; I didn't quite catch all that. You're saying that the faculty pay a flat fee? How do you determine how much each faculty member pays? Is it weighted, or is it the same fee for everybody? How does that work?
I'm sorry if I kind of mumbled it: it's a flat fee per node, so it's essentially the cost of a node. This year, the machine I'm going to refer to for most of my answers is a machine called Carter, which we just brought into production last month. Its nodes cost thirty-three hundred dollars, so a faculty member could buy in with $3,300, get one node, and then start using the cluster. All they have to pay is that $3,300, and they get access to that cluster for its entire five-year life. They don't have to pay any of the hidden costs for the networking, the power, or the storage.

Okay, so they get access to one node for the life of that cluster at any given time; is that what you mean?

Correct. So if I buy one node for $3,300, I get a 16-processor queue, and I can use that however I need; and it goes on upwards from there for however many nodes somebody wants to buy in for.

So I have a question: what do you do with unused capacity? What if I never use it? Does it just sit there in the data center?

Well, it would, normally, but we've come up with a way of letting all the other faculty that are members of the cluster use everybody else's nodes when they're idle. In early versions of the community cluster program we used to use a preemptible queue; that didn't turn out to be very popular.
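For a sense of scale, the buy-in works out to a very low unit price over the machine's life. A back-of-the-envelope sketch; the $3,300, 16 cores, and five-year life are from the conversation, while the function name is ours, and this deliberately ignores utilization and the centrally subsidized infrastructure costs:

```python
def condo_core_hour_cost(node_price, cores, years):
    """Effective cost per core-hour of a condo node, assuming the
    node is fully used for its whole service life (illustrative)."""
    hours = years * 365 * 24
    return node_price / (cores * hours)

rate = condo_core_hour_cost(3300.0, 16, 5)  # well under a cent per core-hour
```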
It's kind of nondeterministic: you never really know when your job might get preempted. So we came up with what we call a standby queue. I can run as large a job as the cluster will let me start, but I can only run it for four hours. The faculty know that if their node is being borrowed, they'll get what they've paid for back within a couple of hours, and the people who are doing the borrowing know that they can count on four-hour chunks in which to get extra computation done.

So it's just free for people, then?

It's free for people who are partners in the cluster. The fee to enter is at least one node, so somebody who isn't a partner in the cluster doesn't have access.

Got it. Now, do you do any kind of fair-share scheduling with that, to prevent abuse, like one guy who buys one node and then continually submits 500-node jobs in the standby queue for the next three years?

Currently we do occasionally see some users that do that, and I think that's just by virtue of our batch system. Currently we're using PBS Pro on our clusters, which doesn't give us the ability to fair-share within one single queue; we can do it across the entire cluster, but not necessarily within the queue. We are in the process of moving to Moab on our Carter cluster, our newest one, and it does give us that capability.

So, you mentioned that they're really only paying for the nodes, and you're picking up a lot of the other costs. I'm curious, from the other panelists: what costs do you have to cover, and which costs are kind of offered as a public good? Brian, let's start with you on this one.

Yeah, well, I think we're probably a little bit different in the sense that we are explicitly funded by Ohio's Board of Regents.
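The standby policy boils down to one rule: any size of job, but a four-hour walltime cap, and only for cluster partners. A toy admission check; the function name, argument names, and structure are hypothetical, not Purdue's actual scheduler configuration:

```python
def standby_admits(is_partner, requested_walltime_hours, max_hours=4):
    """Whether a job may enter the standby queue: partners only, and
    walltime capped at four hours so owners reclaim nodes quickly."""
    return is_partner and requested_walltime_hours <= max_hours

ok = standby_admits(True, 3.5)       # partner, under the cap: admitted
too_long = standby_admits(True, 12)  # over four hours: rejected
outsider = standby_admits(False, 1)  # not a partner: rejected
```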
So we are, and always have been (we're in our 25th year now), a shared resource for the state. There is no cost to enter for any of our users. Basically, they just come to us and ask for resources and we hand them out. There's no sort of cost to pass on to any of our users. Now, some of our users do volunteer to help pay for some licenses, or some storage if they're exceptionally large users, but you could be a tiny user using a tiny amount of resources, and you'll pay the same amount of zero dollars as our largest users.

Okay, Andy, what do you guys do at Michigan?

We have the mandate, in fact I guess we're the total opposite of what Brian described in Ohio, of recovering every cost. We have zero extra funding. People pay: we have a rate that we set, and we have internal university money, as everyone does, I'm sure; it's kind of fake money, you have an account, and I guess it's real at some point. You can give us your account number and we will charge it for what you asked for. We are obligated to recover all of the costs: the staff, the data center, the networking, the compute nodes, the software. We did get some seed capital, as it were, so we could get going, but we are not allowed to spend anything that we cannot then recoup. I would not recommend that model, by the way, to anyone listening. I think it's a very difficult model, and it's unlike many other research things, where, you know, when you show up in your office as a researcher and flip the lights on, they come on; you don't get a bill for that. So it's very hard for people to come to grips with the fact that they have to pay for every single thing. It's unlike the grocery store, where most human beings do pay for everything they buy. Academia in that way is weird. But we're trying, I guess,
here at Michigan, to look more like a grocery store, where when you walk out with a bag of chips and a six-pack of pop, you have to pay for both of those things. And that's the way it is on our cluster.

So, just curious: do you even have to prorate things like power and HVAC and whatnot? If I ran a job on two nodes for two hours, you've got some kind of formula for how much that cost in power and air conditioning too?

Yes, that's exactly right. We don't charge by the hour; our minimum allocatable unit is one core for one month. So if you say, "I would like one core for one month," we will charge you somewhere around $20 for that, and in that $20 is, in fact, the HVAC and power. That's right.

Yeah, so it's an interesting discussion, the idea of trying to recoup all your costs, and I will add from our perspective that even being a state-funded resource is not exactly easy, especially in the current economic climate. You know, our budgets have been flat for quite a while, and so it's a scrape to get the resources that we need in place for our users. I think that's one of the things we're looking at: additional ways to get some money into the center to pay for staff time and capital expenditures and things like that. As a matter of fact, we do sell some usage of our time to commercial entities, and so we have a rate card that defines a CPU-hour cost, and we can bill those users for that. It presumably covers hardware capital expenditures, operating costs of the hardware, and staff time as well.

So Preston, how do you guys do the costs there at Purdue?
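Michigan's fully loaded core-month rate can be illustrated with a toy cost model. Everything here is invented for illustration except the shape of the idea: the roughly $20 core-month comes from the conversation, while the cost categories, figures, and function name are hypothetical.

```python
def core_month_rate(hardware, power_hvac, staff, total_core_months):
    """Fully loaded price per core-month: every cost category is
    summed and spread across the core-months sold (illustrative)."""
    return (hardware + power_hvac + staff) / total_core_months

# Invented annual figures for a small system:
rate = core_month_rate(hardware=500_000, power_hvac=120_000,
                       staff=400_000, total_core_months=51_000)
```

With these made-up numbers the rate lands at exactly $20 per core-month; the point is only that full cost recovery forces every expense, including HVAC and power, into that one unit price.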
So we're actually, it sounds like, right in the middle, in between those ends of the spectrum. For the faculty, all that they're paying is the cost of the node, as I mentioned earlier, but we do fully cost everything else. In our total-cost-of-ownership calculation we do include the power, the data center space, and the staff time, so we know how much the true cost is to the institution. And as I heard just a second ago, for external customers, like from private industry, we do use that as a rate to bill them for access to this program. But we don't take that same amount of cost from the faculty.

That's the same for us: when I say fully costed, we do get a lot of benefits from being in the, you know, broad embrace of the University of Michigan. So I think that if we were to look at a commercial cost, it would be hard for us to say this is the true total cost. Depreciation of buildings, those kinds of things, we don't worry about too much, which we would have to do, of course, if we were going to do a true total cost of ownership for this stuff. So there are some areas where we wave our hands, but for the things that we buy on a short time horizon of years, not decades, we do attribute all the costs somehow, and then we charge them to people.

So Brian, why don't you start covering the question of what you do with two different types of users: the user who has a consistent need, running say a hundred cores' worth of stuff non-stop, versus a user who wants a thousand cores, say, every other month?

That's actually an excellent question, because we do have both types of those users, obviously. What we will sometimes do is set up reservations, so nodes will be held available for a user who will typically have a consistent amount of usage, so they continue to get that kind of usage.
Well, you know, we're in a situation right now where we are running at almost full capacity almost all the time, so we're trying to avoid setting up too many reservations on our new cluster. We're still running our older cluster, Glenn, and we still have some reservations on that, but we just really don't want a situation where users are holding nodes, not getting charged for their use, and pulling them out of the pool for everybody else. But we do occasionally set up reservations for some users. This actually happens more often for commercial users, where they have a certain timeframe to get some work done, and, of course, they're paying us, whereas our other users are not directly paying us; they're paying us their tax money, I guess. So we do a little bit of that, and we do a little bit of queue priority sometimes for some folks. Those are, I think, the two things that we do most often to get away from the normal usage.

So Preston, how do you handle those two different types of use cases?

Right, we really don't differentiate in terms of how we operate for either use case. We just give them the ability to acquire computing, and if it's appropriate for them to use it that way, like if they run a lot of jobs and get a lot of value for it, then great. If they don't run the entire time, then they just don't get all the bang for their buck, so to speak. But it's really up to them.

So how does that work when researchers have to buy whole nodes? If they really have sporadic use, are their machines basically just idle even though they paid for them?
Yeah, it does work out that way. But the nice thing is that most of the people that operate in a really bursty mode, that don't need computation very often, are oftentimes the faculty that may be buying one or two nodes and just using them sometimes, which, for spending $3,000 spread out over five years, isn't really a ton of money. The people that are putting in large sums of money, buying 50 to 100 nodes, those are never the groups that let them sit idle all the time. So for the people that have bursty needs, the investment is pretty minimal, and it seems to work out fairly well for them in practice.

Okay, Andy?

Yeah, so the model that we have, which is sort of this two-dimensional time-and-cores model, meets some amount of bursty needs. If you need a thousand cores for a month, we can do that; I'll talk about how in a minute. If you just use a baseline of 100 cores forever, that's very easy for us, of course.
It's predictable. If you need something less than that, say a thousand cores for a day, you would end up paying us for a month, and that would probably be unattractive to you. It's still cheaper than buying a thousand cores from a hardware vendor for a day, so we have a bit of an advantage there. But even so, we're looking at other ways to accommodate the very, very short, very, very large jobs, which we have not figured out, by the way. Our mission, or our dream, I guess, is that if we have ten people who all want to run big, huge jobs over a short period of time, we can multiplex them together and take advantage of the hardware without ending up with a ton of hardware sitting around idle, waiting for these big jobs to come. I think we're headed in a pretty good direction on that; it feels to me like we might be able to actually do that. But like I said, we're still in the early stages of this, only a couple of years in, so we'll see how it goes as we add more users and the system gets larger. The general idea is that you can pay for whatever you want, up to the size of the system, and then our issue is to make sure the system is big enough, and can sustain itself at that size, to accommodate those kinds of requests.

Yeah, that makes sense. Let me ask something that's not directly related, but it's sort of pseudo-related. For those of you who said that you do have some idle capacity sometimes: do you ever power down nodes? There are at least two variables to that, right? One, is it worth it power-wise to power them down; do you actually get any cost savings? And two, is it worth it mechanically, to, you know, spin down the drives and spin them back up and things like that, or does that shorten the life of the machine? So Andy, I'm going to throw this back to you.

Sure. We don't do that at present. We have investigated it.
I think we probably will get there, because I do think the power savings for us would be noticeable. We're also doing some work on some data center stuff which should reduce our power costs, which then might not make it worthwhile; we'd just leave them on all the time. But just recently we've come into some idle capacity, and we've been getting closer to looking at powering things down, and we would do it in an automated way. We run Moab and Torque here, and they have an add-on product that supports doing that, and that's what we're looking at using to see if we can do it. But we've not done it yet.

Okay, Preston. I recall that you said you have idle time too, or do I have that backwards?

Well, we do have idle time, in theory. If a faculty investor isn't using their time, it is in theory available for others to use. In practice, between the faculty using their dedicated queues and the standby access, where I can borrow my neighbor's nodes, the cluster is used about 75 to 80 percent of the time. For that extra 20 percent, though, we've developed a solution where we've created a very, very large Condor grid that runs behind all of our clusters. While PBS primarily schedules the cluster, when the nodes aren't being used by PBS, Condor is free to schedule on them, and that Condor resource we make available to anybody on campus at no charge. So in practice, when you figure in the cycle scavenging with Condor, we get much, much higher utilization, into the 90-percent range. One way or the other, that idle capacity gets sucked up somewhere.

Fascinating. Brian, how about you?
Yeah, you know, we don't really have much in the way of idle time. Actually, I just checked both of the queues right now while we were talking, and on one of the machines, 95% of the nodes are in use, and on the new machine, Oakley, it's 98%, and this has been pretty consistent for us for about eight months. So we don't really power things down for power savings. But, you know, we're in a situation where we've actually hit the cap for power consumption in our facility, and we had to size our new machine appropriately to sit inside that power envelope; we had to buy something a little smaller than we otherwise would have. I think our goal would be never to power anything down, but to match the demand to the resources as accurately as we can. That's actually our primary goal: to do that, and not worry about powering things down.

So we've heard a couple of different things: units being a core, units being a node. When it comes to actual scheduling policy, and mixing users across stuff, especially you, Brian, with just giving out the CPU hours: how do you manage an actual queue within these allocations? Do you use fair-share policies, or max-proc limits, or what kinds of tricks do you do?

Yeah, we actually kind of do all of those things. We have a hard cap on the number of cores any one user can be using at any one time, and the cap happens to be the same on cores for a project. So all of the jobs in a particular project could be using, I think it's about 2,048 cores at one time. We also have caps on the number of jobs per user and jobs per project, and that's just to avoid any one user sucking up one hundred percent of the resource at any one time. We also do some fair share, and I don't remember the exact numbers that we use, but we do some of that stuff too, to keep one person from starving too many other people of access.

Preston, how about you guys at Purdue?
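Those caps compose into a simple admission test. A toy sketch: the 2,048-core figure is from the conversation, while the function, the per-user job cap of 128, and all the example numbers are invented, not OSC's actual scheduler configuration.

```python
def within_caps(user_cores, project_cores, user_jobs, job_cores,
                core_cap=2048, job_cap=128):
    """True if starting a new job of job_cores keeps both the user's
    and the project's running core totals at or under core_cap, and
    the user under a per-user job-count cap (job_cap is invented)."""
    return (user_cores + job_cores <= core_cap
            and project_cores + job_cores <= core_cap
            and user_jobs + 1 <= job_cap)

fits = within_caps(user_cores=1024, project_cores=1536,
                   user_jobs=10, job_cores=512)    # project lands exactly at 2048
blocked = within_caps(user_cores=1024, project_cores=1600,
                      user_jobs=10, job_cores=512)  # project would exceed the cap
```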
Well, at Purdue we allocate such that each faculty owner gets a dedicated queue set up for them, with a max-proc limit based on the number of cores that they purchased. So if I buy one 16-core node, I have a queue with a 16-processor max-proc limit. Within that owner queue there's really no limit: there's no cap on jobs per user, and there's no fair share by default, though we do have a couple of groups that are interested in doing that. Within our standby queue, like I mentioned earlier, as we move our clusters to Moab, we are going to set it up so that it is fair-shared within that standby queue.

Yeah, so here at Michigan, for our allocation-based system, everybody in a way self-limits, right? They tell us how many cores they want, and that's then how many they can use at any one time. And we do fair share amongst the groups and users, but that only comes into play if the system is totally full, right at 100%, and there are queued jobs waiting. Like I mentioned at the beginning, our goal is to not ever have very many, or any, jobs in the queue; we want them to flow through pretty quickly, so hopefully that stuff doesn't ever make any difference. But within one project, of course, it might: you might have a two-core allocation you asked for and, you know, 50 grad students, and that would then be very busy. But we do apply fair share within that, so people can have a fighting chance of getting their stuff to run; at least no one person can dominate it for a long time.

So let's go through that same round again, but there was something, Preston, you mentioned early on: that you've kind of gone to a service, that a faculty member can't go find a node with their specific name on it, even though a node is what they paid for. Does that mean that if they have 16 cores, and they submitted 16 one-core jobs, they could be running on 16 different nodes?
Yeah, that's correct. They'll get at least the number of cores that they've bought. We allocate based on the cores, but we sell based on nodes.

But what about something like memory? What if they submit 16 one-core jobs, but each one of those jobs requests the entire memory of a single node?

Then, in practice, it'll work out that the batch system will only let them get one job through at a time.

So Andy, how do you guys handle that kind of problem?

Yeah, it sounds like we do it in practice the same way that Preston does. The actual unit that you get is one core and its associated memory; in our system, that's four gigs of RAM. So say you've got a 16-core allocation and you submitted a one-core job... I can't do that math; let's make it easier. You've got a two-core allocation, and you submitted a one-core job asking for eight gigs of RAM. Only that one would start; a second job would wait for that one to finish. So we do count resources both in processors and in memory that way. They're not tied to each other explicitly; it's not that you can only use four gigs of RAM with one core, but the sum of them all has to add up to what you have been allocated.

And then, Brian, how do you guys handle this? How do you charge your CPU hours based on, you know, excessive resource use other than CPUs?

Well, we only explicitly charge for CPU usage. So if someone is requesting one core on a machine, but they want all, you know, 48 gigs of RAM on that machine, currently they're only going to be charged for that one core. However, we've got the ability now with Oakley to use containers, I believe. So we're investigating.
I think we're probably going to start charging basically a proportion of the node. You know, if you're asking for half of the RAM on the node, you'd be charged for six cores even if you're only using one, and we'll probably do something similar with the scratch space in the nodes as well.

All right, just out of curiosity, to make sure I understood what you were saying: you guys actually do, in real time, allocate more than one job per node if required? So let's say I've got a 16-core machine: could you schedule two eight-way MPI jobs onto that machine? Is that correct?

So, I should perhaps clarify our terminology. We consider any job that stays on one node a serial job, whether it's one core or, you know, all 12 on Oakley. We'll allow serial jobs to coexist on a node, so somebody could be using six cores and another person could be using the other six cores. Parallel jobs, we require entire-node usage. If you want to run across two nodes, we consider that a parallel job, and you'd be charged for the use of 24 cores even if you're only using one core per node.

Gotcha. Now, you other guys, Andy, Preston, do you do something similar?

We are purely core allocation. So, yeah, we have 12-core nodes right now, and if we had 12 12-way MPI jobs, it is conceivable that one task from each MPI job would be on a different node, and then there would be 12 different MPI jobs, one task from each, on one node.

Gotcha. Does your scheduler attempt to minimize that effect, or is it ignorant of topology, stuff like that?
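The proportional-charging idea Brian describes (bill for whichever fraction of the node is largest, cores or memory) can be sketched as a formula. A hypothetical illustration: the 12-core node and the half-the-RAM-costs-six-cores example are from the conversation, but the function itself, its names, and the 48 GB figure are our assumptions.

```python
import math

def cores_charged(cores_req, mem_req_gb, node_cores=12, node_mem_gb=48):
    """Charge the larger of the core fraction and the memory fraction
    of the node, expressed in cores (illustrative only)."""
    mem_as_cores = math.ceil(mem_req_gb / node_mem_gb * node_cores)
    return max(cores_req, mem_as_cores)

charge = cores_charged(1, 24)  # one core, but half the RAM: billed as 6 cores
```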
It does try. That's a good question, and I don't know if it's explicit; I don't think the scheduler is that aware of it, but it does try to pack things as tightly as it can, so it doesn't generally work out that that happens. We don't see it in practice very often. And we do allow people, if they want (it'll cost them some queue time), to request that all 12 of their cores be on one node.

So the user can influence that through their request?

Yeah. And at Purdue we operate ours the same way; it's by the core. The scheduler does try to pack them, but if the user wants to spread them out, they're free to do so.

Okay. Andy, let's start with you. Why don't you give us three things that you like about your allocation model and three things that you don't like about it?

Three things that we like about it. The first is that we get pretty good efficiency with respect to hardware usage, machine-room usage, and staff usage, especially for operations. We think that's quite a nice thing for us; that's one good thing. The second good thing: one of our original goals, two and a half or three years ago when we started thinking about this, came from thinking the barrier to entry was high for people who may not be computationalists yet, though they might become so. We thought, boy, asking somebody to give us a couple thousand dollars and make a three-year commitment just so they could try it out is a lot. So we thought it would be nice if there were a way you could come in and try it for just a little while, and one of our goals was in fact to lower the barrier to entry. So if you have a couple hundred dollars, say a hundred dollars, you could get five cores for a month. You can run an MPI job with that; you could get some modest work done, and it would cost you a hundred dollars.
That was a nice way to start for people, so we think that's quite a nice feature of our model. The third good thing... boy, Brock, is there a third good thing about our model? So I guess there are only two good things about our model, and I think they really are good: the efficiency of it, we think, is quite high, and we think we've enabled researchers here at Michigan to come in and give it a try without a big commitment or a big expenditure. They don't have to actually own hardware or worry about machinery in the rack space. And because of that, we were able to provide a fairly richly configured resource: it's all InfiniBand, and we have quite a bit of scratch space using Lustre over InfiniBand. Those are things that you may not get even with a medium-sized cluster of your own; you have to be at some scale to achieve them at a reasonable cost. So I guess that would be the third good thing about our model: we're able to provide a fairly rich feature set at what is essentially incremental cost for people, which is nice.

Okay, Andy, let me interrupt you. Let's get three good things from everybody, and then we'll come back and do three bad things from everybody. So Brian, give me three good things about your model.

I think, from our perspective as a shared-resource center, one of the best things about doing it the way we do is that we keep our utilization very high. We don't have a lot of idle nodes, and when we do, anybody can submit a job and it will run very quickly. So looking at the user community as a whole, utilization is high and wait times are low. A second thing that I like about our system is that requests for resources are justified and reviewed through a peer-review process.
I think that's a nice way to ensure that we're handing out resources appropriately, according to what people's needs actually are rather than what they think their needs are. And the third thing that I think is good about this is that there's no real barrier to entry. Small users maybe don't have any money; we've talked about small amounts of money, but somebody who doesn't have any money at all can still access these HPC resources. They don't have to worry about purchasing a portion of a cluster or purchasing their own cluster. Maybe they only need the resource for a short period of time, and they can easily access it with our allocation model.

Okay, cool. And Preston?

Some of the best things about our model, a couple of them at least, center around the cost. Since we do build a community cluster every year, it's a predictable thing for the faculty: they can get computing no matter when their grants may be arriving or when a new faculty member may be starting. And also, because we do this every year, it realizes some pretty tremendous cost savings for the entire institution. Over the years we've negotiated with the vendors to get good prices on the clusters, but then we're also able to use those same configurations and prices for servers in every area of the university, so we've saved a million dollars in server acquisitions over the last five years. For the faculty member, the barrier to entry is fairly low. It's not zero, but at the cost of only a single node, about two or three thousand dollars, it's pretty minimal, so just about anybody can get access to the computing.

Okay. And let me flip the question now: what are three bad things about your model?
Let's go in the same order again. Andy?

Sure. Now that I've had some time to think, I'll probably be better at the bad things, but I don't think that's a reflection of my opinion of our model; I'm quite fond of it, actually. The three bad things: one is that we have spent a lot of time here at U of M talking to our internal audit group, and we've not yet figured out a way, though we may be getting closer, to accept grant-funded hardware with the restrictions that come with it. If you listeners are familiar with the A-21 rules, we adhere to those quite closely here at Michigan, and they prohibit sharing hardware that was purchased on grant money with anybody outside the grant. I think there's some flexibility in interpretation nationwide on that, but the interpretation here is quite strict. Because we have a totally shared model, which we set up for utilization and efficiency reasons, we can't really take grant-funded hardware into our model. So if your grant funding is hardware-specific and not just plain flexible money, the model doesn't work, and we have condo clusters to address that. The second thing, I think, is that our granularity is a core for a month. There's been a lot of interest from people in either direct billing, paying for exactly what you use, or a smaller granularity, a core for a day, for example. That turns out to be fairly complicated for us, but we are working on it as well. People are going to see the same cost, I'm afraid, and it might be less attractive at that point, but the granularity is awkward for us in that way. And the last part of the model, which is truly a model thing, is a financial thing. Because we are obligated to recover every cost, we are somewhat constrained: the things that we cannot attribute direct costs to, we cannot put into a rate, and so we cannot recover their costs.
So those are things that we don't do, at least not as part of this very narrowly scoped model of the hardware. Things like consulting, for example, we fund outside of it through different funding streams, that kind of stuff. So that makes it a little bit trickier for us in places. We do address it by having sort of a stone-soup approach, where we ask everybody to throw in a consultant, and pretty soon we have a bunch of consultants we can work with. So that does work out for us, but it makes things a little challenging, because then it's not centrally run; it's very much contributor-based. So those are the three shortcomings of it, I think. There are probably some other ones as well that I'm not thinking of, so it's not perfect, but it's not bad.

Brian?

I think one of the big challenges we have is convincing people who do have resources and funds of the value of contributing to a shared resource like OSC. I think a condo model is much easier to sell to somebody than the economies of scale which we can provide, and the ability to maintain the shared resource. You know, if somebody's buying their own cluster in their lab,
they have to maintain it, they have to pay for it, and they may not have the expertise to run the cluster as well as we can. That's a hard sell; it's hard to convince people to kick in money to help us buy bigger clusters. So that's a challenge for us, and one of the reasons we're exploring the idea of perhaps getting into the condo space.

A second problem: for some individuals, waits may be long, depending on the resources they're requesting. If you need one of the larger-memory nodes or GPU access, for instance. We have, I believe, eight nodes with 192 gigs of RAM, versus 48 gigs on most of our nodes, so those are going to be in higher demand, and the waits to get those resources are longer. One in every ten nodes has GPUs, so the waits to get to the GPUs may be a little bit longer than if you had your own dedicated resource, where you asked for that stuff up front and it was always available.

And I guess the big white elephant in the room is that we haven't talked about disk usage and long-term storage. That's something we don't account for at all in our allocation model. We tend to be very generous in giving out disk space, but we've found that our users can gobble up the space as fast as we put it out there, and that's becoming a real challenge for us. The usage pattern is different from CPU, too: it's very easy to reallocate a CPU to somebody else when it's not being used, but storage is kind of perpetually in use. It's a different resource entirely, so disk storage is just a big problem for us.

Indeed. Okay, Preston, how about you?
I find it interesting that you mention storage, because that was actually one of the areas I wanted to mention as a challenge for us. Where our problem lies is that we don't currently have a provision in our model to do condo storage, which is something our users keep coming to us with: "I have X amount of dollars; how many terabytes of disk can I get for that?" As of yet we haven't got the perfect way to do that, so that's definitely a challenge we're working towards solving. We also don't currently do the number-of-cores-times-unit-of-time model, even though in the last couple of months we've had several of our users asking for it. So we do see that this is the direction people are going to want to go, and there will probably have to be some future development making that part of the model. And then finally, I think one of the weak points in our model is how it scales to very, very large-scale users. We do have many research groups that buy hundreds of nodes on each cluster, and that's great for them when they can use their entire share of the cluster. But if they want to run something on twice as many nodes as they own, which may be most of the cluster, there's not a reliable way to do that yet. We've got to do some more exploring of whether we're going to do reservations or have dedicated times when somebody can do a whole-cluster run; today that's really been kind of an ad hoc process, so we definitely need to come up with some more structured ways of doing it in the future.

Okay, everybody, thank you very much for your time. I think a lot of people who are looking at starting an allocation system, or at combining a couple of research groups together to get some economies of scale, will find this very useful. So again, Andy, Preston, and Brian,
thank you very much for your time, and we will have this show up soon.

Yeah, thanks guys. I think we could have gone on for another hour here; this is fascinating stuff.

Well, if there's a response from your listener group that there's more interest, I think I'd be at least up for doing another hour of it.

Yeah, I would too, but I think, per your suggestion over instant messenger, we should do it over beer next time.

Well, that makes everything better, doesn't it?

Yes, it sure does. Okay, thanks guys.