We'd like to get started in a minute or so. Now we're going to move away from the general things you could take with you anywhere, back down to Beocat itself and the resources we have here on campus. I have to generalize a little bit here.

Let me talk about queuing systems. Your desktop PC that you have at home: who's in charge of that? You are. Who's in charge of Beocat? Not you. That's a good question. Of course, we would really like it if everybody could jump on, and I'm not even kidding. We would like it if everybody could get on and have all their jobs start right away, run immediately, and finish as soon as possible. That's what we aim for as much as possible. But we have too many people wanting too few resources, so we have to have a way of managing that, and that's a lot of what Adam and I do. So you have what's called a queuing system. It's actually similar to the old mainframe days, where you submit a batch job, it works on it a while, and it comes back. That's what we do: we submit and process jobs according to what's called the scheduler. Our scheduler is called SGE, which originally stood for Sun Grid Engine. Oracle took over Sun, and now we use an open-source fork of it, Son of Grid Engine, so I still call it SGE. It's just one of the many systems out there that let people submit jobs, figure out the best time for them to run, execute them, and try to do it in a fair way. And "fair" is different to every person out there, because of course fair means "mine runs first!" I know how that works. So we really try to do that as much as possible, try to get everybody's jobs in as quickly as possible. We have philosophical discussions when we see things in the queue that have been sitting there for a while: okay, is that fair? How do we make it fair? We're tweaking things fairly frequently. It tends to be in spurts: we'll tweak a little bit for two or three weeks, and then it'll stay that way for months at a time, that type of thing.

I will say one thing here. If you've got something significant, like "look, I have a paper deadline and I have to get this through the system," or "I have a class that I'm teaching": for instance, I was just exchanging emails with one of your bosses, Dr. Aikens, and she's got a class on computational chemistry where she needs to be able to run the class exercises. We can reserve nodes. If you come to me and say you need 100 nodes for the next three months, I'm going to be very, very distressed. But if you say, "look, I just need some nodes, because I've got jobs that have to keep coming through, and I need to make sure they get through in a very timely fashion," give me a holler and let's see what we can work out. For her class, for instance, she doesn't need that many nodes, so I'll just take a slice of Beocat and dedicate it to her class for a month. Sure, piece of cake. It's really easy. So if you have unusual demands or needs, let's see what we can work out and go from there. For the faculty members, maybe you have a deadline: "if I don't get these results done, I'm going to miss this conference deadline or the journal paper deadline." Or an NSF deadline? An NSF deadline, sure. Let's see what we can do. Everybody wins when you get your results done: you get more money, and then you come back later and say, "hey, can I contribute to Beocat?"
So I have a vested interest in making sure you're successful. To be fair, it is also very hard to do some of that at the very last minute, so if you know ahead of time, let us know and we can try to get something worked out. We have been asked for the physically impossible, which is difficult to accomplish. And we've also been asked for things that would require us to kill off half the jobs currently running on Beocat, which is something I'm probably not going to be willing to do. But give us the time to work and it can usually be accommodated.

We are looking into preemptive SGE scheduling. This is something we've looked at but have not gotten done yet. We've had people that have not contributed to Beocat because they want their jobs to be able to run right now, and they've got the money to do it, so that's the way they're going to do it. We're looking at a system, all within the scheduler we have now, where you have the resources you bought into, and you say, "whenever I need my resources, I need them now." We can have other people running on those machines in the meantime, basically scheduling low-priority jobs on them, and if you come along and say, "okay, I need to use these machines now," we can kill those jobs off, reschedule them, and you get access to your stuff right away. That's the long-term plan; we're not there yet. But I'd like to encourage some of this centralizing of resources, because the typical cycle is that you use your machines really heavily for a while and then you don't. More than likely, by centralizing this, your peak is not going to be somebody else's peak, and we can even out the hills and valleys and make it better for the entire university.

We really do try to accommodate everybody we can. We try to get jobs through quickly. We try to help people out if they're having problems, even coding problems, whatever we can do. We try to be really customer-friendly, so to speak. We do the best we can with what we've got, and we hope we're not seen as that group of "ugh, I have to deal with them again." If you get to that point, let us know and we'll try to do something about it, because that's really a problem we're trying to avoid.

I've got a schematic here of Beocat, scaled down a bit so it's a little smaller. This is a high-level overview of what we've got. We've got the campus network; the internet comes in from the outside through the Nichols Hall router, and that comes to everything in this box here in the Beocat server room. We're going to go on a tour in just a few minutes, for those of you who haven't seen it yet. It comes to our firewall. Inside, we have several different networks that we deal with. One you don't have to mess with is the management network over here in green. I labeled it green on purpose, because I think all of our management cables are green; that helps us realize, as we're plugging in and unplugging things, where things go. Those go into all the different machines across the network. The purple is the production network; I've got my key here, and I have it written down elsewhere too. The red is some hosted VMs. We have several people that we just run virtual machines for.
They're running web servers and GIS applications and some other things; by putting them on a separate network, they can have ultimate control of their own machines without being able to take control of our other machines, which we don't want them to do. OpenStack is in blue over here. OpenStack is a new system we're moving to, and we're dedicating some resources to it. OpenStack is basically like Amazon's EC2 cloud, more or less: you can fire up a virtual machine. Right now we have it targeted at students in a particular class, and we're hoping to expand it to more general use across campus once we make sure everything works correctly. You can fire up and shut down a virtual machine at your leisure, having some of your own space out there, using the resources we have here instead of your own, especially since we've got UPSs and backup systems and all that kind of stuff in place. So that's the idea on that. Our production network goes to all our compute nodes, everything here. OpenStack talks to Ceph, which is a new disk system we have a few resources set aside in. The main disk is hooked to two file servers, which helps us keep really high availability by having two file servers talking to the same disks. All of these are talking over the orange, again colored on purpose because that's the color of the cables, which is InfiniBand. That's the very high-speed network we have that's used for file storage; it's what your OpenMPI and other MPI stuff runs across, and it's very low latency. It also runs at 40 gigabits per second, so for those of you who do networking stuff: it's fast, very low latency, and that's why we use it to talk to the disks and such.

Hardware-wise, as of the writing of this (and I think it's actually changed since I wrote it a couple of weeks ago), we have 156 compute nodes, 10 OpenStack nodes, and 18 infrastructure servers. I have that broken out here and I'll talk about it a little bit. This is how many users we've got in the system; Dan just sent me these numbers today and I got them into my slides. In 2003 we had about 10 or 15 users; by 2011 we were somewhere north of 300, and we're up in the 500 range just last year. So obviously this is becoming something more and more people are using, and we like that. If we're here, you might as well use us; that's what we're here for. The number of cores in the system crossed the 200 threshold in 2008, and now we're actually over 2,000; this chart only goes to 2010, but we're over 2,000 now.

The compute nodes come in four classes currently in service. (This part is aimed at people who have used Beocat before, so you kind of know what's going on here.) The first class is the Scouts. We have 76 of those; we got a whole bunch of them really inexpensively from Sun, and they're the oldest ones we have in production. They are not the fastest machines on the block, but we have lots of them, and that's not all bad: a lot of the jobs that come through actually don't need much CPU horsepower, they just need to run for a very long time, and these work fantastically for that kind of stuff. They're running 8 cores each, two 4-core Opterons at 2.3 GHz, with 8 gigs of RAM. It's really sad: coming from the private sector, an 8-gig-of-RAM server and I'm going "wow, that's pretty good," and here that's the old, slow stuff. The next oldest class is the Paladins.
We have 16 of those. They're running two 6-core Xeons at 2.9 GHz, that's 12 cores each, with 24 gigs of RAM. (I left that other figure in there; I meant to take it out, sorry.) These are the ones that do have the GPUs: they have the Tesla GPUs, and they have InfiniBand. The only nodes we have with GPUs in them are the Paladins, so if you're interested in GPU computing, these are the ones you'll get: if you tell the scheduler you're going to run CUDA, these are the ones it will put you on. So you may want to know about the CPU limitations: they're not terribly slow by any stretch, but they're not the fastest things out there.

Next we have our big ones, the Mages. I like these guys. These have eight 10-core CPUs at 2.4 GHz, with a terabyte of RAM apiece, and InfiniBand. These are the ones we mainly aim at bioinformatics use. I said there are 6; there are actually 12, but they're tied together at the CPU level, so you'll see two boxes in the rack, and those two boxes together make one Mage. That's why, if you happen to be running on a Mage and you look at which node your job is on, it's only going to be an odd number: we have 1, 3, 5, 7, 9, 11. We could break them apart and make 2, 4, 6, 8, 10, 12 if we ever wanted to, but as they're tied together, they're all the odd numbers.

And we have the Elves. We have 38 in production, and more that we're waiting on cables and a switch for, so we'll have more of these in place. These are 16-core machines; I don't remember the clock speed offhand, but they're the Sandy Bridge, for those of you who keep up on CPU technologies. That's the fastest readily available CPU from Intel. I say "readily available" because there are faster parts, but you've got to show Intel that you really need them before they let you have any, and they're quite expensive even if you do. 64 gigs of RAM: not as much as the Mages, but still nothing to sneeze at. And again, they have InfiniBand too.

With that, we'll come back to what I had before. Let me figure out how many to take at a time here, because I'm thinking if we take 35 people through at once, all you'll hear is this big squeaking noise, which is my voice, from the front of the line, and nothing at all towards the back. Tell you what, let's divide into thirds. Maybe we just take one, two, three, and you guys can each grab a group of about ten, and we'll go through and you can talk about the different nodes. Okay, we'll go from there. Let's see here, we're pretty evenly spread out; just pick whoever's closest. Sure, okay, about ten of you form up on me, and then we'll go take the tour, and we'll be back in this room in about five minutes or so. If you want to leave your stuff here, that's fine, we'll lock the door. So, group of ten: two, four, six, eight, ten-ish. Got it, okay. All right, we'll meet at the corner or whatever.

Last time we got it, I think it was a little bit like 75,000. Okay.

All right, now that everybody's back: I think everybody here has already got an account. If you don't have an account, you need to send an email to the support list or to Dan. If you send it to Dan, he has to approve it and we create the accounts; and if you send it to us, it still has to go through Dan anyway. So it doesn't matter: it's got to go from one to the other anyway.
Do I need to go over how to log in to Beocat? Raise your hand; don't feel shy if you don't know, we can definitely go over it. You all have a guilty look on your faces. No? All right, there are a few people. Kyle, I think? Okay.

All right, I'm going to use PuTTY here; you saw me do a little bit of this earlier. The ssh command is what you'd use within a terminal window on either Linux or OS X, but I'm just going to run PuTTY. Where it says "Host Name (or IP address)" over here, we're going to put in beocat.cis.ksu.edu. Everything else, the defaults should be just fine. Some of the older versions of PuTTY did not default to SSH, so if you have an older version you have to change that one. Then we click Open. You can also save this so all you have to do is double-click it: instead of "Default Settings" I could make one called beocat. As a matter of fact, let's just do that: beocat, Save. Now I can just double-click this. Double-click, double-click... it only loads the settings. It only loads it; you've still got to click Open. All right. After this point it uses your eID username and password. Once it connects, it says "login as:", so I'll use my eID, hutson, and my password, which I'm not going to tell you. It's the same password you use with all the K-State systems; you don't have to worry about a separate one, we just tie into the main system. That works well.

I will say, one time I was teaching a class, and I didn't quite hit return as vigorously as I thought, and I started typing my password after my username. I got about three or four characters into my password and showed it to the entire class, which was mildly embarrassing and scary, and I changed it immediately after class. My last password change was about two weeks ago: I did something very similar, and it was only Adam watching, but I still changed it right afterwards. Not that I don't trust Adam. Anyway, I typed part of my password at the prompt, it said "command not found," and I said, ah, shoot.

I just made this bigger so you can see it on the screen. This is what you see when you first log in, and it's just sitting there ready for you to tell it what to do; we're going to go over some of that. When you first log in, you log into one of the head nodes. They're named Athena and Minerva. All of our systems are named after Greek mythological characters; the reason is that we have a lot of systems, and there seems to be a limitless supply of Greek mythological characters. Sometimes the names have deeper meaning than what's there, and sometimes it's just what came up. So Athena and Minerva are the two head nodes. The head nodes are the ones you log into from the outside world, and they're the ones you submit jobs from. And you can even run programs on them, within limits, because they start right away. We had somebody who was doing a lot of testing, and she was submitting jobs that would run for 15 seconds. I asked why she was doing this, because I was working with her on a problem. It was a particularly busy time on Beocat, her jobs wouldn't be scheduled for about two hours, and she was complaining: with this really short job, why wouldn't it get into the schedule? Well, you're just testing: there's no reason to even schedule that. You can run it right there on the head node, on Athena or Minerva. But we do have limits on that: one gig of RAM, one hour of CPU time. Use it mostly for testing or really short jobs.
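For reference, the equivalent of that PuTTY session in a terminal on Linux or OS X is a single ssh command; "myeid" here is just a placeholder for your own eID:

    # Connect to a Beocat head node with your K-State eID
    ssh myeid@beocat.cis.ksu.edu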
The examples I've been giving today, I've been doing those all on the head node too. "The memory limitation is actually 4 gigs." Oh, is it? "Yes." Okay, it's a 4-gig limitation on memory. It's a quick kill, though: as soon as you hit 4 gigs, that process gets killed. "So, no manual intervention required." Yes. So if you do have short jobs, or you're just testing to see how things work, that's perfectly valid and we expect it. No big deal.

Okay: "what happened when you submit a job." I like that; I passed English class, or typing class, one of the two. What happens when you submit a job? It goes into the queue, and the scheduler figures out the best time to run it. Deep and mysterious and very strange things happen; that's about the best I can tell you. The way you submit a job is with the qsub command; that is the SGE submission command. All the SGE commands begin with "q," so that's generally how you can tell whether something is an SGE command or not: it starts with a q. I'll talk more about that in just a minute. (The tour actually took a little longer than I thought, so I'm going to rush through here a little bit.) If you go to this web page here, if you click on Sun Grid Engine from the main page it'll take you to it, and it gives you pretty much everything you need to know, or could possibly want to know, about submitting jobs to Beocat. It tells you about the multi-core environments.

Yes, ma'am. Good question: "how do I know how much to request?" Generally, if you have a short test, you can run it on a head node and kill it off, as long as it's not going to hit the 4-gig limit. "But I don't know; I have the code, but how do I measure it?" You might pull up a window and use something like htop, or some other tool that can tell you how much memory you're actually using. "But they can't run htop on the node; they can't log into the node to do it." Right, only while you're running on a node. So, normally you can't log in directly to a compute node, which is kind of what this discussion is about: we don't want people running directly on the nodes, we want you to log into the head node and submit from there. However, there are two ways of actually getting onto a compute node. One is if you have a job running on a compute node: you can log into that node to monitor it, to keep an eye on your jobs and take a look at that sort of thing. The other is a command called qrsh, which just gives you a shell on a particular compute node; it gets scheduled just like a normal qsub job, and you can work from there. "But in this case, could I just go there and run it?" Probably you could, but in general... that's why I was doing it this way.

Let me turn off the lights here, and then we'll talk about htop, because it's a fun tool, and I'm these guys' boss, so I can order Kyle to pull it up. It's a good way for users to keep an eye on nodes and that kind of stuff. There's also this node-monitoring setup, probably only on the head nodes. "I can't demo it; I'm not running any jobs, though." You're right, you are not; it checks for that, it checks to make sure that you have jobs running. Just run htop, then. So htop looks like this, and you can run it on a particular node.
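As an aside, the qrsh route from a moment ago is just a one-liner. A minimal sketch; the explicit resource flags are an assumption, reusing the same -l syntax that qsub takes:

    # An interactive shell on a compute node, scheduled like any other job
    qrsh
    # Or with explicit resource requests
    qrsh -l mem=1G,h_rt=1:00:00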
I end up using this a lot, which is one of the reasons I said, hey Kyle, pull this up. You can likely ignore this part if you want to. There are a couple of cool things here. The first thing it does is give you a little text-based bar graph of what the CPUs are doing on each core; in this case they're doing practically nothing. There are a couple of things to look at, though. One is the red: that's time being spent in the system. The green, like core 3 here at 100%, is time spent running user code. Normally, for your programs, you want lots of green and very little red; that means the operating system and the computer are doing your work for you. If you've got lots of red, it usually means something is going badly wrong. Occasionally there are really good reasons for it, but usually it's a bad sign: it can mean you've got multiple threads all fighting over who gets to use a piece of data, or they're shipping lots of data around, and not actually getting your work done.

So, a couple of things to answer your question. One is the memory line: here it's saying we're using about 1.2 gigabytes out of 16 gigabytes, if I'm interpreting this correctly, and the green part of the bar is the kind you're looking for. That gives you, for that particular node, how much it's using overall. You can also look down here at the individual processes. K. W. Jordan has a bunch of jobs running, root's running one, Nora's got some here. I don't know who this is... oh, there you are, he's on here. It'll tell you some things. Most of the time you can ignore priority: it changes, and it doesn't really matter on the head node. But it also shows how much virtual memory you're using, how much resident memory, which is getting into the numbers you were wondering about, and how much shared memory, which is shared libraries and that sort of thing. These can give you an idea of how much memory your program is using. I would look at the virtual and the resident: resident is what's actually pulled up in RAM, and virtual is going to be more or less the total you're using. And then, what does your CPU percentage look like? Here nobody's doing anything, so everything's at zero, but it can tell you what percentage of a CPU you're actually using. A lot of times, for instance, if your CPU percentage is sitting at 20% while your code is running hard, that probably means your code is waiting a lot: waiting for the hard drive to get data to you, that sort of thing. So maybe you need to rebalance and say, you know what, instead of doing a bunch of small little reads off a file, maybe I should read big chunks. The same thing applies to reading off the hard drive as to the network: fewer, bigger chunks tend to work a lot faster than a bunch of little tiny chunks of data, either reading or writing. The TIME column I tend to ignore; that's the total time the process has been running. And then you can see exactly what commands people are running, that sort of thing. Does that help?
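If you want to try the same thing yourself, a couple of invocations; the -u flag is standard htop, nothing Beocat-specific, and "myeid" is a placeholder username:

    # Watch everything on the machine you're logged into
    htop
    # Show only your own processes
    htop -u myeid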
So, to answer your question: this would probably be the first tool I would look at, but there are others out there. You do need to log into the compute node first if your job is running on a compute node; if you're running on the head node, just run htop right there and go from there. A lot of times I'll have two PuTTY windows open, one actually running my code and a second running htop; that way I can be running my code and watching what the bars are doing, and whether it looks like it's working well or not. "Thanks for letting me know." Sure.

Now, the qsub command. There are tons of options you can give qsub. If we go here again (this is the main Beocat page, then Sun Grid Engine), the most common things you'll see are memory and time. This person here requested so many megs of RAM, wanted InfiniBand, and a run time of two hours. This is the PE, the parallel environment: if you're not running MPI, it's "single." In this case they wanted the PE mpi-1 with two slots: mpi-1 means you're allocating one slot per node, and the two means a total of two of those. Now, the one thing to know if you're requesting multiple cores: the memory request is per core. We've had people who wanted to use an entire Mage, so they said, "I want 80 cores and a terabyte of RAM," which is fine; it'll take a while to get scheduled, but it'll go. Except they didn't realize they were asking for a terabyte per core, and we don't have any machines that have 80 terabytes of RAM, so the job never got scheduled.

The run time: once your job reaches its requested run time, it will get killed off, so don't underestimate your run time. Yes: if you don't specify one, it defaults to an hour, and your job will run for an hour and then quit. As a matter of fact, that's on our FAQ page, because we've had a lot of people say, "hey, my job runs for an hour and then it stops." That's because you didn't ask for more run time than that. It's not a fault: it ran for one hour and got killed. Don't underestimate your run time; if you're going to err in one direction, overestimate. There are disadvantages to that too: it becomes harder to schedule when you overestimate your run time, especially if you do it by an order of magnitude, which we've had people do. The other thing is maintenance periods. For instance, we have a maintenance period scheduled for next Monday; we have some maintenance to be done, and we're going to take everything down for a couple of hours. Several people asked me this last week: "why didn't my jobs run?" Well, because the scheduler knows that this maintenance period is coming up next Monday, and you said you were going to use two and a half days. It doesn't have any way of knowing that you really only needed five hours, that you were overestimating by that much, so the job is not going to run until after the maintenance period is over. That's the disadvantage of overestimation. Obviously, the closer you can get to being right, the better off you are, but don't underestimate, because if you've asked for a week of run time and it takes a week and an hour to run your program, you've just lost most of that week.
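To make those requests concrete, a sketch of what the command line might look like, using the resource and PE names described above; the script names are hypothetical:

    # So many megs of RAM per core, and two hours of run time
    qsub -l mem=512M,h_rt=2:00:00 myjob.sh
    # An MPI job: PE "mpi-1" with 2 slots, i.e. one slot per node on two nodes
    qsub -l mem=512M,h_rt=2:00:00 -pe mpi-1 2 my_mpi_job.sh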
"A side question back there: does it also take into account the time the job is sitting in the queue, waiting?" No, it's from the time your job starts to the time it ends. "Is there any wiggle room? I've had cases where I scheduled jobs for a week and it cut me short at six days and 20 hours. I know it shouldn't make a big difference, but I've had a couple of jobs like that. Does that sound familiar?" If you could get us some job numbers, we could take a look, because I've not seen that before. "Okay." In theory it shouldn't happen; in reality, it may.

Like I said, the other thing people do is request MPI when they don't need it; they just think that's going to give them more cores. It does not help: unless your job specifically uses MPI, it will actually cause things not to work correctly. "Question back there: why would you request InfiniBand, and why would you not? I've never actually used that." By default it's off, and InfiniBand is primarily for MPI jobs: you can talk MPI over that super-fast network. You can talk MPI over the switched production network too, but it's not as fast. And if you're not running MPI, it gives you no advantage anyway. There are several other things here I'm not even going to go through. CUDA, that's for the GPU nodes. InfiniBand, the IB flag, you'll notice. And what I emphasized earlier is right here on our page: Beocat will not magically thread your applications. We put that in all the main places we can, and we still have people try it.

Getting close to time here; let's see what else I had. I do have a sample qsub file. You can make a qsub file, which I've done here; this is one I just made up. I was getting asked for several of these, so it seemed worthwhile to make one that's fairly general and explains what we're doing. Normally in a shell script, which is what this is, a hash at the beginning of a line is a comment, but qsub treats "#$" at the beginning of a line as a directive to qsub itself. So I put two hashes at the beginning of each line; that way every directive starts out commented out, and you can take this file and activate a line by removing one of the hashes, leaving "#$" and the rest of the directive. (There's a sketch of the idea just below.) So again, feel free to copy the one I have in my directory.

How am I going to demonstrate that? Nano, or vi, or whatever. Right, I'll make myself use nano. Nano. So, to activate any of these lines: if I wanted to specify memory, 1 gig is the default, but I could go here and say 6 gigs is what I want, and I don't want 1 hour of run time, I want 7 hours. And especially note the "per core" on the memory line, which occasionally leads to weird math, but you have to do it that way. Anyway, I've put the most common options that we see here in the sample, so all you've got to do is take out that first hash and submit your job. There are the different parallel environments: the default is "single" with one slot, but if you had a non-MPI job that uses threads, like OpenMP, you could use 12 cores, for instance; I could set this one, and now it's set for 12 cores. The different MPI environments we're going to get into in the next hour. (That one actually isn't working right now, so ignore it.) You can do all sorts of things, and I've tried to comment them all in this file: naming a job, changing directories, giving names for your output, that type of thing, just some of the more common things people have done. I'm going to exit without saving. So again, that's in my directory over here, the beocat intro directory; feel free to copy it and rename it, I put it out there for your use. I've had it in my home directory for a while, but now that I have it in the slides, people might actually find it.
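A sketch of that double-hash layout; this is not the actual file from my directory, just the shape of it, with the values from the demo and hypothetical job and program names:

    #!/bin/bash
    ## Lines starting with "#$" are qsub directives. Two hashes = inactive;
    ## remove one hash to activate a line.
    ##$ -l mem=6G,h_rt=7:00:00   # memory (per core!) and run time
    ##$ -pe single 12            # 12 cores on one node, e.g. for OpenMP
    ##$ -N myjob                 # give the job a name
    ./my_program                 # whatever your job actually runs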
"Would there happen to be a script that can tell you when a job sitting in the queue is expected to run?" That was actually the next thing on my list. As sysadmins, we use a command called qstat all the time, which gives you a list of what jobs are in the queue. Normally my screen is very large, so this all fits on one line instead of two. This is telling me the job number, the username, the program they're running, and how many cores they want. As a quick example, there is a command written by our very own Adam here, kstat. "What? I didn't write that one." You didn't write kstat? "No." Well, I'm trying to give you credit here, man. Let's see... that's because it's not running. Why is that? By default, kstat will give you all of your own jobs. I don't have any jobs sitting out there right now, so let's have it show the status of a particular job. Why is that one not running? Let's try a user. There we are. You have errors on this one; I'm not trying to point anybody out, but since he's here... This tells you that this last job, the previous one we can see up here, was submitted this afternoon at 1:22. Now, if it's all working... we've been having problems with the scheduler, which is why we have the maintenance period next week, so I promise this should work after we're done with the maintenance. That one may not show up just because it's small; I think it only requested one core, right? Try the next one. It's probably fairly short too; look for something larger. 870? There. Okay. Like I said, we've been having problems with the scheduler, but this would normally tell you when this job right here, 122870, is scheduled to run.

"Is there such a thing as a summary, for when you have many jobs in the queue and you want to know how many are running and how many are queued? That kstat command..." Yes? "I have a few there. That will tell you your submit time, and if you had any running it would tell you they started running at this point, but it doesn't do the same thing as telling me when a job is scheduled to run. And sometimes I have more than what the screen will show, and I don't want to go down and just count them. Do you have such a thing as a summary, which will say at the bottom: you have this many jobs running, this many jobs scheduled?" Well, if you've got 29 lines, that's going to get you pretty close. "Okay, so those are going to be all my running jobs. On another high-performance system I've worked with, they had something like kstat where you would get a listing of your jobs that are running, then a line that says there are this many jobs running, then a listing of the jobs that are in the queue and a line that says you have this many jobs in the queue, so you can see not only which jobs but how many, which for me would be running in the hundreds." I think we could probably write something that would summarize this fairly easily. My guess is they were running a different scheduler that had that command, or something like it, built in. On the other hand, if you send an email to the Beocat list at CIS saying "hey, could we have this, and we'd like it to look like X," my guess is it would be pretty easy for these guys to script up. "That should be pretty quick and easy. Thank you, guys."
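In the meantime, a rough way to get those counts out of qstat with standard shell tools; the -u and -s flags are standard SGE qstat options, and "myeid" is a placeholder for your username:

    # All of my jobs, minus the two header lines, counted
    qstat -u myeid | tail -n +3 | wc -l
    # Only the running ones (state "r")
    qstat -u myeid -s r | tail -n +3 | wc -l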
"So will it tell me which ones are running, or are those just scheduled?" It will actually tell you both: whether they're in the queue and whether they're running. In this case these are all just submit times; let's find one that's running here. Cheung has running jobs: instead of saying "submit time," this one says "start time," and it tells you what queue it's running in and what node it's scheduled to. Yes? "Is there a way of changing the width of this field?" I don't think so, not to the best of my knowledge. But again, this is all open-source software, so I can't see any reason why we couldn't. That's obviously an abbreviation in there; it's probably a constant built in somewhere. And it may just be because I have a smaller screen here, too. I normally run with text screens that are pretty big, a 27-inch monitor all running terminal sessions, so I've never noticed; maybe on a wider window it would show the whole thing. I don't know.

So: kstat. The only other things you may need to deal with: we had somebody who submitted 1.4 million jobs, got through about 100,000, and then realized something was wrong. For that there's qdel. I won't run this for real, because I don't have a job I want to delete, but you can run it on your own jobs and delete them from the queue. It will only work on your own jobs. "What's that?" It will only work on your own jobs, unless you're me or Adam. "Does it just delete everything?" Well, I would give it a job number: if I wanted to kill that one, I would say qdel 122906, and that would delete it. You can also give it a user, to kill all the jobs for that user, but that's probably not what any of you want. "So you can kill all the jobs for a user, or you have to kill them one by one? Can't you kill all the jobs that have a job name starting with certain characters?" I don't think qdel does that, but quite frankly, we would probably script it: get a list of them and then say delete that one, delete that one, and make it all happen that way. If you need something like that, it's probably just a quick email to the Beocat list; it would be a simple script.

There is one more command that you may use occasionally: qalter. With qalter you can change the resources you listed. Say you submitted a job with the PE mpi-fill, which takes one node and tries to use as many cores on that one node as it can, and you decide, no, that's not what I wanted, I wanted mpi-spread, which spreads across as many nodes as it possibly can: you can use qalter to change that. Again, those are documented on the Sun Grid Engine pages. You can only change those before a job has started: once you submit it to the queue, you can change it up until it starts, but once the job has started, its parameters are set, and thus shall they ever more be. We do have several people ask me, "can you increase the time on my job, because it's not going to finish in the time I thought it would?" Generally speaking, the answer is no. If it's really, really important for getting your research done, say it's been running for three weeks and you need another couple of days, we can probably make it happen, but it requires killing off everything else on that node, so it's a pretty drastic measure and we try to do it as little as possible.
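Sketches of both commands, using the job number from a minute ago; qdel's -u flag and qalter's option syntax are standard SGE, and the slot count here is made up for the example:

    # Delete one of your own jobs by job number
    qdel 122906
    # Delete all of a user's jobs (only works on your own, unless you're an admin)
    qdel -u myeid
    # Before a job starts: switch its PE from mpi-fill to mpi-spread, keeping 8 slots
    qalter -pe mpi-spread 8 122906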
The reason they don't let you change the parameters once a job has started is that it would be really easy to game the system: start up and say "yeah, I need an hour," then "oh, just kidding, I need 300 hours." So we don't allow that to happen, and SGE doesn't even allow us to make that change as system administrators. It's pretty drastic on our end if we want to let a job go past its time. So once again: don't underestimate the time you need. If it finishes early, our system is perfectly happy to say, hey, look, we can move things up. That's what I had.

Yes? "Is there an option to get notified when a job is finished, like get an email or something?" Yes. If you look at my submit file... let me think. It's a dash capital M, I believe, and you want a lowercase m along with the capital M: the capital M gives the email address to send to, and the lowercase m tells the scheduler when to send it. (There's a quick sketch of those two flags at the end of this section.) I don't have that in my sample file; I should, and I'll update it shortly, because several people do that and it's really handy. I've done it on my own jobs, to email me when a job starts and when it's done, that kind of thing. The lowercase m, if you're wondering, takes a, b, and e: abort, begin, and end. Those are the times it can send mail.

Question over here: "Is there a way to automatically change the number of cores based on availability? Like if I ask for 68..." That's something we're going to cover in the next hour: a variable number of cores depending on what you can get. So stick around, we'll get there. Any other questions, comments, or snide remarks? Nobody ever comes up with a snide remark when you ask for it. Yes? "Can we run X applications through Beocat, through the head node?" You can't run X applications on a compute node; we don't have X on any of the nodes themselves, only the head nodes. So X applications are limited to the head-node limits: the 4 gigs of RAM and the one hour of CPU time. "What does 'Rq' mean?" Oh, let's see if I understand: you're seeing "Rq," which is resubmitted and queued. Probably a node died or something like that. So R is resubmitted, q is queued, and "qw" down here is queued-waiting, which I'm sure you've all seen, and hopefully "r": running, which is where we want them all to be. The capital R state is something automatic; you never have to do it yourself, it just means the job got taken out from under the scheduler for some reason. Anything else? Take five, and then Adam will be talking about MPI, I believe.
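For reference, the sketch of those two notification flags on a qsub line; the address is a placeholder:

    # Send mail to this address on abort, begin, and end
    qsub -M myeid@ksu.edu -m abe myjob.sh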