Next up is serial jobs, which you can find here on the schedule. I will go ahead and open it. But what's a serial job, and why? As we have already said a couple of times with different wordings, sometimes you don't want to sit in front of your computer typing commands. That becomes repetitive very quickly and it is not efficient. If you start to have a million terminals open and feel like the main character in a cheesy hacker movie, you know there must be a better way of doing it. They don't show it in the movies, but in reality, if you want to run commands, you usually want to write them as scripts. Just as you write your code as a script, whether it's a Python script, a MATLAB script, a Julia script, or whatever language you're using, there is a scripting language for the command line, and you can write commands as this kind of script. Basically, whatever you would type on the command line, you can write into a file and then run that file. You can do that on any system. But there are additionally Slurm scripts. A Slurm script is the same as a normal shell script, but you also tell the queue manager to take over and run these commands somewhere else. So instead of running the script where you are currently sitting, where your terminal is, you tell the queue manager to run it in a place that it thinks fits the script, and the queue manager determines that based on the requirements you have specified.
In the example script that Richard is showing in the stream, we have a very simple script. On the first line we have the so-called interpreter: which program is used to run the script? Usually it's going to be bash, so you can just copy-paste the same thing into every script that you're going to use. There are advanced options, but let's not delve into too many details. So the first line is this so-called shebang, which says: run this with that shell. The next few lines are the #SBATCH comments. Similarly to how you gave flags to the srun command, these comments are written in exactly the same way. You have the same syntax, but you preface it with the hash sign and SBATCH. The syntax needs to be exactly this: the hash sign, SBATCH in capital letters all the way through, then a space, and then some argument. The queue manager reads these comments as the requirements for the job. Basically, you give the queue manager the script, the queue manager reads that this script needs these kinds of resources, it finds a place with those resources, and then it executes the script. So instead of running the commands where you currently are, you defer the execution to later, in a proper place where the script can be run. And this is why we consider the shell one of the fundamental things for computing: you need to be able to make scripts that are complicated enough to do what you need. For example, with Simo's array jobs, by knowing a little bit of shell he was able to run something much more easily than by doing it ten times by hand.
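The structure described above (shebang line, then #SBATCH comments, then ordinary commands) can be sketched as follows. This is a minimal sketch, not the exact script from the stream: the time and memory values are assumptions chosen for illustration, and the file names hello.sh and hello.out are the ones mentioned in the session.

```shell
# Write a small example batch script to hello.sh.
cat > hello.sh <<'EOF'
#!/bin/bash -l
#SBATCH --time=00:05:00
#SBATCH --mem=100M
#SBATCH --output=hello.out

# Everything below runs on whichever node the queue manager picks.
echo "Hello $USER! You are on node $HOSTNAME."
EOF

# On a cluster you would hand the file to the queue manager:
#   sbatch hello.sh
# Outside a cluster, plain bash treats the #SBATCH lines as ordinary comments:
bash hello.sh
```

Because the #SBATCH lines are comments to the shell, the same file can be tried locally first and submitted unchanged afterwards.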
Yeah, so the command line is basically a kind of wrapper. If you are using Windows or Mac, you might have used tools that automate, let's say, mouse clicks. This is a similar kind of thing: you tell the computer, okay, type these commands for me, and the computer will run those exact same lines. This makes it important that the script represents what you would write interactively. It will run those lines, and it will run them in the system based on the requirements that you gave. Should we do some live examples? Okay, let's see. I think we've discussed this introduction, so let's go to our first batch script. Here I am, in my work directory. What should we call our working space here? So Richard is now creating a directory called kickstart-2022 and going into that directory with the cd command. And now, with the pwd command, he's printing his current place in the system. If you think about a file explorer, like Explorer in Windows or Finder on Mac, when you go to a certain folder, the commands are executed in that folder, relative to that folder. So let's use a simple command-line editor, like nano, to create a script. Nano is a minimal editor which has helpful hints at the bottom of the screen so that you know how to use it. And let's write this hello.sh. Again, like we said before, it can be really useful to be able to edit some files quickly on the other computer. Yes. And the ending, .sh, just means it's a shell script.
So it stands for shell. It can be something else as well, but .sh is the standard so that people recognize that, okay, this is a script to be executed. There's an interesting question in HackMD: why is the -l option needed? So here it has #!/bin/bash -l. This is a good question. It's a bit on the technical side, but it's a very good question. The -l option makes the shell behave as a login shell, as if it were a session you had logged in to: it loads all of the login scripts, certain profile files and so on, that are not loaded if you do not have the -l option there. It can also be detrimental: if you have specified some profile options for yourself that you didn't want to be loaded, it will load them automatically. But basically, with -l you make it extra certain that the script behaves like your interactive terminal. Okay. So we've done this, and now let's see. Is there anything else? This part is describing how to run it. Let's quickly go through what we have inside the script. What we have is an srun command within the script. And this means that... well, let's run it first and then discuss this. So nano tells me how to exit. This caret sign means Control, so Ctrl-X, then I answer yes to save, and Enter to write to the same file name. I will do ls to verify that, yes, it's there. So we run it with sbatch. Those of you who have already written scripts might be familiar with ./script or bash script. That's basically saying, okay, I want to run this script right now. But we don't want to do that. We want to run it through the queue.
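On the -l question above: one way to see what the option changes is to ask bash itself whether it considers itself a login shell. This is a small sketch using the login_shell setting that bash exposes through shopt; the helper name is_login_shell is made up for illustration.

```shell
# Report whether a bash started with the given options is a login shell.
is_login_shell() {
    # "$@" carries extra bash options such as -l; shopt -q only sets the exit status.
    bash "$@" -c 'shopt -q login_shell && echo yes || echo no'
}

echo "bash:    $(is_login_shell)"
echo "bash -l: $(is_login_shell -l)"
```

On a typical system the first line should report no and the second yes. If your profile files print text of their own, that text will show up in front of the answer for the -l case, which is itself a small demonstration that the login scripts were read.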
So we need to give the script to the queue manager, and you don't need to specify any of that stuff. You just give it to sbatch. So this is what Simo said: this would execute it directly. This would also execute it directly. Neither would run it asynchronously; they would try to run it right here and right now. So we want to use sbatch. Just as a recap, if you want to submit something to the queue: sbatch is for scripts, srun is for commands. Now we see the output that Richard got: Submitted batch job, and then a number. Each batch job, similarly to the srun commands that you have run, gets a number that you can use to identify it and look it up later on. Like I mentioned previously, in other clusters you might need to specify, for example, account statements or partition statements in the #SBATCH comments in order to get the script to run. Whatever options you would give to the srun command, you can give to sbatch and it will work. Okay. Should I try slurm queue to see? Yes. It will probably show nothing because it's already finished. So then what's next? Let's type ls to see what happened, because here we see the file. So you noticed that we didn't get any output on the terminal when we ran the script, and that's because of the non-interactive nature of the thing. We could have let the script run and just gone for a coffee break, and it would still be running asynchronously in the queue. It didn't need the terminal anymore to run. Once we typed sbatch and the script name, pressed Enter, and saw the job submitted, we no longer needed to worry about it. It's now running somewhere. And that means that the output is also not produced in the terminal. So where does the output go? The output goes into a file.
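Since every submission answers with "Submitted batch job" and a number, that number can be captured for later lookups. The following is a sketch under the assumption that the message has exactly that form; the helper name job_id_from_message and the ID 12345 are made up for illustration.

```shell
# Pull the numeric ID out of sbatch's confirmation message.
job_id_from_message() {
    # The message has the form "Submitted batch job <number>".
    awk '/^Submitted batch job/ {print $4}'
}

# 12345 is a made-up ID used only for illustration; this prints 12345.
echo "Submitted batch job 12345" | job_id_from_message

# On a cluster, the same idea applied to a real submission:
#   job_id=$(sbatch hello.sh | job_id_from_message)
#   squeue -j "$job_id"
```

The session uses the site-specific slurm queue command to look at the queue; squeue is the standard Slurm command that should be available on other clusters too.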
So our normal output, these print statements, go into this output file. In the script, we specified the name of the output file to be hello.out. So we could use, for example, cat, or should we use less? Less is nice. Let's say you need to look at the contents of a file. You could copy it to your computer, which is slow and annoying. Or you can use less to open it and scroll through it, or you can use cat. To quit less, you press q, just for those who typed less hello.out and are stuck. There we go. Cat will concatenate and print whatever is in a file. You can even give it a binary file, and that will mess up your terminal, so don't try to cat a binary file or an image or something like that. It will print garbage. But if you have this output file, it's easy to use cat to check what the file contents were. So now you see that the output the command produced ended up in this output file. And you see that we have a hello, then the username, then you are on node something, and the time. But that text wasn't written in the script itself. If you compare to the script, in the script we have these dollar-sign things instead of these texts. These dollar-sign things are environment variables. This is again a shell, or command line, thing. There are various environment variables that you can use. And here they are evaluated only when the script is running, and the script runs on the compute node. So when it's running, it will look up the USER environment variable, so the username. It will check the HOSTNAME variable, which is csl47 in this case. And then at the end, it will run this command as a subcommand and insert its output. So it's a bit obfuscated because it's a command line thing.
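The difference between the text in the script file and the text in the output file comes down to when the shell expands the dollar-sign expressions. A minimal sketch of that, assuming a greeting similar to the one in the session (the exact wording of the original script is not shown in the transcript, and the function name greet is made up):

```shell
greet() {
    # Double quotes let the shell expand the variables and run $(date) at the
    # moment this line executes, so the values come from the machine running it.
    echo "Hello ${USER:-unknown}! You are on node ${HOSTNAME:-unknown}. The time is $(date)."
}
greet

# Single quotes suppress expansion, so this prints the dollar signs literally,
# which is what the line looks like inside the script file.
echo 'Hello $USER! You are on node $HOSTNAME. The time is $(date).'
```

Run inside a batch job, the first form reports the compute node rather than the login node, because the expansion happens where the script executes.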
But think of it this way: these are all standard shell tricks. You don't need to know them to do basic work, but once you know them, you can do a lot of cool stuff. The important thing to gather from this is that this could be anything. You could have any kind of commands running here, as long as they can be run from the command line. This is why it's very important to write your programs in a way that you can run them from the command line. Once you are able to do that, you can plug that command into the script, and you just specify all these comments at the top of the script to say what kind of resources you want. And then it will be run remotely, on the compute node. So it's choose your own adventure after these comments. You can run Python, you can run MATLAB, you can run R, you can run MPI programs, you can run programs that use GPUs, whatever. As long as your program can be run from the command line, it's good to go. Okay, so what's next? Is that basically most of what we need to say? Setting resource parameters we've talked about, and with what you know from the interactive jobs and what we've said already, it should be clear how to set these. Should we run one job that doesn't finish? Oh, actually, I think that's one of the exercises. Also, with our grace period, it might take a little bit too long before it gets killed. Anyway, you can set the different resource parameters. The interactive page gives a full list of them, and they are exactly the same between the two. Monitoring your jobs is actually what we'll talk about next, so we'll summarize the things we've said, like slurm history and so on. You can cancel jobs.
Partitions are something that we don't talk about much, but that's because, at least on Triton, the partition is detected automatically. So what was your analogy or metaphor for partitions? Think about a restaurant where you have tables inside the restaurant and tables outside. In some clusters, like Triton, you get the table that fits your group best. It might be inside or it might be outside. It might be in a booth, it might be an open table. You might have different kinds of tables. But you can usually specify that, okay, I want to sit outside. It doesn't necessarily help: you eat the same food outside as inside. But in some clusters, or in some special cases, you want to specify the partition. For example, at CSC you need to specify which partition you want to run the job in. And that's analogous to telling the waiter that, okay, I want to eat at an outside table. Then the queue manager will limit its search for possible places where this job can run to the outside tables. That is how it goes. So on some clusters it matters, and you need to actually select whether you are going to something for small jobs or big jobs and so on. On Triton, this is automatically detected. But you can learn more about this by reading the information for your particular cluster. For example, again at CSC, huge jobs are allocated into their own queue, because they want to be able to fit these huge jobs, and having lots of small jobs mixed in might make it harder for the huge jobs to fit. So you basically have one big dining hall that's reservable for big parties, and then you have lots of small tables in a nearby room for parties of only a few people.
And they won't give the whole dining hall to only two people. So that kind of thing is possible, but it's nothing complicated. You just specify the partition that you want with --partition and the name of the partition. Finally, if you look at our reference page, or right below, you can see a reference for all of the different commands and all the different Slurm options. But we don't need to discuss them any more now. So we've got exercises. How should we do this? It's 24 past the hour. Should we give 20 minutes or maybe 30 minutes for these exercises and then go on? That would go a little bit over the schedule, but I think that's fine. Yeah, I think so, and this is the crux of the whole thing. This is probably the most important lesson of the whole course. If you know how to run serial jobs, you're good to go. This is the main cake, and the rest is just different kinds of sprinkles on top of it. So if you know how to run serial jobs, you're already really good to go. So let's give 30 minutes. Yeah, let's do something like that. Okay. So during the exercises, there's one which is basically doing what we've done above and checking things. Then you try submitting a job and canceling it, and things like looking at the output. And then there are some more. They're not very advanced, but they are things that you don't need to know in order to use the cluster. So if you have extra time, you can think about them, but really don't worry about them at all. Instead, explore and play with everything we've been doing.
Yeah, and I'll also mention that after the course, if you still feel unsure about using the command line, I highly recommend checking our bash course material, or various other command line material. CSC has good material as well. CodeRefinery has good material. Any material that makes you feel more at home with the command line is good. The Slurm part is just a little icing on top of it: the resource parameters and the sbatch command. It's nothing fancy, but getting accustomed to the command line is very important. And even if the example here looked very easy, it's a good idea to write the scripts by hand, not simply copy-paste them, because then you get to make the mistakes, like a typo in a comment. It happens to everybody: you think you know what the syntax is, and then you write it from memory and it's wrong. You have to type these serial job files a few times before you get the syntax correct every time. And if you make a mistake, it doesn't matter. You just get an error that probably says something like unknown command, or you run out of memory because the memory requirement was specified incorrectly. It doesn't matter that much, but it's good to get that out of the way. Just try it out and see how it goes. OK, so let's get right to it then. We prepared the HackMD with the exercise information. So see you in about half an hour. Let us know about any problems the usual way, via HackMD. Bye. And we're back. Hello. I hope you all had a nice little break there. First we can quickly look and see what important questions we have from HackMD. Simo, did you see any? Yeah, one good question. The first question was: why do we have srun within the script itself?
What's the point? The point of having srun within the sbatch script is to get additional information about what the script is doing. When you have this srun statement, Slurm will record a so-called job step. It will record information about the specific command that is run, and that command gets all of the resources that were requested by the job. You can also give it fewer resources if you want to, but usually you want to give it all of them. This is especially important when you're running MPI jobs, which we'll talk about tomorrow, where you need Slurm to allocate all of the relevant workers for the job. But srun is usually good to add, because then you get additional information about what the script is doing. You don't need to preface every command with srun, only those that are the actual calculations. Let's say you have a complex pipeline or workflow where you do some pre-processing, then some simulation, and then some post-processing. If you have srun at each step, you can see how long each individual step takes, you can see what the memory requirement was for each step, and you can see if it failed on a specific step. You get better information. There was also a question about having multiple srun stages. I have an example: with srun, you get these extra lines, like here for echo, which show the memory for the individual step. So if you had multiple things running, you could better understand the behavior. Yes, otherwise it would put everything under the banner of the whole script. You couldn't tell which command actually used the resources, because they would be calculated for the script as a whole. OK.
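The multi-stage workflow described above can be sketched as a batch script with one srun per stage. This is only an illustration: the three stages are placeholders (echo and sleep stand in for real programs), the resource values are assumptions, and the small srun stand-in at the top exists only so that the sketch can be tried on a machine without Slurm.

```shell
#!/bin/bash -l
#SBATCH --time=00:10:00
#SBATCH --mem=500M

# Stand-in so the sketch also runs where Slurm is not installed (an assumption
# for illustration; on a cluster the real srun is used and records job steps).
if ! command -v srun >/dev/null 2>&1; then
    srun() { "$@"; }
fi

# Hypothetical three-stage workflow; each srun line becomes its own job step.
srun echo "pre-processing"
srun sleep 1                  # placeholder for the actual simulation
srun echo "post-processing"
```

After such a job has run on a cluster, the accounting tools (slurm history in the session, or the standard sacct -j with the job ID) should list the steps separately, which is what makes per-stage timing and memory visible.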
The other thing that was asked: if you have an srun statement, do you need to specify any other requirements for it? Usually no. Everything inside the script, srun included, inherits the requirements of the #SBATCH statements. So if you have these #SBATCH statements at the top, everything inside the script will have the same resources available. I think those were the most pressing questions. OK. So let's go to the exercises, and I think Simo will demonstrate. Yeah, so let's go to his screen. OK, let's go through the first exercise. We had the basic batch job exercise, which was to submit a batch job that runs hostname. I'll call the script run_hostname.sh and open it with nano. I'll first write the #!/bin/bash here. I'll use the -l, why not? Then I'll specify the time limit. Let's say the time limit is one hour and 15 minutes. So what is the syntax here? How do you write one hour fifteen? Well, you can look it up, but the best syntax is hours, minutes, seconds. This way you get the correct time. You can also specify days at the start if you need to; it becomes apparent when that's needed. And then the memory limit: I use --mem with 500 megabytes. You don't need to put the B at the end, I almost did. It's only the multiplier, so 500M. And then let's add srun hostname. We could run without srun, but let's add srun anyway. Now let's save the file and say yes. And I'll submit it. I'll run slurm queue. Oh, it already ran, too fast. And I'll run ls to see the output, and I'll cat the output file. OK, so it ran on pe84. Now, the output file had this automatically generated name, slurm- and the job ID. But maybe I want to run the script again and again.
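Put together, the script typed in this exercise would look roughly like the sketch below. The underscore in the file name is a guess at how it was spelled on screen; the time and memory values are the ones spoken in the session.

```shell
# Write the exercise script; 01:15:00 is hours:minutes:seconds, 500M is megabytes.
cat > run_hostname.sh <<'EOF'
#!/bin/bash -l
#SBATCH --time=01:15:00
#SBATCH --mem=500M

srun hostname
EOF

# Check the script for shell syntax errors without running it.
bash -n run_hostname.sh

# On a cluster:
#   sbatch run_hostname.sh
# With no --output given, the result lands in slurm-<jobid>.out.
# A limit with days would be written as days-hours:minutes:seconds, e.g. 1-00:00:00.
```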
I don't want to create a different output file every time. So let's open the run_hostname script. Let's add an #SBATCH statement, --output=, let's say, hostname_output.out. That's as explicit as it can be. Also, we wanted to actually have hostname in the name, and I mistyped that, so let's write it again: hostname_output.out. And the job name, --job-name, I think it's written like this. I'll check for you. Yeah. Well, if it isn't, we will get an error. I'll give it something different, so let's say my_hostname_script. And now I will save it, and I will run it. I'll try to catch it before it goes. There's a shorthand for slurm queue called slurm q. And we can see that it was pending, and now it's already done. Let's look at the slurm history for the last 10 minutes. Here we see that the first script we ran was named run_hostname.sh, and the second one has its custom name. So if we want to identify our jobs more easily, and make certain that we understand what we are running, we can use the job name. Then we can find our jobs more easily in the output. OK, so does the printed hostname match? Over here we had pe84. And if we look at the output, now we see hostname_output.out. The naming is maybe too redundant, but we can cat it, and it's pe84. So it definitely ran on that node. OK, so what's next? Submitting and canceling. Let's create a script called sleep.sh, or let's say sleep_job.sh. We want this job to keep running so that we have time to actually cancel it. So we first write the usual boilerplate at the top. For time, let's put 10 minutes; for memory, 500 megabytes, let's say; and put sleep 300 there. Just for variety, I won't put srun here. And let's submit the job with sbatch. OK. Now we look with slurm queue. OK, there it is. And let's cancel.
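The submit-and-cancel exercise can be sketched as below. The --job-name and --output lines are added here to show the two options from the hostname step; the sleep job in the session did not necessarily include them, and the names sleep_job and sleep_job.out are assumptions. The submission part only runs where sbatch is actually installed.

```shell
# A job that does nothing for five minutes, so there is time to cancel it.
cat > sleep_job.sh <<'EOF'
#!/bin/bash -l
#SBATCH --time=00:10:00
#SBATCH --mem=500M
#SBATCH --job-name=sleep_job
#SBATCH --output=sleep_job.out

sleep 300
EOF

if command -v sbatch >/dev/null 2>&1; then
    # --parsable makes sbatch print just the job ID (optionally followed by ;cluster).
    job_id=$(sbatch --parsable sleep_job.sh)
    job_id=${job_id%%;*}
    squeue -j "$job_id"      # the job should be listed as pending or running
    scancel "$job_id"        # remove it from the queue
else
    echo "Slurm not found; wrote sleep_job.sh without submitting it."
fi
```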
OK, take the ID from here or from over here and scancel it. So Richard, when do we usually need to cancel our jobs? I guess if you don't want to run it anymore. Maybe a common case is that you realize you made some mistake in it, so you don't want to waste the resources. Or you submit a bunch of jobs and you see that the first ones are dying. Or you make a big array job and you see something's wrong in one of them, so you want to stop all of them. I guess I don't really cancel jobs that often somehow. Yeah, if you don't make mistakes, you never have to fix them. But it's good to keep in mind, because otherwise, especially with interactive sessions and so on, you might end up with something left running. And what we didn't mention that much is priorities, but we can mention that a bit later. Let's finish the exercises first. So let's write this. This is a script with a loop, for i in a sequence up to 30. It will print the date, then sleep for 10 seconds, and repeat this 30 times. The point here is that we're going to see the output as it's being generated, which you often want to do. You submit something that takes several days and you think, OK, I really want to see what is coming out of this job live, without waiting for it to finish. This way you can do that. There was a question in the HackMD: why does this fail? It fails because srun looks for a command called for, and there's no such command. It's internal to bash itself. So you need to put srun in front of the command that you actually run. Let's say the date command would be the main thing we want to run. Then you want to put the srun in front of that, not the other ones. I'll put an srun there just for fun, so we can see what the output looks like. Let's save the file. Oh, OK, I changed the file name by mistake. OK, I'll just save it and then move the file.
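The loop script from this exercise, with srun placed in front of date rather than in front of for, might look like the sketch below. The function name run_loop and its two arguments are made up so that the loop can be tried quickly; the session used 30 rounds with a 10 second pause. As before, the srun stand-in is only there for machines without Slurm.

```shell
#!/bin/bash -l
#SBATCH --time=00:10:00
#SBATCH --mem=500M

# Stand-in so the sketch also runs where Slurm is not installed.
if ! command -v srun >/dev/null 2>&1; then
    srun() { "$@"; }
fi

# run_loop ROUNDS PAUSE: print the date ROUNDS times, sleeping PAUSE seconds between.
run_loop() {
    for i in $(seq "$1"); do
        srun date          # srun wraps a real program; "srun for ..." cannot work
        sleep "$2"         # because "for" is a shell keyword, not an executable
    done
}

# The session used 30 rounds and 10 seconds; shortened here for a quick try.
run_loop 3 1
```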
When you make a mistake like this, you can fix it by moving the file. So now it's a renamed file. Let's submit it. OK. So now it's running. Where would the output be? Let's open a new terminal so we can see it from a different terminal. I'll close this one, so my connection to Triton is closed. I have this shorthand that makes connecting to Triton a bit easier, because I make maybe 100 connections per day. I have all of these shorthands; don't worry about them. OK. So you're going back to the same place you were before. Yeah, my place. So this is a very important demonstration: you can log out and log in again, and the job is still going. OK, here we are. Let's look at what is running in the queue. It's still running there. I could be asleep, I could be working on other stuff, and it would still be running. So let's look at the output. Let's use, for example, less to see the output. The job ID is that, and we didn't specify an output file, so it's the default one named after the job ID. So there it is, running. Do you want to tail it? Yeah. So what is tail? With tail, if we use the follow option, we follow the file as output is added to it. So if you have something that prints output, you can follow the output and see how it behaves. In the best case you don't need to do this, but if you need to see what happens, this is one way of doing it. So there it's running happily. And let's cancel it. OK. And let's look at the output of slurm history. We see here that this is the sleep job. And because we had the srun statement in front of the date statement, we get a job step for each one. Every date has its own line there. So let's say that instead of date, we had something that runs a simulation or an analysis that takes an hour. We could see a record for every simulation step. This is especially important when you have something like an iterative process, where you don't know when it's going to end.
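Following a growing output file with tail can be tried without a cluster. In this sketch a background loop imitates a running job that writes one line per second, and a temporary file stands in for the job's output file (on the cluster that would be something like slurm-<jobid>.out). The timeout command is from GNU coreutils and is used here only to stop tail automatically; interactively you would stop it with Ctrl-C.

```shell
# Temporary file standing in for a job's output file.
out=$(mktemp)

# Background writer imitating a job that prints a line every second.
( for i in 1 2 3; do echo "step $i: $(date)"; sleep 1; done >> "$out" ) &

# Follow the file as it grows; timeout ends tail after a few seconds.
timeout 5 tail -f "$out" || true

# Make sure the background writer has finished before moving on.
wait
```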
So you can check, maybe, how long each step took for different variables. So you might want to do something like this. OK, the advanced stuff, we can probably skip. Yeah, we can comment on those and go forward. OK. So we left ourselves a bit of slack at the end of the day because we wanted to have enough time for everything. And it's going to be helpful, I think, at this point.