So here we are, ready for the next one. The question is: how do I get my stuff back if I've closed the shell? Is there any better lead-in for what we're talking about now? Basically, we'll write a script, which is like writing our order for the server. Then it gets sent off, and we can do other things. We can close the shell, leave, go do something else, and when we come back we can see what the results are. The results appear on disk, in the file system, and you look at the output files to see what came out.
Should we start with the first batch script? Yeah, let's start with that one. So we see an example here. If we go through the different lines, the first line, if you can point to it, is hash-bang /bin/bash. That basically means this is going to be run with the bash shell, so all the bash stuff we've been learning still applies here. So we basically tell the queue that when it runs this script, it should use this thing called bash, which is the shell, to execute all of the rest. It says what tool should be used to interpret all of the commands that come after it. Yeah, okay.
And then the next three lines should look familiar. There's the hash sign and SBATCH, which tells it that Slurm is going to be reading these lines. In bash and many other programming languages, the hash sign is the comment symbol. So for bash these are comments, but Slurm will see them and read them. Yeah, so with Slurm, like we did previously with srun, we used these dash-dash options to give things like time and memory. We can give them on the command line, but in these scripts we want to make certain that the queue system can read them, and the queue system has its own syntax. And this is important: it's a specific syntax.
So you cannot add extra stuff there. It's not ChatGPT, so it cannot infer that if something is typed in a different way, you probably meant this. It looks for this specific kind of comment, SBATCH, and interprets it as an instruction to the queue. And all the same options we've been learning apply for both srun and sbatch.
So the next line is an echo, which is basically a print statement. It says: hello user, you're on node such-and-such, the time is, and then this dollar-sign-parentheses syntax means it runs the date program and inserts the output here. If you run date from the command line, you'll see what date does.
And then finally there are two more comments and then srun python slurm/pi.py. This is python, not python3. So if your cluster doesn't have a plain python command, use python3; the program should run with both Python 2 and Python 3.
So in that last line, why do we have srun there again? Can we just say python slurm/pi.py? Yes, we can leave the srun out. But with srun, Slurm will treat this as a job step and do extra monitoring for that step. It can also do fancy stuff, like give a certain amount of resources to this step and a certain amount to another step. So in your script you could give one CPU and this amount of memory to one process, and then start another process with other resources. This is very rare. Most often srun is used because we want to emphasize that this is the heavy part of the calculation and we want extra monitoring for it. And when we get to the parallel part, we sometimes need srun for specific kinds of parallel calculations.
I mean, this is basically it for the lesson.
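The walkthrough above can be put together as a sketch like the following. The time and memory values are assumptions picked for illustration (the session only says the script sets time, memory and output); the output name pi.out is the one used in the demo. Because the SBATCH lines are ordinary comments to bash, the snippet also runs as plain bash, which is one way to see that only sbatch gives them meaning.

```shell
#!/bin/bash
# Assumed values for illustration: 5 minutes and 500 MB.
#SBATCH --time=00:05:00
#SBATCH --mem=500M
# Output file name as used in the demo.
#SBATCH --output=pi.out

# $(...) runs a command and inserts its output into the string.
greeting="Hello $USER! You are on node $(hostname). The time is $(date)."
echo "$greeting"

# The lesson's script ends by running the program as a job step,
# from the hpc-examples directory:
#   srun python3 slurm/pi.py
```

Run directly with bash, this only prints the greeting on the current machine. Submitted with sbatch, the same greeting would end up in pi.out and name whichever node the queue picked.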
I guess we can do one demo and then go to the exercises. Does that sound good? So should we do this? And this is interesting, because we have to combine more of the things we learned yesterday. We have to be in the right directory. We have to edit a file using nano or whatever editor you like. We have to use sbatch to submit it. We have to use the queue commands to see when it's done running. Then we have to run ls to see what new files appeared, and then look at the contents of that file.
Yeah, unfortunately Zoom constantly wants to modify the Zoom share size and not the size of the actual documents. Sorry about that. Okay.
So here we are on this example. Are you on your computer? Can you run hostname to verify you're on the right computer? No. So we need to exit. Completely different one. Okay. So now we're on the login node and we're in the hpc-examples directory. What we just did was verify where we are before we start doing things. This is always a good idea.
Okay. So we're going to use nano to open a file called run-pi.sh, and I guess we can type this in. So the hash-bang line to tell it to use bash. You can of course copy it, but for the sake of visualization I'll write it out. Yeah. Okay. So time, memory and output. The output option tells it which file the output will be written to. If you don't give anything, there's a default name, something like slurm- followed by the job ID, which works fine for basic testing, but sometimes it's nice to name the output. Then: hello, your username, you're on node hostname, the time is date. Okay. And then, you don't need the comments. slurm slash... Yeah. And python3 maybe. Yeah, maybe python3.
Okay. So are you sure we're in the right directory to run this? Does the file slurm/pi.py exist? So we can use less, for example, to check. In the Linux shell tutorial there's an example of how to use less, which is a pager that lets you check the contents of a file.
So we have written slurm/pi.py there. I press q to exit less, and I will check that slurm/pi.py exists. This is a common thing that happens: you write the script, but the program doesn't exist because you're not in the directory you think you are.
Okay, should we submit it? Yes, let's submit it to the queue. So sbatch run-pi.sh. Okay. And then we'll do squeue, or slurm queue. Oh, well, it's already finished. If we do ls, we see there is a file named pi.out, so that's probably good. Should you look at it? Yeah, I'll use less again. Hey, there it is. So: hello, your username, you're on some node, the time is something. And then the output from the program. So it worked. Yes. And it ran on some other machine.
Okay. So with that, are we good for the exercises? Yes, I think so. Yeah, correct. I think y'all can probably figure out the rest. I don't think there's anything really new here. Just looking at slurm queue, checking resources. Yeah, this is all stuff we've done. Should we give 20 or 25 minutes for the exercises?
I'll quickly mention that when we're talking, you can sometimes hear us say bash and sometimes sbatch. They are different words, even if we mess up the pronunciation. sbatch is like Slurm batch, that's basically where it comes from. And bash, I don't even know where it comes from. It's so old, from the 80s or something. But if you hear these words, it's very important to notice the difference. bash is the shell we're using, and sbatch is the queue system's batch processing command: submit a script to the queue. And these SBATCH comments are comments for that program. So there's a difference there. It might get confusing, but try to remember that if something sounds a bit different, it might be because it is. But yeah, let's go to the exercises.
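The habit from the demo of checking that the program exists before submitting can be wrapped in a small helper. This is only a sketch: check_before_submit is a hypothetical function name, not something Slurm or the course material provides, and the file names are the ones from the demo.

```shell
# Hypothetical helper: confirm both files exist before calling sbatch.
check_before_submit() {
    script=$1
    program=$2
    if [ ! -f "$script" ]; then
        echo "missing batch script: $script"
        return 1
    fi
    if [ ! -f "$program" ]; then
        echo "missing program: $program (are you in the right directory?)"
        return 1
    fi
    echo "ok: found $script and $program in $(pwd)"
}

# Typical use on the login node, from the hpc-examples directory:
#   check_before_submit run-pi.sh slurm/pi.py && sbatch run-pi.sh
```

Chaining with && means sbatch is only called when both files are found, so a job that would fail immediately never enters the queue.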
These exercises are very important, because most of the jobs you're going to run on the cluster are not going to be interactive. Do you want to scroll down and show the exercises? Yes. Yeah, and you can try what you like. There are some really basic ones and some more advanced ones. You can go read the other shell stuff from yesterday if you'd like; basically use the time as you'd like. You can ask us questions about anything. Yes, and it's a good idea to ask in the notes, because then, like previously, we can go through the questions in the notes in detail. We'll probably go through some of the solutions as well, because they sometimes show some philosophical things. But yeah, most likely it's a good idea to ask in the notes.
Yes, okay. So switching back to the notes here. I've already been entering things. And okay, I put down that we're back at 15 minutes past the hour. Okay? So I guess we're good then. See you later. Bye. Bye.
Hi. Welcome back. We are back from the break and the exercises. There have been a lot of good questions in the notes, so let's go through them. Then we can probably go through one or two of the exercises that emphasize some of these points. Okay. Let me know when to scroll as we're reading. Yeah. Let's see. We can focus on what's not answered already, I guess.
Yeah. So, quickly, to mention it again: is sbatch a command or a comment? It's a comment for the sbatch command. That might sound a bit convoluted, but it works like this: when you give the sbatch command a script, it reads that file and it reads those comments. From those comments it determines what resources your script needs. It takes the script and keeps it to itself until it finds the correct place where it can run the script.
And then when it finds the correct place, so the correct table in the analogy, it puts the script there and executes it. So the SBATCH comments in the script tell the queue which resources it should reserve.
Where should files be made? Maybe we can talk about that in the storage part. Yes.
There are different time formats. Yes. So about the time formats: Slurm has like a million different parameters. The sbatch command has them, and srun has them. The most common ones we are listing here, but you can check either our references or the documentation itself for all of the variations. Slurm can do a lot of other things as well, and you can give it all kinds of fancy options, but these are the ones you're usually going to be using, and we are not going to go through all of them. For time, we use the hours:minutes:seconds format because it's visually pleasing, but the other formats are there as well. So let's scroll. Scroll.
How do we leave the editor, how do you quit? Yes.
Don't see any jobs running? Well, that's because it finished very quickly. Yes. Yeah, but when you run sbatch and give it the script, you will usually see something like Submitted batch job and then a number. That number is the ID by which the queue knows the job. You can later use this number to see information about the job. Everybody gets a ticket, like in a queue system. Everybody gets their own queue number, and this number constantly goes up. So if you're number one million or something, it doesn't mean there are a million people in front of you. It just means that you're the next number in the numbering, and that many jobs have gone through the queue previously. And then it gives control back to you. So Slurm just takes the script and queues it. Yeah. Okay.
Debugging maybe we can talk about in the final Q&A at the end. Yeah.
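Since the job ID is what you use later to look up or cancel a job, it can be handy to pull it out of the line sbatch prints. A minimal sketch, using a made-up ID so that it runs without a cluster:

```shell
# Example of the line sbatch prints on success; the ID is made up.
message="Submitted batch job 1234567"

# Strip everything up to the last space, leaving only the number.
job_id=${message##* }
echo "Job ID: $job_id"
```

On a cluster the message would come from the real command, for example message=$(sbatch run-pi.sh). sbatch also has a --parsable option that prints the ID without the surrounding text.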
Is it possible to do anything that will affect or break the server for other people? And that's a lot of text. I think the last point is basically that if you could break something for other people, then that's our problem. You can't delete anyone else's files. You can't use up more than your quota of disk space. You can run a bunch of stuff on the login node, but that doesn't affect most of the cluster, where the jobs are running. One thing that used to be a problem was disk I/O bandwidth, like making so many small files that it took up all the network access. Maybe let's talk about this at the end. Yeah. This is something that can affect many clusters, so we can talk about it.
I will also mention one thing that wasn't actually mentioned there. One bad thing you can do is give some other person access to the cluster using your credentials. You shouldn't ever do that. You shouldn't give out your password and stuff like that. Because then, let's say a spam sender gets access to the cluster, and suddenly they have hundreds of nodes that can send spam emails or do crypto mining or something like that. Yeah. That's not something we want. So be mindful of your passwords and security and that sort of thing. And if you see something, report it. That's the usual thing: if you see something that doesn't seem normal. Yeah.
Next question. Can we include other things in bash scripts, like virtual envs? Yes. That's a really great question, and I think you can tell exactly where this is going. Anything you would do in the shell, you can do in the script to set things up. Yeah. So the idea behind scripting and writing these: why are we using the terminal in the first place?
The idea is that we can write down a lot of different procedures, arbitrary things like: create a folder, make an output file here, copy this file there, make certain that my code is running correctly, compile this thing. We can write all of that into these scripts, and then let the queue run the scripts on a machine somewhere else. So we can write all kinds of things in the script that set up our environment, set up our virtual environments, set up the things our code needs. And then we let the queue handle the running on some other machine.
And you can close the connection. You can just leave the cluster. Once you have submitted the job and you've seen that it has started running, you can leave the cluster and go on holiday or something. When you come back from the holiday, your jobs are finished. Or you can leave them running over the weekend. You don't have to be there when the job is running. What you need to do is write the instructions for how to run the job. And this is the power of these non-interactive serial jobs. This is why we use the terminal, and why we use these scripts: so that you can do all kinds of stuff there. So this is a very important and very good question. Yeah.
Okay, most of these I think are answered. Let's do an example. Yes. I think there were more votes for exercise number five. Okay. So I'm switching the screen share. Yeah.
So, exercise number five. Let's demonstrate this. There were questions about how you see the output of your code. We will talk about monitoring in the next section, but this exercise already paves the way there. What we want to do is create a Slurm script that runs the following program. So let's create a Slurm script. Let's call it the exercise file, for example. And I will just copy this. So this is the program I want to run. Yeah. And over here we want to add the usual suspects.
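A sketch of what the exercise script could look like once the usual SBATCH lines are added. The file name, resource values, output name and loop length are all assumptions; the only thing taken from the session is that the program prints the date every 10 seconds. It is written as a heredoc so it can be pasted into a shell, whereas the demo types it in an editor.

```shell
# Write the batch script to a file (hypothetical name: monitor-test.sh).
cat > monitor-test.sh <<'EOF'
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --mem=100M
#SBATCH --output=monitor-test.out

# Print the date every 10 seconds, 30 times (about 5 minutes).
for i in $(seq 30); do
    date
    sleep 10
done
EOF

# Check the syntax without running anything.
bash -n monitor-test.sh && echo "syntax ok"

# On the cluster the script would then be submitted with:
#   sbatch monitor-test.sh
```

Any setup the job needs, such as creating a folder or activating a virtual environment, would go between the SBATCH lines and the loop, written exactly as you would type it in the shell.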
Yeah. You notice that I'm not even looking when I'm writing this, because I've written it so many times. You get accustomed to it. With these kinds of things you always write the same lines. This is something you use all the time. Yeah.
Can we set an output file name? Yes. What about output monitor test or something? Yeah. Or are we going to move this to the next one, to monitoring? Well, anyway. So let's save it. And should we submit it? And then if we run slurm queue, do we see it running? Yeah. Okay.
So someone just submitted a program and they want to see whether it's actually working while it's still running. We know the output name. There's this cool program called tail. You run tail -f and then the monitoring test file. So what does tail do? tail prints the last few lines of a file, 10 by default. The -f means follow, which means keep the file open, and every time a new line appears, print it immediately. This is basically the common thing you need to do. You have a big program that's writing stuff to a file, and you want to watch it as it's running to know if it's working okay. This is the pattern to use. So here we see that every 10 seconds, which is what that little script does, it prints out the date. So you could be running a big machine learning training or something like that, and tell it: every X iterations, please write out the current state of the training. Then you go to your terminal, you do tail -f on the output file, and you're just watching it.
Yeah, I would emphasize that this terminal where we are watching the output is not doing anything itself. If I cancel this tail, it doesn't cancel the program. The program is still running on the other machine. We are just seeing what the program prints, through the file system basically. We are not actually running the program here.
And this also means that I wouldn't leave these tails running all over the place, because if you're not watching the output, is it even important? Right? Yeah, the output is being written to disk anyway, so you can watch it whenever you want. If I now want to go back and watch it, I can go back and watch it. I don't need to leave this tail running to be able to see the output. I can come back whenever I want. I can log out and come back, and the job is still running there. Yeah. Okay.
Should we go on to monitoring now, or do you want to show canceling? Yeah. Okay. So with the scancel command, and by copying in this job ID, the job will stop running. Yeah. Do you want to tail the output file and see what it says? So here we see slurmstepd: error. It was canceled. So it tells you why it ended. Okay.
And quickly, I'll mention the exercises. If you didn't manage to do exercises three and four, and we don't have the time now to go through them in detail, it's a good idea to at least check them and check the solutions. They try to emphasize what is actually being run. When you submit a script, Slurm takes that script and keeps it in memory, and then runs it when it finds the correct slot for it. If you modify the script, that doesn't affect the copy that is already in the queue. But if you modify the code that the script will run, those modifications take effect when the script runs in the future. So it becomes a bit like Terminator 2, this kind of did-I-kill-my-grandmother-before-I-was-even-born thing. So you don't want to do that. You don't want to edit the scripts while they are in the queue.
You don't want to edit the code you're using while you're submitting something else, because then you create this kind of confusion: okay, what will actually be executed, what am I submitting? It doesn't mean anything anymore. So it's a good idea to just avoid the whole thing: when you submit something, do not edit the code anymore, because then you know what you're getting at that point. Yeah.
That's a good question: how to stop the tail? Ctrl-C should do it. Or Command-C, or Control... Yeah, I don't know what it is on an Apple, but probably Ctrl-C as well.
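To make the tail behaviour from the demo concrete, here is a small sketch that does not need a cluster. It fills a stand-in output file (the name is hypothetical) with 15 lines and shows that tail prints only the last 10 by default. The follow mode is left as a comment because it keeps running until interrupted.

```shell
# Stand-in for a job's output file (hypothetical name).
out=monitor-demo.out
for i in $(seq 15); do
    echo "iteration $i"
done > "$out"

# Without options, tail prints the last 10 lines: iterations 6 to 15.
tail "$out"

# -n picks a different number of lines.
last_line=$(tail -n 1 "$out")
echo "most recent: $last_line"

# On a running job you would follow the file instead, and stop
# watching with Ctrl-C (the job itself keeps running):
#   tail -f "$out"
```

Because tail only reads the file, stopping it has no effect on whatever is writing the file, which matches the point made in the session about the job continuing on the other machine.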