to real-life examples. So, Yarno will give an example of using Julia and array jobs on the cluster, and I'll give an example of using LAMMPS and MPI. Now, you don't actually know what these are yet: you don't know what an array job is, you don't know what MPI is, and so on. But the point here is to show you the basic idea, so you can get excited and see that by doing a little bit of scripting, someone can run a thousand things overnight and come back to see the answer. It's really more to let you know this is cool than to have you able to do it right now. But you will in two days.

Okay, so this first example is one where I'll actually run a hundred things in one command. We'll see how it goes. What you're seeing now is me logged into Triton, from the previous session. Let's first look at the folder we'll be working in: this is Julia Ising. If you know the Ising model, then you already know what is going on. Otherwise, it's a simple model of ferromagnetism that has an interesting behavior I want to showcase to you. I ran 10 copies of the simulation manually and got these nice graphs (I did the plotting part in Python), and here is the transition. Now, for a computational physicist this is not good enough; it's not clear enough. I can't fit a line, or the function I want, to this. I want a hundred points, and I want a lot of points especially in this region. But I don't want to manually run the job a hundred times, changing one parameter as I go. So instead I will use an array job.

An array means you do the same thing, but as task one, task two, task three. Yeah, an array is essentially a list of numbers, or a list of jobs. Okay. So how much do you have to do to set it up? So, I just have the manual run results here, which are a bunch of files with different indices, plus the plotting script. I have also copied the plotting script to this array run folder, because I will want to put the results there. And now I will just show the script: this is the script that will run the job for the many parameter values I want. I'll open it up in nano.

We'll see some of these things tomorrow, and the details don't matter that much. The important thing is that I'm specifying here that I want to do a hundred different things, and they will have different indices. Then I'm doing some math to calculate a parameter value from the index. You could instead have, for example, a hundred parameter files that you generate somehow with a script; there are many ways of doing this.

So is this basically saying that array task zero does temperature zero, array task one does temperature 0.1, array task two does temperature 0.2? With this little bit of math, using the command-line argument for the program? Yeah. So what happens is that I run the program and give it the temperature argument I have just calculated. And then, also kind of important, the result goes into the array run folder, into a file whose name depends on the array index. I hope that looks somewhat clear. The editor kind of cuts off the line here. Yeah, but okay, it's just a variable there.
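To make the pieces concrete, here is a minimal sketch of a Slurm array script along the lines just described. The program name ising.jl, the array_run folder, and the output file pattern are hypothetical stand-ins for whatever the demo actually used; the temperature arithmetic assumes the 0.1-per-index spacing from the discussion above.

```bash
#!/bin/bash
#SBATCH --time=00:05:00
#SBATCH --mem=500M
#SBATCH --array=0-99                 # run 100 tasks, with indices 0..99

# Each task computes its own temperature from its array index
# (index 0 -> T=0.0, index 1 -> T=0.1, and so on).
T=$(echo "$SLURM_ARRAY_TASK_ID * 0.1" | bc)

# Hypothetical program name and output layout: the result of each task
# goes into the shared folder, in a file named after the array index.
julia ising.jl --temperature "$T" > "array_run/result_${SLURM_ARRAY_TASK_ID}.dat"
```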
Okay, so what Slurm will now do when I submit this is run all of these on different computers on the cluster at the same time, basically. So sbatch will submit the script I just showed you; you just give the script name to it. There we go. And what I would usually do is follow it from the queue. This is a command for looking at the queue for my username; I think you will also see this tomorrow, or the day after tomorrow. I'll make the window a little wider so that the output doesn't wrap onto two lines. It shows some details about all the jobs, and all of these are now running at the same time. This is a hundred different processors doing something for me on the cluster, all doing slightly different things. And because it was a relatively small example, they're all done already.

So it could be a hundred different things, each doing something that takes a couple of hours? Yeah. So in a few seconds we did an hour's worth of work, or ten minutes' worth, something like that? Yeah, I used 20 seconds for each, so it would be 2000 seconds in total. If I were doing it manually, it would take a while and it would be annoying. And this was a relatively quick case; I actually ran one yesterday where each task took a few minutes. Of course, it doesn't all finish the moment I type the command, but it still runs automatically.

One thing to note: I've used ls to list the files, and there is a bunch of new output files, one per job. But I actually wanted the results to go into this array run folder, and here they are. Okay, you got a hundred outputs? Yeah. And the same plotting script works, because it checks all of the files in the folder. So I go to the other folder and run it here. Here we have a bunch of these measurements (they call it magnetization in this case) and a nice energy-versus-temperature curve.

And I guess it's really easy to run even more if you need it? Yeah, I could quite easily increase this. Oops, I used nano last time, right. So let's say I'm not actually satisfied with this and I want a thousand points, in the same range, so one every 0.01 temperature units. I write it into the file, submit the batch job, and now there will be a thousand of them. It might take a little longer, but it shouldn't take too long. There are a few of them still running, but if you check the array run folder, there are already quite a few results, 300 already. Unfortunately, the way I wrote my plotting script, it fails if one of the runs is not complete. Now it should work... no, there's still one that isn't complete. Okay, there we go. That is much better; there are a lot of points now.

So the basic point of this example is that with array jobs you can run a huge number of similar jobs in one go. Just set up the parameters, or even calculate them, and let it run. Okay, should we go to my example then? Go ahead, or is there anything else? I guess not, that's cool. I will grab the screen share from you. Okay, here we are, back to me. I need to arrange things a bit; sorry, I should have done this earlier.
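As a quick recap of this first demo before the second one starts, the submit-and-rerun cycle looked roughly like the sketch below. The script name is a hypothetical stand-in, and squeue is the standard Slurm command behind the queue-viewing wrapper used on screen.

```bash
# Submit the array script and watch it in the queue
# (ising_array.sh is a hypothetical name for the script shown above).
sbatch ising_array.sh
squeue -u "$USER"    # standard Slurm equivalent of the queue command in the demo

# For the 1000-point rerun, the only changes in the script are roughly:
#   #SBATCH --array=0-999
#   T=$(echo "$SLURM_ARRAY_TASK_ID * 0.01" | bc)
# i.e. ten times as many tasks, spaced 0.01 temperature units apart.
```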
So, Yarno's example was doing one simple thing over and over, many times. My example uses something called MPI, the Message Passing Interface, which lets multiple cooks work together and share information, share data. This will be a molecular dynamics simulation, and it lets you use multiple processors for one single task. Okay, the cat has to go; feeding time is in 23 minutes, so it's getting a bit excited. First, I'll remove everything in this directory so we start from scratch.

So, let's get going. There's an example here that I found online, a page of LAMMPS demos from someone's lab with different examples. Here, gas.lamp is an input file for the LAMMPS program, so I will copy that link. Here I am in my empty directory. First I need to get that input file onto the cluster, and since the cluster is connected to the internet, I don't have to download it to my own computer first; I can fetch it directly on the cluster using the program called wget. And actually, you'll see that it fails, because the site thinks I'm a bot. But there's a really simple way around that; at least it worked yesterday. So here we see it says gas.lamp is saved, and when I list the directory, yes, it's here.

Now I want to run the LAMMPS program. Tomorrow we'll talk about how to get software on the cluster, but I know that I can do module load lammps, and it tells me I'm getting a LAMMPS from 31 March 2017, compiled with these options. So to start, I will first establish that it works at all, with mpirun lmp_mpi. Now, what's the problem with what I'm doing? The problem is that this is running on the login node, and that is not good. But it's okay if I stop it immediately once it starts, just to show that the program isn't crashing outright. I only want to change one thing at a time; I don't want to get into a situation where I don't know what the problem is. So I push enter, and it's doing something, probably using a lot of the login node.

Hmm, that doesn't look good. It says things like "support for writing images in JPEG format is not included". I know from my previous try that I need to modify the input file, and if you were actually using LAMMPS you'd probably know how to do this. So I exit, and I see it's trying to save images. I've used the nano editor to remove some of these commands about saving images and saving video. This is another important point: oftentimes you have to modify your programs a little to make them run on the cluster, because you don't need 1,000 windows popping up showing you the output; you need 1,000 files that you can process later. As usual with nano, I press Ctrl-X and then Y to save. And I could try running it again while we're talking.

What we'll do next is make a run script for this. (I do have to wonder why it's taking so long to start. Maybe I'm using too many processors, or maybe it isn't printing any output because it's actually working. Anyway, I'll stop it for now; there are some really weird error messages, but that's what happens when you interrupt MPI programs.) Okay, so let's make the script. I'll call it gas.sh, a shell script file like the one we used for the array jobs, and I'll write it from scratch here. Let's say five gigabytes of memory, and say 10 minutes of time. And now, is it ntasks? Let's say two. So this is what we need for two processors, and then srun will use that ntasks value.
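Put together, a batch script along those lines might look like this sketch. The resource numbers follow what was just dictated, and the module name, the lmp_mpi binary, and the input file name come from earlier in the demo, but the exact layout is an assumption rather than the verbatim script.

```bash
#!/bin/bash
#SBATCH --mem=5G           # five gigabytes of memory
#SBATCH --time=00:10:00    # ten minutes of run time
#SBATCH --ntasks=2         # two MPI processes

module load lammps         # the build mentioned above

# srun starts one MPI rank per Slurm task, so the task count
# does not need to be repeated on the command line.
srun lmp_mpi -in gas.lamp
```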
Yeah, Simo, do I need the ntasks here also, or is this okay with MPI? That is okay. Yeah, okay. But I think it's lmp_mpi. Oh yeah, of course. Okay, so this is something you'll learn and see tomorrow, so let's not go through it in detail, other than this one number here, which tells it how many processors to use. So I will save this. Oops, I forgot to use nano, but that's okay. If I list my directory, I see a log file from LAMMPS, the input file, and gas.sh. So let's sbatch it. And if I do slurm queue, which is what I usually do... okay, it must have finished fast.

So let's see what happened. This often happens: you submit something and it may have broken somehow. We see the Slurm output file that's been made; instead of printing straight to our screen, the output goes into a file that we can look at later. I'll use the less program to open it, and we see something like "MPI abort"... gas... ah, it's gas.lam. So I made a silly mistake in the file name when submitting the job. That happens, and it will happen to you as well, but that's okay. nano gas.sh, fix this gas.lam file name from last time, and sbatch it again. Notice that I'm pushing the up-arrow key to scroll through the shell history, so I'm not typing the same thing over and over; I'm finding the previous commands and re-running them. Okay, it's done too. Let's look at the new output file; I'll make this shorter. What's the problem now? Hey, well, that was fast. Look, it did all of these different time steps going down.

Okay, so what now? Now we do the real part. If I nano the gas submit script, I change this one thing, and let's say we want 20 processors. Now this would be 20 processors. For a large enough traditional HPC code, people might use hundreds or thousands of processors spread out over many different nodes, and by declaring it here, Slurm can see that. Are there HackMD comments? Maybe I'll keep the demonstration going while people talk some more.

While I make this change, I'm also going to make the problem size larger; it was a relatively small problem, so let's make it ten times bigger. So now we can think about what happens if we make the box ten times bigger and add more particles. How many particles did you add? I multiplied the two linear dimensions of the box by 10 and the number of particles by 100. So if we think about which matters more: a bigger box with the same gas inside just means the gas is less dense, and that alone might change how long the simulation takes. But what more likely matters is that we added 100 times more particles, so the scaling here is most likely dominated by the particle number, because that's what counts when the particles interact with each other. Most likely the time will be about 100 times longer, or something like that. Okay, and it's done. So I'd have to make this quite big in order to need the extra CPUs.
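One rough way to check whether the extra CPUs actually paid off is to compare the elapsed times Slurm recorded for the two runs. sacct is standard Slurm accounting; the job IDs below are placeholders.

```bash
# Compare the 2-task and 20-task runs after both finish.
# 12345678 and 12345679 are hypothetical job IDs from the two submissions.
sacct -j 12345678,12345679 --format=JobID,NTasks,Elapsed,State
```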