So next we'll go to another example, which is running one program but using more processors for it via MPI. While I'm getting set up, Simo, would you like to explain what MPI is?

Yes. So yesterday, when I was talking about the tools of scientific computing, I didn't mention some of the more complicated tools of high-performance computing, and MPI is one of those. MPI, the Message Passing Interface, is a way of writing programs that run across many computers. You might have heard words like "exascale" floating around nowadays; that means lots of computers working together towards a common goal. Think of weather models, for example: weather forecasting happens every day, all around the world, on big clusters. There is a physical model that is really big, with lots of data points; it is simulated on the cluster for multiple different initial conditions, and the result is an ensemble forecast. For that to be computable, you need these huge clusters where the different workers communicate with each other. MPI is the de facto standard. There are others as well, but MPI is the most popular standard for handling the communication in these complicated simulations. So if you really want to scale up beyond one computer, MPI is usually the way to do it.

Okay, so this demo uses something called LAMMPS, which is a molecular dynamics simulator: it basically simulates how atoms move around. Here you can see the shell where I'm doing things, and the history of the recent commands I've run; I'll try to keep things organized so you can see all of it. First I made a new directory in my work directory, which is where research usually goes, and I changed into it using a bash substitution, so now I'm here.

The first question is: is LAMMPS installed on the cluster? We have a command, module spider lammps, which tells us what we have. There are different variants, and I see that, okay, LAMMPS is installed. From my trials today I know this is the one I need, so I'll load it. Case matters here, so I load the capital-L LAMMPS module.

What Richard is doing now is using software that is already installed on the system. All HPC clusters are maintained by admins such as us, and there are applications used by many people. As I mentioned with MPI, some of these applications can be tricky to install because they need special things in order to communicate, so we sometimes install them centrally. That helps everybody, because everybody can then use them, and a lot of software is already there, already available. But because we cannot simply install everything for every user, we use these module systems so that everybody can load whatever software they need, without everybody having every piece of software loaded at once.

Okay. So I want to download this input file from this page. I could download it to my own computer and then upload it to Triton, but let's try to copy it directly. I'll use a program called wget, which downloads any given URL from the internet. I run this and I see an error: it says "not acceptable".
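For reference, the setup steps so far come down to a few shell commands. This is only a rough sketch: the directory path, the exact module name, and the download URL are placeholders rather than the exact ones used in the demo.

    mkdir $WRKDIR/lammps-demo      # new directory under the work directory (path is a placeholder)
    cd !$                          # one possible "bash substitution": !$ expands to the last argument of the previous command
    module spider lammps           # ask the module system which LAMMPS versions are available
    module load LAMMPS             # load the capital-L module found above; the exact name may differ
    wget <URL-of-the-input-file>   # placeholder URL; this is the step that failed with "not acceptable" here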
What I think this means is that the site is trying to block automatic downloads, which is really annoying. This is what I would normally do, but it doesn't work. So instead I'll make a new file, gas.lam, with my terminal editor, open the page in the browser, copy everything, and paste it in here.

Yeah, this kind of thing might seem complicated at first, but later on, when you get more accustomed to the command-line interface, doing small edits and fixes like this straight from the command line becomes faster and faster. You don't necessarily want to write your whole program in a terminal editor, but for small edits it gets quick.

Using tab completion, I can type "lmp" and push Tab, and I see there's a program called lmp_mpi here. And I happen to know, from searching the site, that the option is -in followed by the file name. Is this going to work? Actually, I don't know. I think I need... yeah, okay, so something runs, but there's an error. I'm not going to panic; I'm going to look at what the error actually is, and I see: "support for writing images not included". Basically the input uses a feature that is not built into this installation. So I'm going to go back, edit the file, and look for the part about images, which is here.

This brings up a topic that is important to understand. HPC systems are usually built to be stable and supported long term, so the base software on the system is usually quite old. That is by design, because we want the base operating system to be secure and stable, and it is kept quite minimal on the cluster. The application software usually has to provide extra features, like the image writing here, by itself. So you will encounter this kind of thing, and then it is worth asking: do I actually need to do the plotting on the compute cluster itself, or can I do it afterwards, once the code has run? Is the plotting an integral part of the actual simulation?

So let's try again. I run this and, okay, something happened pretty fast. But this only has about a hundred atoms, so I would expect it to be fast. How do we run it with more processors? Actually, first a quick way of timing it: I'll run it again with time in front and see what happens.

What Richard is doing now is exactly what we do not recommend: running on the login node. You noticed that Richard didn't say anything about the queue; right now he is running directly on the login node. For a small job like this it doesn't really matter, because it takes only a second to run. But if it took an hour and used all the processors on the login node, we would send you a mail saying: go to the queue, there are plenty of resources available there, don't run on the login node.

Yeah. Basically, what I've always figured is: if it takes a second just to see whether it runs at all, I do that on the login node. Once it starts taking more than, say, ten seconds, then I start submitting things. Does that match what you would do? Yeah, typically something like that. So if this had started running and was actually taking significant time, more than ten seconds, I would cancel it and go to the next stage, because all I'm trying to do right now is make sure it isn't failing, and it's much better to do that interactively like this.
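As a minimal sketch, assuming the binary name lmp_mpi found with tab completion and the pasted-in input file gas.lam, that quick login-node check looks like this:

    lmp_mpi -in gas.lam         # quick correctness check; only acceptable on the login node because it finishes in about a second
    time lmp_mpi -in gas.lam    # the same run with time in front, to get a rough serial runtime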
So let's try running this with multiple processors. MPI programs are launched through something called mpirun; let's try running this with five processors. This is on the login node again, now using five processors, but it takes less than a second, so no one is going to mind. Okay, so I've established that it runs with MPI and doesn't crash.

Notice that the time actually increased here, because the problem is too small to get any benefit from the parallelization. In a real case, where you might want to run a system a thousand times bigger, you actually start to see the benefit.

Do you have any helpers who can be answering questions on HackMD? Ah, yes, there they go.

Okay, so now we need to do the real thing, which is to make a submission script just like Simo had in his demo. I'll use the editor nano, because it's nice and simple, to make a file called submit.sh. After the initial boilerplate, the actual commands start with module purge: I like to purge all the modules and reload them to make sure the environment is what I expect, because a mixed-up environment is one of the worst things that can go wrong. Now I can copy my command here; it started like this. We don't need time, because Slurm will time it for us. Instead of mpirun it's srun, because this runs inside Slurm, and we don't need to give it the number of processors, because we tell Slurm that directly. Then, going back up to the start, we give it the parameters. Time is five minutes. You have an exclamation mark? Yes. And the equals sign is missing; is the equals sign needed with the long-option form? Yes. This needs only a very small amount of memory. And how many processors do we use? Let's try four. Actually, before we go too far, let's leave it at five.

Now I'm going to go back to nano and edit my input file. We need some more atoms here, this is too small, so I'll increase the number of atoms and increase the box size to make it larger. Maybe I'll also have it run longer, which I believe is set at the bottom here. Okay, exit and save. Again, it is so much faster to edit these small files directly on the cluster like this.

Okay, are we ready to submit? You can try; nothing is wrong with trying. Something might go wrong, but if so, we'll fix it. So I use sbatch, and it says "Submitted batch job" and a number. If we run slurm queue, we see it says it is running. Okay, it's nicer if this runs a bit longer. And we see, okay, now it says it's pending; if we check again, it says it's running, and here's the node it's running on. Someone had asked about that. Okay, it's still running.

Do you want to look at the outputs? Yeah, okay. If we do ls to list the files in the directory, we see there's log.lammps and the Slurm output file, whose name contains the job ID of what got submitted. less is a program that views files and lets you scroll through them, and if I scroll down, okay, it's going; it's about this far so far. If we look at log.lammps, we see it's basically the same thing.
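Putting the pieces together, the submission script from this part of the demo looks roughly like this. It is a sketch: the module name and input file name are the ones assumed above, and the resource values are the ones mentioned in the demo.

    #!/bin/bash
    #SBATCH --time=00:05:00      # five minutes of run time
    #SBATCH --mem=100M           # a small memory request (revisited below with --mem-per-cpu)
    #SBATCH --ntasks=5           # five MPI tasks, matching the mpirun test

    module purge                 # start from a clean environment
    module load LAMMPS           # exact module name may differ on your cluster

    srun lmp_mpi -in gas.lam     # srun launches the MPI tasks; Slurm already knows how many

It is submitted with sbatch submit.sh, and the queue can then be checked with slurm queue (the wrapper used in the demo) or plain squeue.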
Tomorrow we'll talk about the different parallelization models, shared memory and MPI parallelization, because different codes use different parallelization features. And some, like we talked about yesterday, do not necessarily use any parallelization at all; for those, array jobs might be a better fit. But there is nothing magical about what Richard is showing here: you specify what you want the computer to do, you give it the resource requests, and then you get what you asked for. And now, I think, we are good to go and start running our first job.

There are a few more things to show here. I run slurm history 1hour to show my latest jobs, and I see the same job ID and the script that ran. It tells me the amount of memory the job used, so I guess my 100-megabyte request was about right. It shows how long it ran, how much CPU time and how much wall time it used, how many tasks were running, and finally what node it ran on.

There is one final thing to look at. Let's take this job ID and run seff on it. This tells us the efficiency. The important numbers here are the CPU efficiency, where we see the job was able to use most of the CPUs. If I had run this and it had said the job used 5% of the CPU time, or 50%, then I would know this code doesn't really work well across many processors. Then there is the memory efficiency. I see it actually used more memory than I requested, because each of the five tasks used about 47 megabytes of memory. So I need to increase the memory request, or, more realistically, use the option --mem-per-cpu=100M instead.

So I think that's the end of the demo. This is what you'll be able to do by the end of the day tomorrow. What we don't provide you with is programs that work like this: it's up to you to determine whether your program actually works in parallel, and then how to make it run. Of course you can ask; join us in the garage and ask on the issue tracker about what works and what doesn't. But usually the workflow is quite simple. You have something you want to do; you go to your cluster; you check for the program you want to use; you write something that tells the computer how to run it; you run it; and then you take the output and do something with it. Nothing magical, you just need to do a few of these steps to tell the computer what you want, and usually that means writing it as a command-line script like this.

I see an interesting question down here: was LAMMPS already aware of mpirun? Yes, this was basically built in a way that Slurm knew about it, or it knew about Slurm, and all that kind of thing, which is not an obvious thing.
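For reference, the follow-up checks described above come down to a couple of commands. This is a sketch, with a placeholder job ID and assuming the slurm wrapper script used in the demo:

    slurm history 1hour     # recent jobs with memory used, CPU time, wall time, task count, and node
    seff 12345678           # Slurm efficiency report for one job (the job ID here is a placeholder)

    # and in the batch script, request memory per MPI task rather than per job:
    #SBATCH --mem-per-cpu=100M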