Yes, okay, and this link is in the notes also. So, okay, the queuing system. This is what I wanted us to get to from the last one, so now we're talking about what makes the cluster unique. Should I give it a shot?

Yeah, go right ahead.

So, well, how about this metaphor. At Aalto University there are two Linux servers designed for students to run calculations on if they need a little bit of computing power. They're called brute and force, and they really are a brute-force solution. So what's the problem here? Anyone can connect there and run anything, so they're basically always overloaded. There are so many things going on on these single nodes that everyone is slowed down and it takes longer for everyone to get through.

So the main difference with the cluster is Slurm itself. What is Slurm? A queuing system, a workload manager, whatever you want to call it. Slurm is the thing where you can say: okay, I have this program that needs five CPUs and I expect it to take about two hours. It will put it in a queue, wait until those resources are available, schedule it on these other nodes, and then return the results back to you. That's really the main thing we're talking about for the next two days: how to tell Slurm what you need, and then see the results come back. What do you think, was that good?

Yes, I think that was completely correct. But I would also add that Slurm isn't only about managing the resources; it can give you a lot of other stuff besides that. It can give you monitoring output that shows what your job was doing, and it can help you organize your work.
So you can run a lot of different programs at the same time, even massive runs, with minimal changes to the code. It helps you get more productive, and it's basically the way that you want to use the cluster. That is the backbone the cluster has been designed around: everything in the cluster has been designed to be used through Slurm, so you really should get familiar with Slurm.

So let's scroll down to this metaphor here. There are two parts to the metaphor, and both are about food again, so sorry to the people who didn't want to talk about pasta anymore. We're on the scheduling-resources part here. You go to a restaurant and there are more people who want to eat than there is space. So you talk to the house manager and say, hey, I'd like a table for two people, and they put you on the list. When a table opens up, they look and see what's available. They might say this is a table for two people, in which case the first party of two gets it. They might say this is a table for two people, but the next party is four people, so that party has to wait and the one after them can come in. Or they might even say: okay, this is a table for two people, but there's another table for two opening up next to it soon and I need a table for four people, so they strategically leave it empty for a little bit. This is basically what's happening on a large scale, automatically, on the cluster.

So here's an interesting question. What happens if you go to a restaurant and say, yes, I'd like a table for 10 people, and then you wait, and then you actually only have two people eating? What's the effect on you then?
Well, of course the effect on you is that you would be waiting a longer time, because the house manager thinks there are 10 people coming. So he has to reserve a table with 10 seats, but when only two people show up, the eight other seats will be empty because, well, nobody showed up. And you still needed to wait for that table to be free.

Yeah, and they're probably not very happy with you and won't give you any kind of priority next time you come. What happens if you reserve a table for two people, but then you're four people?

Well, then two people need to stand. Or you do it so that two people are seated and eat their food, and then they stand up and the two who were standing get seated and eat theirs. So basically it gets crowded while people are eating, and you are not going to get the efficiency of the simultaneous eating that you expected.

Yeah, and these are all real things that happen on the cluster. People reserve too many resources, they reserve not enough resources, and so on. So that's why we're talking about this here. Is there anything else on this page we really need to say, if we scroll down?

Yeah, I would add that the big part here is that Slurm not only manages the process of getting you the correct kind of table that you need, it also manages the cooking process in the background. So you can just say that you want to eat pasta or something like that, somebody will make you that pasta dish and bring it to the table, and you get the results that you want without having to do the cooking yourself.
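Editor's sketch: the seating metaphor discussed above can be tried out as a toy simulation. This is only an illustration under stated assumptions, not how Slurm actually schedules: it uses strict first-come-first-served seating with no priorities and no backfilling, and the function name `fcfs_start_times`, the party names, seat counts, and durations are all made up for the example. It shows why asking for 10 seats when you only need two makes you wait longer.

```python
import heapq


def fcfs_start_times(total_seats, parties):
    """Toy first-come-first-served seating (hypothetical helper, not Slurm).

    parties: list of (name, seats_requested, minutes), in arrival order.
    Returns a dict mapping each party name to the minute it gets seated.
    """
    free = total_seats
    now = 0
    running = []  # min-heap of (end_minute, seats) for seated parties
    starts = {}
    for name, seats, minutes in parties:
        if seats > total_seats:
            raise ValueError(f"{name} asks for more seats than exist")
        # Wait for seated parties to leave until the request fits.
        while free < seats:
            end, freed = heapq.heappop(running)
            now = max(now, end)
            free += freed
        starts[name] = now
        free -= seats
        heapq.heappush(running, (now + minutes, seats))
    return starts


# A 10-seat restaurant with two parties already ahead of you.
ahead = [("early", 6, 60), ("others", 4, 30)]

honest = fcfs_start_times(10, ahead + [("you", 2, 45)])
greedy = fcfs_start_times(10, ahead + [("you", 10, 45)])

print(honest["you"])  # 30: two seats free up when "others" leave
print(greedy["you"])  # 60: ten seats need the whole restaurant empty
```

In this toy model the honest request is seated at minute 30 and the inflated request at minute 60, and the eight extra seats then sit unused for the whole meal.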
So you can do this parallel thing.

Yeah, you order 10 different dishes at the same time and they all get made in parallel and brought out to you.

Or you don't have to cook yourself: you don't have to book a table and then go to the kitchen, cook your own food, and bring it out. This analogy is getting strained by the moment, but what it means is this. Yesterday there was a question about what happens if you lose your connection to the cluster while you're running something. Slurm manages these jobs so that they run in the background in the correct places, so you don't have to be connected to the cluster while the stuff is running. We'll be talking about this later today, but basically this is something that Slurm can also do.

Yeah, so, this next section. We talked about the basic process. What are the resources Slurm manages? As in, when you're requesting this table on the computer, what are you actually asking for? That's basically some number of CPUs, some amount of memory, and how long it runs. There are a few more dimensions here, but...

Yes, these are the basic ones that every program has. And these are the things we talked about yesterday, how you can give a rough estimate for them. We'll be talking about how you give this information to Slurm in the coming sections. There are flags that you can give Slurm, but we'll be talking about those in... yeah, I guess we don't need to go through that now.

Okay. And there are all kinds of other submission parameters. For example, you can request a table all by yourself, as in an exclusive mode, if you're doing performance testing. Or you need only the latest kind of GPU or the latest CPU architecture because of benchmarking and stuff like that. But we don't need to talk about that now; that's for later.

Partitions. Hmm.
What's the restaurant metaphor for that?

Yeah, maybe it's like you want to eat on the patio or the veranda instead of eating inside, so you only want to use the tables that are outside or something. I don't know, maybe the restaurant analogy won't work with this. In many clusters the different kinds of compute nodes are separated into these partitions. So if you're using other clusters, or in our cluster also, there are a few of these partitions...

I have a good example. So you're doing a debugging task, you just want to test something out quickly. You come to a restaurant and ask to sit at the bar, so you can order quickly, eat, and dash out of there right away. And you can skip the long queue for the sit-down tables.

Yeah, that might be a good analogy. Maybe we should go on.

And I guess we can also say that, at least on Triton, the Aalto cluster, you don't need to worry about partitions that much. We have a script that automatically sets all of these things for us. But on some clusters this actually is important and you have to say where the job is running.

But I think the queue is kind of like the Matrix: you cannot be told what it is without actually testing what it does.

Yeah, maybe let's do it. And actually we're right on schedule now if we go on. Okay, let's see the HackMD. I will switch there, I mean the notes, and see if there are any interesting questions about Slurm.

"Is Slurm the same for Turso or other clusters?" So Slurm is the generic open source product that manages the cluster, and these days it's used by almost every cluster you'll see, certainly all the ones in Finland.

Yeah, there are other managers like PBS and, uh, Q sabba or something like that, but the main idea is that all of these do the same thing.
They just have different kinds of commands and stuff, but basically they all do a similar kind of thing: they all manage these resources and they have a queue. So it's translatable, but Slurm is the most popular one.

Yeah. And this next question is really good: "If I submit a job on Slurm, and after it's submitted I modify the code to test something new, which version runs when the job runs?" We actually have some exercises about this that we just added today, so we'll see. But you're thinking the right way here. I think if you're asking this question, you know what the answer is.

Okay, "multiple things to Slurm at once?" Yes, and once we get to array jobs tomorrow, you'll be submitting hundreds of things. Well, not in the demo. Remember yesterday, Jarno submitted a thousand Slurm jobs at once in order to run everything together. And that's sort of the main point.
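Editor's sketch: the resource request described in this session (a number of CPUs, an amount of memory, a time limit, and optionally an array of many tasks) might be put together roughly as follows. The helper `sbatch_command`, the script name `run_analysis.sh`, and the `4G` memory figure are hypothetical and chosen only for illustration; the speakers mention five CPUs and about two hours but no memory value. The options `--cpus-per-task`, `--mem`, `--time`, `--array`, `--partition`, and `--exclusive` are standard `sbatch` options. The sketch only builds and prints the command line; it does not submit anything.

```python
import shlex


def sbatch_command(script, cpus, memory, time_limit,
                   array=None, partition=None, exclusive=False):
    """Build an sbatch argument list (hypothetical helper; nothing is run)."""
    cmd = [
        "sbatch",
        f"--cpus-per-task={cpus}",  # how many CPUs the program needs
        f"--mem={memory}",          # how much memory, e.g. "4G"
        f"--time={time_limit}",     # expected run time, HH:MM:SS
    ]
    if array is not None:
        cmd.append(f"--array={array}")  # many similar tasks in one submission
    if partition is not None:
        cmd.append(f"--partition={partition}")
    if exclusive:
        cmd.append("--exclusive")  # the "table all by yourself" case
    cmd.append(script)
    return cmd


# Five CPUs for about two hours; the memory value is an assumption.
single = sbatch_command("run_analysis.sh", cpus=5, memory="4G",
                        time_limit="02:00:00")
print(shlex.join(single))

# A thousand tasks submitted at once as an array job (indices 0 to 999).
many = sbatch_command("run_analysis.sh", cpus=1, memory="4G",
                      time_limit="02:00:00", array="0-999")
print(shlex.join(many))
```

On a real cluster the same options are more commonly written as `#SBATCH` lines at the top of the batch script, and sites may limit how large an array can be, so check the local documentation before submitting a thousand tasks.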