So here we go: array jobs. For a refresher, could we run the serial job that we ran previously? Let's run the same thing, so we can then see what we change for array jobs. Richard is now finding the script — is it hello.sh? Yes. So here's a simple serial script. We had these #SBATCH statements that specified the resources we needed for our job to run. We had the magic line at the top, which specified that we want to execute the rest as command-line commands. And at the end we had the command we actually wanted to execute, often prefixed with srun so that we get better information about what the job was actually doing. That was the basic structure of a serial job, and we ran these yesterday. Now, what if we want to run the same serial job on multiple different data sets? We have the same code, and we want to run it on many data sets, or switch one parameter between each run — say we want to submit hundreds of these serial jobs. One solution, of course, would be to copy-paste the serial job: take copies of the same .sh file and make small changes to the parameters, or to the arguments we give to our code, maybe a different input file or something like that. But that gets unwieldy really quickly, and it's also not the best for the queue system itself. There is a better way.
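The serial script described above might look roughly like this sketch (the exact resource values and the hello.sh contents are assumptions, not the course's actual file):

```shell
#!/bin/bash
#SBATCH --time=00:10:00      # time requested for the job (value assumed)
#SBATCH --mem=200M           # memory requested (value assumed)

# srun makes Slurm record better accounting info about what the step did
srun echo "Hello, world!"
```

The #SBATCH lines are comments to bash but directives to Slurm, which is why the shebang line and the directives can coexist in one file.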
And this better way is a structure built into Slurm itself: the array job. By specifying this array directive — an #SBATCH comment, just like the other #SBATCH comments we have in the serial script — we can tell Slurm to launch basically identical copies of the same job at the same time; it can launch tens, hundreds, thousands. If you go above 10,000 or so, or start getting into the many thousands, you should probably ask us whether that's the best way of doing it, but usually you can run a huge number of jobs. What Slurm then does is essentially the copy-pasting for you: it launches all of these jobs individually. So if you have an array running from 0 to 10, you launch 11 jobs at the same time. Slurm effectively treats them as different jobs — similar, but with one slight difference. Slurm takes all of the rest of the script exactly as it is: apart from the array statement, each job runs the script identically. Now this might sound pointless — we're just running the same thing 100 times, so why would that be better? The main difference is that for each of these jobs, Slurm sets one environment variable, SLURM_ARRAY_TASK_ID, and that is the number of the array index in question. So with an array running from 0 to 10, you have 11 jobs, numbered 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. And each of these jobs, running independently in the cluster, gets a different number: you're number one, you're number ten.
To put it another way: imagine you have a pack of cards and you want to shuffle it. You can split the pack into four pieces, give a piece to each of your friends, everybody shuffles their own piece, and then you collect them back together. In this kind of way you can split your data, or split your problem somehow, and based on the array number, each worker does a different thing. Or, the kitchen analogy again: you have four cooks in the kitchen — one makes the pasta, one makes, I don't know, the ravioli, one makes the sauce, one grates the cheese, one puts out the plates — and everybody chooses their task based on the schedule on the wall. Maybe a better analogy for array jobs would be: you have different homes, all producing pasta separately, and then they all bring it to the same picnic. Yes, that's a good analogy as well: they all have the same instruction sheet — number one makes pasta X, number two makes batch Y — and you give the same sheet to all the homes and say: you're number one, you're number two, you're number three. Then each looks at their own line of the instructions and does that. Now that we have the basic idea explained with analogies, which don't capture the whole thing, maybe we should look at the first simple array job you might want to run. We were thinking we'll run this now as a demo — you can watch it — and after that you can try it out yourself and see how it goes. Everything we just skipped over in the lesson is what we've been talking about, so now we're getting to the real thing. Okay, should I do this? Yeah, let's do that.
Let's do that in a new script. Let's write our first array job — array.sh, I guess. That sounds like a plan. We start similarly to the serial job: first we determine the interpreter we want to use, so we add the magic line at the start. Then we add the same time and memory requirements. These time and memory requirements are independent for each array task, so you don't have to do any multiplication in your head or split up the time: each array task gets to run for 15 minutes with 200 megabytes of memory. In the output file name we have something strange going on: a few of these wildcards that Richard has written there, the percent signs. The capital %A is the common job number for the whole array job, and the lowercase %a is the array ID of the individual task. So you get the output from each of these in an individual file — you don't want everybody writing into the same file, because it becomes hard to read what each job was actually doing. And now we add the array itself, running from 0 to 15. Let's add the echo line here — this is the number that tells the job who it is. Remember yesterday we had scripts with environment variables in them, and those variables were only initialized when the job was running on the compute node. Similarly, this array task ID is only initialized then: we write the dollar sign and this specific variable name, and when the job is actually running, it gets replaced by some number. Should we add a sleep there? A small sleep, before or after the echo, is fine — so that the jobs don't go through the queue too fast. Okay, are we good to go? Ctrl-X to save, Y, Enter.
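The array.sh just written in the demo would look roughly like this (the exact echo wording and sleep duration are assumptions; the resource values and array range are the ones stated above):

```shell
#!/bin/bash
#SBATCH --time=00:15:00
#SBATCH --mem=200M
#SBATCH --output=array_example_%A_%a.out  # %A = whole-array job ID, %a = task ID
#SBATCH --array=0-15

# SLURM_ARRAY_TASK_ID is only set once this task is actually running
srun echo "I am array task number ${SLURM_ARRAY_TASK_ID}"
srun sleep 30   # keep tasks alive long enough to see them in the queue
```

Submitting this once with sbatch produces 16 independent tasks, each with its own output file.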
Here we are. Now let's submit. We submit it similarly to the serial jobs, with sbatch. And then, slurm queue. Notice that we submitted only one job, but when we look at the queue itself, we have 16 tasks running. On the right side you see they are running on different machines — different csl nodes — and on the left side you'll notice that each has an underscore and an index: that's the Slurm array task index. So task 0 is running on csl48, task 1 is running on the same machine, but task 6 is running on a completely different machine. Each of these is independent now, but Slurm managed to launch all of them — it did the copy-pasting of the script that you would otherwise have to do yourself — basically simultaneously. Let's run slurm queue again... oh, okay, they're all done. Let's look at slurm history then; let's make the terminal wider. Here we see all of these different array numbers, and they all look quite similar: same steps, same echo, different nodes. Should we look at the outputs? Let's ls. And we see: they're all here. As Richard specified in the --output argument for Slurm, the different files have different names, based on the main shared ID of the whole array job plus the array task ID. So basically you have one job that consists of smaller jobs that are basically identical, with the exception of the array ID. When Richard now cats one of the output files, it contains the output from that specific task, with the specific ID it was given.
In the end, since we have the whole power of the shell language here, we can do whatever we'd like with that index. And if I'm correct, the next section gives a bunch of examples of how to use this. We will go through a few of these examples, but we were thinking it might be helpful for you to run this example now, so that you get a grasp of what's happening — because after that, we'll look at how you can do the mapping between this array index and different parameters for your code, or different data files. You can do all kinds of mappings from this single number, even to quite complicated things. But before that, it's good to have a grasp of the simplest case: how do you run this plain array job. Should we give 10 minutes for people to play around? Yes, async exercise time for 10 minutes. We can also answer in the HackMD if you have any questions related to array jobs, and we can discuss them after the exercise. So, until :31. Just to reiterate: not exercise 1, but the same example that we just did — the first example on this page — so that you can test how to run an array job. And let's see, any important questions? Do the array jobs queue separately? Yes, they basically queue separately, but the Slurm manager can handle them efficiently: it sees one thing that gets split into many separate ones, and as soon as there's room for the next one, it takes it off the array queue and starts it. Because they all start at the same time, they usually have the same priority.
We haven't yet talked that much about job priorities, but basically: if there's space to run 15 at a time, the first 15 will start, and once the first 15 are done, the next 15 will start. You can also limit this yourself — there's an option in the array syntax to limit the number of tasks you want to run concurrently. So you can tell the array: I want to run only 12 tasks at a time. This might be useful if you don't want the array job to take all of your priority, or you want to leave a little for yourself so that you can run interactive jobs at the same time as the array job runs in the background. Okay, see you in a few minutes then. ... We're back. Hello. There were good questions in the HackMD — it was good to have this kind of small break so we could discuss them. I'll read a few. The first question was: are the memory requirements multiplied or divided across the array job? No, they are not. All of these #SBATCH statements — the comments that define the requirements for your job — are identical for each array task. Another question: do the different array indices wait for each other? The answer is no — they run in parallel. You could probably create a structure where you run them as dependency jobs — I'd have to think about that — but that's more advanced trickery. In general, you want to use array jobs in situations where you want to run independent things together, so each of these tasks should be considered an individual job.
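The concurrency limit just mentioned is written with a % suffix in the array range. A sketch (the range and limit values here are just examples):

```shell
#SBATCH --array=0-99%12   # 100 tasks total, at most 12 running at any one time
```

All 100 tasks still run eventually; the throttle only caps how many occupy the cluster simultaneously, leaving room for your interactive work.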
You can think of it like this: if instead of $SLURM_ARRAY_TASK_ID I just wrote the number five into the script, would this job run independently, as it is? And if I changed that number to seven, would it run again? That's the idea. Okay, should we look at how you actually utilize these array jobs? That's the examples section here. So what are more examples of array jobs? Well, if you're a fan of mathematics, you know from Cantor's diagonal argument and its surroundings that anything countably infinite can be mapped to anything else you can count. So if you have the numbers from zero to infinity, you can map them to anything countable — not the real numbers, but things like how many apples you have, or one number for each person in the world, and so forth. The same idea applies here: you can map the integers you get from the array task ID to various configuration options. You can map them to different parameters you want to run your code with, to different data files you want to analyze, or to different code files you want to use, if you have multiple. The useful case is the so-called embarrassingly parallel situation: you just want to do a thing, and do the thing again, and again, with different variations, and all of these tasks are independent of each other — you just want them all done. In these cases, an array job is the best option. So should we look down? First, we have reading input files. Can you explain how this goes?
In this example, you would take the array task ID and use it to pick the input file: given an input file with a certain name based on the task ID, your code reads its parameters from that file and uses them. So basically you have some application, and it takes as input the file number zero, or something like that. You would rename all your input files to input_data_0, input_data_1, and so on. Yes — and if you don't want to rename them, the next option works too. Should we scroll down? Okay, so what's the next option? Hard-coding arguments in the script. Yesterday we mentioned for loops and all kinds of command and control structures — the bash interpreter can handle all of these. So you can, for example, use the case structure, which basically says: based on the value of SLURM_ARRAY_TASK_ID, do something. In this example, you change the seed number for the pi generator that we previously tested. Say you want to run the code with different seed values: you can of course use the array task ID as the seed value itself, but if it's some other seed you want, or you want to choose the values arbitrarily, then you can choose them based on the task ID. This is a good approach if, let's say, you have three algorithms that you want to test at the same time.
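The input-file mapping just described can be sketched like this (the file-naming scheme and the my_analysis program are made-up examples, not part of the course material):

```shell
# In a real array job Slurm sets SLURM_ARRAY_TASK_ID; we set it here
# only to simulate task number 3 outside the cluster.
SLURM_ARRAY_TASK_ID=3

# Construct the input file name from the task index.
INPUT_FILE="input_data_${SLURM_ARRAY_TASK_ID}.csv"
echo "would run: srun my_analysis ${INPUT_FILE}"
```

Each task then processes exactly one file, with no coordination needed between tasks.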
So based on the array task ID, you choose which algorithm name to use, and then you give it as an argument to the code you're running. Of course, your code has to understand the argument itself. Okay, let's continue down: parameters from one file. Let's say you have a file that contains the parameters you want to run — in this case, different iteration counts you want to test, to see how the pi-generating code performs with different numbers of iterations. This is a 1D hyperparameter sweep, the kind of parameter testing where the parameter space runs along one axis, though you can of course do more. So here, the first array index maps to 100 iterations, then 1,000, then 50,000, then 1 million. And in the script itself we use shell trickery again to pick the correct line from the file — we use sed to get the correct line. What you might also want to do instead is give your code the parameter file itself, plus the line number it should use: then the program reads the input file and picks its own line, so you don't have to write it in shell. You can write, say, a CSV file of all the parameters you want to use, pass it to the code, and tell it which line to work on. So this thing here uses bash to find a single line in a file, but doing that in Python or R or whatever would probably be better. I remember back when I did some work, I basically had a Python module that would define the parameters for any given run number I'd give it.
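The sed-based line picking described above can be sketched like this (parameters.txt is a made-up example file with the iteration counts mentioned, one per line):

```shell
# Create the hypothetical parameter file: one iteration count per line.
printf '100\n1000\n50000\n1000000\n' > parameters.txt

# In a real array job Slurm sets SLURM_ARRAY_TASK_ID; simulated here as task 2.
SLURM_ARRAY_TASK_ID=2

# sed -n 'Np' prints only line N of the file; array index 0 maps to line 1.
ITERS=$(sed -n "$(( SLURM_ARRAY_TASK_ID + 1 ))p" parameters.txt)
echo "running with ${ITERS} iterations"
```

With an array of 0-3, each task picks up its own line, so adding a parameter combination is just adding a line to the file.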
And then I'd run an array job and just say array indices 50 through 60, and it would run that generation of runs; then I'd add the next 60 to 70 or whatever as I was refining. Yeah, in one code I used, I did something similar with the array index itself. The array index can run from whatever number to whatever number — there might be some limit, I don't know, maybe in the millions — but basically you can put arbitrary numbers there. What I had was an array running from 4,000 to 5,000, and the script would divide the number by 1,000 to give a decimal. So you can calculate a floating-point parameter straight from the array index if you want — you can do all kinds of trickery with this, and it's all up to you how you interpret the array indices. The array also doesn't have to go from 0 to 100; you can go from 2,000 to 3,000 with an increment of 10, or something like that. I think that's sort of what this example here does: each array index becomes a chunk of 100 values or so, so if your individual jobs are too short, you bundle more of them into each task. In the HackMD there was a good question: what is the optimal size of these individual jobs? Usually something on the order of an hour. It depends on the system, but around an hour is the recommendation. So let's say you need to run 100,000 parameter values and each one takes only a minute or so to run. You have a problem: if you submit 100,000 array tasks that each take a minute, they will clog up the queue, you get a huge number of output files, and it's not fun.
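The usual remedy for that clogging problem is chunking: each array task loops over a block of parameter values instead of handling just one. A sketch (the chunk size, the loop body, and ./my_code are assumptions for illustration):

```shell
# In a real array job Slurm sets SLURM_ARRAY_TASK_ID; simulated here as task 1.
SLURM_ARRAY_TASK_ID=1

# Each task handles CHUNK consecutive parameter values:
# task 0 covers 0..99, task 1 covers 100..199, and so on.
CHUNK=100
START=$(( SLURM_ARRAY_TASK_ID * CHUNK ))
END=$(( START + CHUNK - 1 ))

for p in $(seq "$START" "$END"); do
    :   # one fast iteration per value, e.g. ./my_code --param "$p"
done
echo "task ${SLURM_ARRAY_TASK_ID} handled parameters ${START}..${END}"
```

With an array of 0-999, a thousand hour-ish tasks cover all 100,000 one-minute iterations without flooding the scheduler.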
So what you might want to do is put, say, 100 individual iterations into one array task. Then 1,000 array tasks each run a chunk of 100, covering 1,000 × 100 = 100,000 parameters. In the documentation there's an example of how to do this chunking: you have a for loop within the array task itself. So if you're facing this kind of situation — a huge number of parameters to go through, each iteration very fast — you can clump them together into chunks. Instead of each array index representing one parameter, it represents a hundred of them: index 0 means you run parameters 0 to 99, index 1 means you run parameters 100 to 199, and so forth. You can do all kinds of things like this to get stuff done. The basic idea of the array job is: if you have something where you would otherwise copy-paste your script — you just need to run the same logic a lot — then you can use an array job. And why is this better? Take the real-world example I presented yesterday, where it takes four hours to go through all of those GPU feature extractions. Each feature extraction is an independent process, so you can make an array job out of it, run one feature extraction per task across multiple GPUs, and get it done in much less time. Okay, should we go to exercises? Maybe 20 minutes for exercises and 10 minutes for a break — does that sound good? Let's see. We have one basic exercise where you try to run this Python memory user with different parameters, and something where you're mainly thinking about it — maybe you can put
the answers in the HackMD. Then there are several advanced things you can play with, including, if you want, basically what we said above: run the pi iterations, save them to different files, and then combine them. Actually, maybe we should have 15 minutes for exercises and 10 minutes for the break. Do exercise 1 — the actual hands-on exercise where you write this kind of more advanced structure. You can choose any of the example structures on the page; all of them can handle this problem. And for exercise 2, let's have a discussion in the HackMD: you can ask, for instance, whether your own problem could be parallelized with this array structure — can you split your problem into these array tasks — and we can answer there, so we don't go too far over time. So: exercise 1 in code, exercise 2 in the HackMD. Array exercises until... you said 15? Okay, see you in about 25 minutes then. And for downloading the pi file: the section description shows how to get it from a Git repository. See you later. ... Welcome back — well, I guess we're the ones that are back, not all of you. I guess we'll go over the first array exercise. Seema, would you like to demonstrate? Sure. It's your screen now. So let's check the first example here — just a second. It was: take the memory hog and run it with different amounts of memory. I'll open array.sh — let's name it that, for example — and do the usual boilerplate; hope it doesn't cut out. We want some memory limit for the job — let's put, say, 500 megabytes and see how it performs — and let's add 10 minutes of time. We want to test 50, 100, 500, 1000, and 5000 megabytes, so we
want an array of 5, from 0 to 4. And we can choose which structure we want to use for the mapping; since we have such a small number of values, let's use the case structure, for example. So: case $SLURM_ARRAY_TASK_ID in — for 0, let's set an environment variable MEM to 50M, and then another one for the next value. What were the numbers? 50, 100, 500, 1000, 5000. And then we close with esac — "case" backwards. Correct. A good thing to remember with these kinds of things: I don't personally remember the exact syntax myself, I always look it up — who has the capacity to remember all of these? The main thing to remember is the concept of an array job, that you can do this mapping; then you can build this kind of thing. Hopefully I had it in the right place. Now, as mentioned on the first day, the memory hog takes the amount of memory as an argument. Oh yeah, that's true — good point, we need to pass that. Does it need an option flag? Maybe not — let's look at the earlier material, because who wants to trust their own memory... it was actually in the interactive jobs, without any flag. So now we give it $MEM as the parameter here. I'll also add an echo — "requesting $MEM of memory" — so we get a bit more information. Should you use double quotes? Yes, that way the variable will be substituted; in bash, single quotes would mean the variables inside are not substituted. There are a few ways of writing the same thing — different syntaxes for different things — and as long as it works, it's fine. So let's submit this and see how it goes. Let's see if I can catch it in the queue... okay, it's now pending — and it's already run. So what we
got was this: because we didn't specify the output file, we got several separate default output files here. Let's go through them one by one. The first tried to use 100 megabytes; the second, 500 megabytes; the third tried to use 1 gigabyte and succeeded; and the fourth ran out of memory. In this job the memory limit was 500 megabytes for every task, and the last one asked for 5000 megabytes, so it ran out of memory. The third one, which also exceeded the memory limit, managed to run because of some leniency on the cluster side — but it might have failed for a longer job, so don't rely on that; set a correct limit. So this is the basic structure of array jobs, and the sky is the limit with these: you can also mix and match with the other kinds of parallelism that we're going to be talking about. But the basic idea is: if you have something you need to do many times, and all the runs are independent, you can do it with an array job. Okay, if you have any further array job questions, please — oh, there's an interesting thing in the HackMD here, let me select it. Here's someone who used the array index to compute the memory amount directly. This works — there's nothing wrong with it. It might start getting a bit complex when the memory amounts get large, but it's actually correct, and I guess the goal was to think of something clever, and this is clever and correct. There's also a point we probably didn't mention when we talked about serial jobs: it's usually a good idea to write these scripts in a way that makes them easy to read, because that way the execution part of your workflow is documented too. If you write your serial jobs and your array jobs in a readable format, and maybe add some comments here and there, it's easy to look at the code afterwards and see how
to run it again. So treat the scripts as code and make them readable — that way they automatically document themselves for you. You can even add the run scripts to the Git repository that you have; that way it's easy for other people to replicate the results, since they just need to run a similar kind of job with similar resources. So there's nothing wrong with the example there — it's pretty nice — but I would at least add the memory and time requirements, so it becomes clearer what's happening. A really nice, really interesting solution, actually. So, let's see: next up is parallel stuff.