Hello, hope you're back from a refreshing break. In this next section we'll continue where we left off — previously we were talking about parallelization and these cooking metaphors. Should we make the note about the naming of the notes? Oh yeah, that's true. So before we start, a good mention that was made in the collaborative notes: previously we were using a service called HackMD for these notes, and now we have our own server running them. The technology is pretty much the same, but sometimes we slip up and say HackMD when we should say the collaborative notes. That's on us — you get so used to a name that you start using it all the time. It's conveniently short, too. So we'll try to say "notes" whenever we talk about the collaborative notes, but if we do mention HackMD, that also means the collaborative notes; we'll try to keep the terminology clear so you know what we're referring to. These collaborative notes are the main place to ask questions. Yes — I almost forgot that. But back to the topic of this session. Previously we were talking about pots, pans, pasta, all kinds of stuff. When people come to HPC clusters and we talk about parallelization, the common situation is that people often don't really know what their program does or how big it is — what is the size of the program? So when people come and ask, and they don't know the size, we try to give some handy tips that can be used to estimate it. You don't necessarily need to know exactly what the program size is; ballpark estimates are enough, and they make it a lot easier to decide what sort of resources you're going to ask for.
We'll talk about resources in a second, but first: why should you care? There are a couple of reasons. First, the HPC systems we've been talking about are shared systems, and they're shared fairly between users, so how much you use changes how much priority you get in the future. There's a cutoff point — usually calculated over a few weeks or so — after which your priority bounces back up. But if you use a lot of resources in a short time, your priority in the queue relative to other users will start to drop, and the more resources you request in total, the more it drops. So choosing a sensible size for your jobs and making a sensible estimate of how much you need is good for future use; otherwise you might end up waiting a lot longer than necessary. Second, it can help you organize your work. If you know how long something takes, you know whether you can do it in one week or one month. Or if you notice that a calculation would take a year, you know that maybe you need to do some optimization, modify the code a bit, or try to run it in parallel. Or you can notice that some part doesn't really matter as long as it gets done once, and you can let it be inefficient if it doesn't matter in the grand scheme of things. And third, if you have an estimate of how big your program is and the program suddenly doesn't match that estimate — for example, you estimate the pasta will take eight minutes to cook and it took 60 minutes — you might wonder: what's happening? What went wrong?
This didn't match my expectation. Maybe the burner was on such low heat that it didn't boil the water — it was lukewarm — or your pasta was a lot thicker than you expected. So it might be related to the data, it might be related to the problem at hand, or it might be that you're using a really weak burner, or that instead of using all four burners you only used one. This is quite common: you didn't request the resources that you were planning on using. In these cases, having an estimate gives you something you can verify against the results. So it's always good to estimate first and then refine the estimate based on what you actually get. It can also help if you run a small problem and then a slightly bigger problem, expecting the runtime to scale linearly, but the time it takes looks more like an exponential function. Then you can say: okay, I seemingly have exponential runtime here, so a really huge problem with the same program probably won't run, because there just aren't enough resources for it. And then you can decide on a better algorithm or something. But now Thomas, when we talk about resources, what resources are we really talking about? Well, the two main factors on the cluster are CPUs and memory — and GPUs if you need them. How many CPUs does my program use? How much memory does my program use? Those are the questions, and those are the resources that the cluster essentially limits. Yes — like the number of burners; these kinds of things you cannot really change because they are hardware-dependent.
Then there's the time dimension: how long do you want the cluster to cook, basically? You can reserve the kitchen for an hour, but if the pasta takes eight minutes to cook, the kitchen sits empty for the rest of the hour. So when you estimate time, memory, CPUs — and GPUs, of course, if you're using those — it's good to know what resources you're going to need. One mention here: if your program ends earlier, you will only be billed for the time you were actually running. Still, your runtime is probably the most important factor, because the scheduler tries to fit you into the right place, and for a very long-running program it might not find a good place for quite some time. The queue estimates your time based on what you tell it; we'll talk about how to request this later. So how do you estimate CPU and RAM size? The first good measuring stick is your own computer: have you run the program on your computer? That's a quite simple estimation, and there are some ballpark numbers. Not every computer is the same, but a modern laptop is about four CPUs and 16 gigabytes of RAM — that's about the ballpark. A desktop computer might be double that, so eight CPUs and 32 gigabytes of memory, if we're talking about typical performance desktops in universities. So the ballpark reasoning is: if it fit on my laptop, I know it fits into these resources; and if it didn't run on my laptop, then maybe I need more than this.
So you get a measuring stick — a range of values that the job fits into. And you might even know why it didn't run on your laptop: did you run out of memory, or did it just take ages? In the first case, you need more memory than your machine has. In the second case, you should check how many CPUs it actually used — did it use more than one? If it did, you can think: okay, maybe requesting more CPUs than my machine has will speed it up. To give another ballpark for how big the compute nodes in clusters typically are: they're about eight laptops, with from 32 up to 128 processors and from 128 gigabytes of RAM up to around 500 gigabytes. Those are typical compute nodes. So you could say that one server is about four desktops — not two, four — or eight laptops. Those are the ballpark numbers you can use to estimate. But if you want a better estimate, you can use your task manager. I'll show the task manager here on my Linux laptop. You can see there's some constant usage from Zoom, Firefox, and all the other programs I have open. Now I'll run a small sample program that we'll be using on the cluster later on. Thomas, do you want to explain what happens when I run this? Yeah — what you're seeing in the top panel is that initially one CPU, and then a different CPU, starts to run. What this shows is that there is one thing, one CPU, being used here. There are never any additional CPUs in use: this is a program that only makes use of one CPU.
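If you'd rather check these ballpark numbers programmatically than eyeball the task manager, a minimal sketch like this works with the Python standard library (the `SC_PHYS_PAGES` sysconf name is POSIX-specific and is assumed to be available, as it is on Linux):

```python
import os

# Logical CPU cores visible to the operating system.
n_cpus = os.cpu_count()

# Total physical RAM in gigabytes (POSIX sysconf; available on Linux).
page_size = os.sysconf("SC_PAGE_SIZE")
n_pages = os.sysconf("SC_PHYS_PAGES")
total_ram_gb = page_size * n_pages / 1024 ** 3

print(f"This machine: {n_cpus} CPUs, {total_ram_gb:.0f} GB of RAM")
```

Comparing that output against the laptop/desktop/node ballparks above tells you roughly where your machine sits.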
So even if you want to run many more trials of this, it doesn't make any sense to add additional CPUs, because it can't use them. Yes, this is quite common. The first ballpark estimate is: I will request from the cluster about what my laptop has — a similar amount of resources. The second estimate, if you want to improve on the first, is to check what happens on your machine while you're running the code. This is a very simple way of doing that: you don't need any profiling or anything like that. One other signal you can use: does your laptop get really hot, or do the fans start to whine constantly? If you cannot hold it in your lap anymore, that might suggest it's using all of the resources in the laptop. But if you can still open browser windows and so on while your code is running, it might be that it's just running something small in the background. Or it's using only one of the CPUs. Yes, because it can cool itself off. Using these kinds of estimates gives you a ballpark to start from — very simple tricks to tell yourself: okay, this is about the size of my program. Now that we've talked about RAM and CPU sizes, let's talk about how you should think of them when you go to the cluster. The cluster itself thinks in terms of slots. If a computer has some amount of memory and some number of CPUs, what the queue system thinks about is how big a block of memory and CPUs a certain job needs, and it thinks of jobs as these blocks.
Nowadays — for economic reasons, so it's not a law of nature or anything — the slot size is about one CPU per four gigabytes of RAM. That's roughly the size of a typical unit. Yeah, it's the natural unit of measurement in the cluster. So your laptop would be four slots, your desktop eight, and a compute server like the ones we'll be running the course on would be about 32 slots. You can use this to think about your job: it's like the metre of the cluster. Instead of how long something is, you think about how many slots it needs. If your job needs, for example, four CPUs but only uses one gigabyte of RAM, it still needs four slots, because it needs four CPUs. But if your job uses one CPU and 10 gigabytes of RAM, it needs three slots, because it needs more than eight and less than 12 gigabytes of memory. This matters when you're talking about parallelism within your jobs, because you are already requesting that amount of resources. Especially if you're using a lot of memory, it's a good idea to try to parallelize the code or reduce the memory requirements, so that you either reduce the number of slots you need or use all of the CPUs allocated to you. Of course, even if you request that amount of memory, the queue might be able to fit another job onto the unused CPUs, but it makes it harder for the queue to fit stuff into the cluster.
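The slot arithmetic described above can be sketched in a few lines — assuming the one-CPU-per-four-gigabytes slot size mentioned here, which is a site-dependent convention, not a law of nature:

```python
import math

# Assumed slot size: one CPU and four gigabytes of RAM per slot
# (a site-dependent convention, not a law of nature).
SLOT_MEM_GB = 4

def slots_needed(cpus, mem_gb):
    """A job occupies as many slots as its most demanding resource requires."""
    return max(cpus, math.ceil(mem_gb / SLOT_MEM_GB))

print(slots_needed(4, 1))   # 4 CPUs, 1 GB of RAM  -> 4 slots
print(slots_needed(1, 10))  # 1 CPU, 10 GB of RAM  -> 3 slots
```

Whichever resource is the most demanding — CPUs or memory — determines how many slots the job effectively takes away from the node.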
What I would say is: don't worry if you have something that only needs one CPU but quite a lot of memory. Request one CPU and quite a lot of memory instead of requesting 10 CPUs and 40 gigabytes of memory, because that leaves the other resources open. They might not be usable, but they might be — it depends on what other jobs are running on the cluster. So don't be afraid that something that doesn't fit neatly into these slots will mess with the cluster; the slot is just the natural unit for the cluster that you can think in. Yes, that's a very good addition. Now that we have talked about RAM and CPU, let's talk about execution time. As Thomas already mentioned, a good measuring stick is, again, your own computer. If it took some time to run on your computer, you might assume it takes the same amount of time on the cluster, but sometimes this isn't the case. Thomas, do you want to explain why? Essentially, most modern laptops or desktop machines have slightly faster CPUs than the cluster. That's partially because some of the cluster resources might be a bit older, and also because cluster hardware is chosen for energy efficiency — the fastest CPU you can get is commonly not the most energy-efficient. The advantage of the cluster is that it has more of them, so it can do more stuff at the same time. So you might find that something that ran in an hour on your machine takes an hour and five minutes, or even an hour and ten minutes, on the cluster. Yes, but the main benefit, of course, is that instead of burning your lap with your laptop in your lap, it's running on the cluster and you can keep working. You've transferred the work to the energy-efficient CPUs on the cluster side.
But what if you started a program on your computer and it didn't finish — it's been going for a day, it goes on and on, you don't know when it will end, and you just want to know? There are tricks for that too. You can estimate the runtime in multiple ways, and the easiest is to check whether your program is iterative — whether it does something more than once. For example, physics codes usually have a grid, and you integrate some physics equations in time steps: you do one time step at a time until you have done all of the time steps and reach some end time. That's iterative. Or you might have a Monte Carlo — sorry, a Markov chain — simulation, where you bounce from state to state. Essentially, every step calculates roughly the same equations, so each step in the chain takes roughly the same time. Or deep learning, where you train in epochs. For anything that does something more than once, you can run just a few iterations and extrapolate: multiply the time of one step by the total number of iterations. Most likely these programs take some time to start up, which might differ from the iteration length, but if you let it run for, say, 10 iterations and divide the runtime by 10, you get roughly the time of one iteration. Then you multiply by 10,000 or whatever the total number of iterations is, and you get an estimate: okay, I expect this to run this long.
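The time-a-few-iterations-and-multiply idea can be sketched like this; `one_iteration` is a hypothetical stand-in for whatever your program's repeated step is (a time step, a chain step, an epoch):

```python
import time

def one_iteration():
    # Stand-in for one step of your simulation, chain, or training epoch.
    sum(i * i for i in range(100_000))

def estimate_total_runtime(seconds_per_iteration, total_iterations):
    # Simple extrapolation: total time ~= time per step * number of steps.
    return seconds_per_iteration * total_iterations

# Time a small sample of iterations...
n_sample = 10
start = time.perf_counter()
for _ in range(n_sample):
    one_iteration()
per_iteration = (time.perf_counter() - start) / n_sample

# ...and extrapolate to the full run (here: 10,000 iterations).
estimate_s = estimate_total_runtime(per_iteration, 10_000)
print(f"~{per_iteration:.4f} s/iteration, estimated total ~{estimate_s / 3600:.2f} h")
```

If the startup time is significant, measure it separately (time to the first iteration) and add it on top of the extrapolated loop time.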
This of course also applies when you have multiple parameters or multiple data sets to go through. For example, if you need to cook 10 different kinds of pasta, once you've cooked one kind, you know it's probably 10 times what that one took. In some cases the analysis times might differ between data sets or parameters, but usually you can estimate it like this. Tim, there is a question in the notes asking how you would estimate the runtime for a process where you don't know how many iterations it will take — like a breadth-first search in a graph. Hmm, that's a very good question, interesting. For that search, you do have an upper limit, and that is going through the whole graph. If you take that as your maximum number of iterations, it's very unlikely that it would actually need that much, but it gives you a bound. I would also say that often you have some tolerance — a stopping condition for when you stop — but you should also have a hard-coded maximum: if you haven't found a solution by then, stop, because this is not going anywhere. In code, if you have a while loop, it's usually a good idea to have some way out in case you get stuck in it infinitely. If you have a bug, or an optimization problem that doesn't have a solution, or you can't find one in reasonable time, it's good to add a fail-safe to the program itself, so that you know the runtime won't grow beyond that.
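A minimal sketch of such a fail-safe loop, using Newton's iteration for the square root of 2 as a toy stand-in for a real iterative solver:

```python
def solve(tolerance=1e-6, max_iter=10_000):
    """Iterate until converged, but never more than max_iter steps (the fail-safe)."""
    x = 1.0
    for iteration in range(1, max_iter + 1):
        new_x = 0.5 * (x + 2.0 / x)  # toy stand-in: Newton's method for sqrt(2)
        if abs(new_x - x) < tolerance:
            return new_x, iteration   # converged normally
        x = new_x
    # The fail-safe: bail out instead of looping forever on a bad problem.
    raise RuntimeError(f"no convergence in {max_iter} iterations")

value, steps = solve()
print(f"converged to {value} in {steps} steps")
```

The `max_iter` cap is exactly the hard-coded maximum discussed above: combined with the time of one iteration, it gives you a worst-case runtime to request.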
Of course it's very hard to tell beforehand, but you can still measure what one iteration takes and expand on that. You can decide that the maximum number of iterations you're willing to wait for is 10,000, and if one iteration takes a second, your upper bound is about 10,000 seconds. I think we should go on, because we're heading towards the end of the time. Yeah, so the last thing to mention — and the question that was just asked relates to this as well — is scaling between problem sizes. Say you can run an easier problem on your laptop, you want to run a harder problem on the cluster, and you don't know how long it will take there. You can usually calculate the ratio of the easier problem to the bigger problem and estimate based on that. For example, with a matrix calculation: if you solved the problem with an n-by-n matrix, and the problem you want to solve is, say, a 2n-by-2n matrix, the problem is about four times bigger, so you might estimate that the runtime is about four times as long. Of course this is not exactly what will happen — algorithms don't necessarily scale linearly like this — but it's a better estimate than no estimate at all, because you can then refine it. If you have an assumption about what will happen, you can compare it against what does happen, and you get more familiar with your program; you know it a bit better because you suddenly have more information.
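That ratio reasoning can be written down as a tiny helper; the `exponent` parameter is a hypothetical knob, added here to model algorithms that scale worse than linearly in problem size:

```python
def scaled_runtime(small_runtime_s, small_size, big_size, exponent=1.0):
    """Extrapolate runtime from a small problem to a big one.

    exponent=1.0 is the plain linear guess from the ratio of problem sizes;
    a larger exponent models algorithms that scale worse than linearly.
    """
    return small_runtime_s * (big_size / small_size) ** exponent

# A 1000x1000 matrix problem took 30 s; a 2000x2000 one has 4x the elements.
print(scaled_runtime(30, 1000 ** 2, 2000 ** 2))    # linear guess: 120 s
print(scaled_runtime(30, 1000, 2000, exponent=3))  # cubic-in-n guess: 240 s
```

Run the small case, plug in its measured runtime, and you get a first guess for the big case — then refine the exponent once you've seen a couple of real data points.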
And it's very important — like in this graph example — to spot when you have a problem like the travelling salesman problem, which scales factorially or exponentially: you have permutations, and suddenly you have millions and millions of combinations to go through, and you cannot solve it anymore; it's just not computable. In those cases there are ways of dealing with it, but it's good to find out whether your problem is such a problem. I want to mention one last thing. Of the resources you request on the cluster, the hardest barrier is the runtime: if you go over your runtime, the job will be killed, and it just ends. You can't really go over the number of assigned CPUs — if the program could use more CPUs it would probably run quicker, but as long as it has CPUs, it keeps running and doing something. And for memory, the cluster is relatively flexible. As long as there is free memory on the node you are running on, it will give you more memory, even over your request, until it reaches the point where there is no memory left. Then it will look at which job is violating its constraints the most, and that job will be killed. And I'll mention that we are here talking about the general first estimates you can make; later we'll talk about how, after that first estimation run, you monitor and adapt the estimate, and also about how you actually request these resources. This is mainly to motivate that you should have a constant feedback loop: based on what your jobs did, what are you going to do in the future?
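To see how quickly permutation-based problems blow up, here's a back-of-the-envelope sketch; the one-million-tours-per-second rate is an assumption for illustration, not a measurement:

```python
import math

TOURS_PER_SECOND = 1_000_000  # assumed evaluation rate, not a measurement

def brute_force_tours(n_cities):
    # A symmetric TSP on n cities has (n - 1)! / 2 distinct tours to check.
    return math.factorial(n_cities - 1) // 2

for n in (8, 12, 16, 20):
    tours = brute_force_tours(n)
    print(f"{n:2d} cities: {tours:>22,d} tours, ~{tours / TOURS_PER_SECOND:.3g} s")
```

Already at 20 cities the brute-force time runs to thousands of years at this rate — exactly the "just not computable" situation described above, where a better algorithm is the only way out.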
Because that will help you, and it will help everybody else as well, because the queue will be more compact. But I think we're essentially finished here; you can read the whole text again if you're interested. Should we switch to HackMD to let people see the— You mean to the notes? The notes, yes. Okay, I'll show it here. As you can see, this time there are lots of different questions: people asking what happens if you underestimate the resources, what happens if your job is killed. For the most part you can read this yourself — I think everything is answered pretty well there. The last question there will be discussed in a bit more detail. Which one, the one about data? Yeah — tomorrow we'll talk about data storage and how that affects things. As these questions show, there are a lot of different things that all come together here; there are so many moving parts in doing HPC kind of work. Also, one thing I wanted to mention, as Simo and Thomas were just discussing: this is largely an empirical question where you try things out. First you try on your own computer or laptop, then you go to the cluster, try things there, and check: I tried this — oh, it crashed; maybe if I add a little bit more memory — oh, now it didn't crash. So you don't have to be too scared about trying things out, because that's how it works. What I would say is: if you have jobs that are relatively short, say an hour or less, I would be a bit more conservative, whereas if I have a job that I know takes four days, I would probably just add a bit more memory, for example, to be on the safe side.
Yeah, okay, I'm requesting a bit too much and I know it, and I will be billed for it, but I'd rather have the thing done after four days than have it crash after three days — killed because it went over memory. Okay, but I think we are giving the floor to Damon and Richard.