Parallelism: understanding the different methods. So maybe the first question: what does parallel mean in general? You've been saying this word a lot. Parallel in this case means that we're doing more of the same thing at the same time. Yes, I guess that's the whole point here. You don't want to run your work with just the power of your one computer; you want to run it with the power of hundreds of computers. And there are these different paradigms here. Well, first, if we look at our roadmap picture here, in this part of the course we're looking at using multiple of these CPU nodes together. Yes, so we'll talk about the different ways of working so that you can get your code going faster. At this point it's good to reiterate what doing something faster or more efficiently actually means. It can be faster in the sense that you get all of your work done sooner, so you get a bigger throughput. We are often talking about throughput: it doesn't mean that individual things finish faster, but it might mean that all of your things together finish faster. And then there are the situations where your piece of code itself runs faster, so instead of ten minutes it takes five minutes. That is faster in the wall-time sense. But there's also this idea that you get all of your work done faster. We'll be talking about this in detail when it comes to these paradigms. But maybe you should explain all of these different ways of working. So before you explain them, what choices do people have? Do people choose which one to use for a code, or is it sort of already part of the code, if you know what code you need? So that's an excellent point. You have some choices.
So for example, you can often choose what code you use. Some of these ways of working, for example this embarrassingly parallel way of working, also known as data parallel, can be used with whatever code you have. But in many cases you are limited to what your program can actually do. Like I mentioned with the word salad, different programs work with different parallelization paradigms, and if your code doesn't use one of those, you cannot use that way of parallelization. So you need to pick up on these patterns: okay, my code talks about this kind of stuff, so that means it uses this parallelization scheme or idea. And this is what we are going to be talking about. So for example, your supervisor says you need to use this code to run stuff, and you look at the code and it's MPI code, and then, well, you're using MPI. And if you're writing your own code and deciding which to use, that's another level that we're not really discussing today; there's plenty of other courses for that. Yeah, if you're writing your own code, of course, based on the problem that you're trying to solve, you can choose which method might work best, and maybe we'll go into that at some point. But that's a whole can of worms that we probably don't want to open. So what are the main paradigms? First is embarrassingly parallel, which you'd already mentioned a little bit. So yeah, well, let's look at the picture here. Yeah, the picture is the same one. If you scroll a bit down, you can see both of the pictures. So here are two diagrams: the first diagram basically shows how the code would be situated in the cluster, and the second shows more of an overview of what's happening in one Slurm submission.
So array jobs, or these embarrassingly parallel jobs, basically mean that you're doing many things at once, but with different inputs. Like Richard said, Richard had to process multiple videos. So he has one piece of code that processes a video, and he has multiple video files that he needs to process. What he can do is process each of these video files in an individual job. And all of these jobs can run at the same time in the cluster, completely independent of each other, and then they finish independently as well. So they're completely independent and they run in their own worlds, but they reuse most of the code. The only difference between these jobs, when I was running stuff, would be that each one gets a different input video file. Okay, I think that explains it well. So yeah, we'll talk about array jobs in the next section, but this can be used with any kind of code. Basically you run multiple copies of the same thing, and there's this structure called an array in Slurm that allows you to easily parallelize whatever program you have, so that you run multiple copies of the same program with multiple different parameters. Okay, and that's the next lesson, if I remember right. So what's next? Shared memory. Yes, the next one is shared memory parallelism. In this situation we have one computer, one CPU node, where we use multiple processors at the same time. And shared memory parallelization means that the different processors communicate with each other, and they discuss using the shared memory of the node. So your laptop, for example: it's possible to run shared memory parallelization on your laptop, because your laptop has memory inside of it, this RAM that we were talking about yesterday, and it has multiple CPUs.
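To make the array job idea concrete, here is a minimal sketch of what the per-task script could look like. The file names and the `process` function are hypothetical stand-ins for the real work (say, Richard's video processing); the one real mechanism is that Slurm sets the `SLURM_ARRAY_TASK_ID` environment variable to a different value in each array task, so every identical copy of the script picks a different input.

```python
import os

# Hypothetical list of input files; in Richard's case these would be videos.
INPUT_FILES = ["video_00.mp4", "video_01.mp4", "video_02.mp4"]

def process(filename):
    """Stand-in for the real per-file work (e.g. video processing)."""
    return f"processed {filename}"

def main():
    # Slurm sets SLURM_ARRAY_TASK_ID to a different value (0, 1, 2, ...)
    # in each array task, so each independent copy of this same script
    # handles a different input file.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    result = process(INPUT_FILES[task_id])
    print(result)
    return result

if __name__ == "__main__":
    main()
```

You would then submit this with something like `sbatch --array=0-2 job.sh`, where `job.sh` is a batch script that runs this Python file; the exact submission details are covered in the next section.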
So if you're running, for example, NumPy code or R or Matlab or whatever program on your computer, or even now when I'm running Zoom, Zoom has multiple threads running; it's doing multiprocessing all the time. Or let's say Firefox: if you start the browser, it starts multiple different processes at the same time, and all of them communicate through the memory of the machine. In these situations your job needs to reserve multiple processors for itself, and all of these processors then communicate through the memory. Most modern programs utilize this, and it's quite a simple way of parallelization. For example, last night the video encoding itself, not the subtitling, used multiple processors. I was using my desktop, which has eight processors, and I ran one job that distributed the work across all eight of them using shared memory. And that speeds it up fairly well. Yeah. So basically, if we return to the cooking analogy from earlier (sorry for the hungry people out there), let's say you want to cook a full meal on a stove. At one burner you have a pot full of sauce that you're making, your bolognese sauce or something like that, and in another pot you're boiling the pasta. These are not independent, because both are important for the meal; if you do only one thing, you only have the pasta. They have to communicate somewhat. Yeah, they have to communicate somewhat, but they might do different things, and then you combine the things together. So this is kind of the idea of what's happening here. So let's probably move forward. Okay. Move on to MPI parallelism. And what's the example here? Yeah, if you scroll a bit, yeah. So in MPI parallelism, you might have heard of this MPI. It stands for message passing interface.
And it's a technology that has been agreed upon by people, scientists, so that they can run these large-scale computations. If you think about supercomputers running some weather model or something like that, they're usually running MPI codes that enable communication between multiple processors at the same time, even across the boundaries of individual computers. So in the first diagram, we can see that the reservation here is spread across two different nodes, but these are not independent; instead, MPI programs usually work as a kind of collective. They communicate with each other, and they are all running the same program, but each one runs its own part of that program, and then they communicate with, usually, neighboring tasks or other tasks, so that they can have a discussion: okay, where are we? So let's say a weather model might be split up into cubes or something, and each processor, or each of these MPI tasks, will run its own cube and compute what's inside there. Yeah, to model the weather for the whole planet: the weather in Finland doesn't immediately affect the weather in Australia, so they divide up the world into, let's say, squares of several tens of kilometers, and then each square only has to communicate with the squares adjacent to it, and it can scale up very large that way. Yeah, maybe I would say that it actually does affect it eventually; with chaotic processes and that sort of thing, to make it so that, let's say, the weather in Cuba affects the weather in Finland, you need to have all of them running in the same model, all of the pieces in the same simulation space, and that means all of the workers need to run the same model at the same time.
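A real MPI program would use the MPI library itself (in C, Fortran, or via mpi4py in Python), with each task running as a separate process, possibly on a different node. The pure-Python toy below only mimics the decomposition idea just described: each "rank" owns one strip of the domain, and before each compute step it gets the edge values of its neighboring strips (a so-called halo exchange), which in real MPI would arrive as messages between processes. The averaging step is just a hypothetical stand-in for one time step of something like a weather model.

```python
def halo_exchange_step(strips):
    """One communication + compute step over a 1-D decomposed domain.

    Each 'rank' owns one strip (a list of cell values). Before computing,
    it obtains the edge cell of each neighbouring rank (the 'halo'), then
    averages every cell with its neighbours -- a stand-in for one time
    step of a simulation split across MPI tasks.
    """
    new_strips = []
    for r, strip in enumerate(strips):
        # "Receive" halo cells from neighbouring ranks. In real MPI these
        # would be sent and received between separate processes, possibly
        # running on different nodes of the cluster.
        left = strips[r - 1][-1] if r > 0 else strip[0]
        right = strips[r + 1][0] if r < len(strips) - 1 else strip[-1]
        padded = [left] + strip + [right]
        new_strips.append([
            (padded[i - 1] + padded[i] + padded[i + 1]) / 3
            for i in range(1, len(padded) - 1)
        ])
    return new_strips
```

The key point the sketch shows is locality: each rank only ever needs a few edge values from its neighbors, not the whole domain, which is why this pattern scales to thousands of nodes.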
And this is the kind of paradigm where, if your code doesn't use MPI, it doesn't use MPI, right? You write programs from the ground up using this paradigm; you don't just decide to use MPI for the kicks of it if you already have a program. You usually build the program with MPI as the base layer that glues the whole thing together. Yeah. And I guess given the power of processors these days, MPI isn't as often used unless you want to scale to truly massive sizes. Like, when was the last time a big MPI code was written? They are written all the time, they are written all the time. Yeah. But nowadays they're talking about exascale and that sort of thing, and it brings different kinds of problems. We'll hear about the LUMI supercomputer today from the CSC people, for example, and at those scales you really need to think about how the communication happens, because you might have thousands of computers working together. But let's move forward. Yeah. Moving on. Next is GPU. And actually someone asked the question: how are LLMs trained? And that's this. So what's the GPU thing? Yeah, I'll first answer the GPU part and then I'll return to the LLMs. So GPUs are graphics processing units. The parallel execution here is a bit different from the parallelism we previously mentioned. Previously, parallel execution was done by processors, CPUs or central processing units. But in a GPU you have this one card which has a massive number of small calculators inside of it, called GPU cores or CUDA cores or compute units or something like that. And these can each execute their own piece of the program independently. They usually all work with different data or something like that.
Sometimes they share data, but they usually run their own program inside the GPU, written specifically for the GPU. And the CPU part in these programs usually sets up the whole thing. It reads input and output and that sort of thing. It creates the data that the GPU needs and moves that data onto the GPU memory; the GPU has its own memory. And then these thousands of... in the past we used the cheese grater analogy: basically there are thousands of little holes in the GPU, and with a cheese grater each hole individually creates its own strand of cheese. That's basically what happens: you have a lot of these. And when we're talking about large language models, for example, these are so big that they don't fit into one GPU. So in those cases you usually have MPI plus GPUs, so you have some sort of communication, or there are other frameworks like NCCL. Is that the hybrid one down here? No. But basically you can mix and match these different paradigms. But you really must be certain that your program uses these. Quite often, for example, when you're doing stuff with GPUs (we'll talk about this later in the GPU part), you usually also want to have multiple CPUs, because the GPUs are so fast that you really need to keep them occupied. So you usually need multiple CPUs doing work just to keep one GPU busy. So you can mix and match these different paradigms, but these are the main parallelization schemes, and you should check which one your program uses before you start. And also, yeah, maybe we should quickly check on: does my code parallelize? Does the code parallelize? Yeah, I like the figures you made here. Yeah, so yes.
So if you have a program and you want to parallelize it, you need to think about how much benefit you will get from the parallelization. Each program has some part in it that is serial; this is a computing fact, part of the whole computing thing. All programs have some part that cannot be done in parallel. And you should think about how much benefit you get from parallelization if you do it. For example, here in the first diagram we have a program where only a really small part of the code is done in parallel, and most of it is done in serial. So even if you do the parallel part with two CPUs, you don't save a lot of time. But if you have a program with a large parallel part, you might save a lot of time. So throughout the whole thing, we'll talk about this, but you should keep it in the back of your mind as well. But okay, this has been a massive amount of words. Do we do an exercise where we talk about this? Yes, okay. So this has been a massive amount of words, and don't worry: you don't have to understand all of the different paradigms yet. We'll go through them in order with actual examples that you can try out, and how you ask Slurm to give you these resources. And I'd say one of the most annoying things is when you get a new piece of software and it's like, okay, I want to run this on the cluster, and it says it uses multiple CPUs, but you read the documentation and it doesn't even tell you how it does the parallelization; it just assumes it has a computer and runs magically. Yeah. But I guess that's not our problem, or something to talk about later. And again, come and talk to us if you have questions. Don't try to do it all yourself.
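The serial-part effect described above is usually called Amdahl's law: if a fraction p of the runtime can be parallelized, the best possible speedup on n processors is 1 / ((1 - p) + p / n). A quick sketch makes the point from the diagrams concrete:

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Best-case speedup when only part of a program runs in parallel."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_processors)

# A mostly-serial program barely benefits from a second CPU
# (10% parallel, 2 CPUs gives about 1.05x) ...
print(amdahl_speedup(0.10, 2))
# ... while a mostly-parallel one benefits much more
# (90% parallel, 2 CPUs gives about 1.82x).
print(amdahl_speedup(0.90, 2))
# But even with a huge number of CPUs, the 10% serial part
# caps the speedup below 10x, no matter what:
print(amdahl_speedup(0.90, 1000))
```

This is why it's worth estimating the parallel fraction of your program before reserving lots of resources for it.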