But now for something completely different — let me check that my full-screen slideshow works; I think it looks fine on Twitch. So now we have an Italian and a German trying to cook pasta together. It's not the beginning of a joke. You can guess that I'm the Italian and Thomas is the German. As has come up many times, Germans apparently break the spaghetti, which for Italians is a big faux pas — but it does let you fit them into a smaller pot, so okay, just this once, we allow the exception. So with this somewhat funny motivation, we'll try to introduce the concept of parallel computing using the kitchen. My task is to cook pasta for four people. The ingredients, if you want to make pasta for four people: 500 grams of spaghetti; hopefully you have a mother from Italy like I do, so you have some amazing ready-made tomato sauce that she made; and four liters of water — really it should be five liters, but we care about the environment. And then, of course, we need some tools. We need the stove, like the one we see here — a stove with multiple burners; you need at least one burner for heating the water. Then you need the pot to hold the water and the spaghetti, and a spoon to stir the spaghetti. The algorithm that I see for doing this: boil the water for five minutes, add the salt quickly, then put the spaghetti in the water, cook for eight minutes, and stir every now and then. So, for those who missed the joke — the metaphor, sorry, I'm not joking, the metaphor — what we're talking about here is that the chef, the cook, the person, is the process, the computing process: it manages all the various resources. And the pot, in computing, is called the thread. In this case I have one pot, so I can only run one thread, one computation at a time — linearly, serially.
Then there's the kitchen, which is the node — the computer, like a laptop, for example, where all these components sit together, where there are one or more CPUs (laptops often have four CPU cores these days, if not more). Then there's the burner, which is a single CPU core; the water, which is the memory; and the pasta, in this case, is the data that we need to process. All right, so now the key thing is the time: making the pasta takes a fixed time, and specifically for this type of pasta — every pasta takes a different amount of time, you should always check it on the package — this type of spaghetti takes eight minutes to cook properly. So now the question is: if the stove in my kitchen has four burners, will I make the pasta in that time divided by four? Can't I just parallelize the pasta? These yellow boxes that you see at the bottom are the computing equivalents: say you go to a cluster that has four CPUs on a node, or you're already using a laptop that has four CPUs — can I just move my code there and suddenly run four times faster? Well, the answer is no: multiple burners do not get used automatically, not without parallelization. Even if we try — let's imagine the task is that we need to cook four times the amount of pasta. Then I would need four pots, four boxes of spaghetti, four jars of sauce; those are things we can buy, and my kitchen already has four burners ready to be used. But me, myself, the process, I don't know how to take care of more than one pot. I never learned how to handle two pots at the same time. So I'm limited: all I can do is cook one pot of pasta, wait until it's done, and repeat. So to cook pasta for four times four, sixteen people, it will take me four times the total cooking time.
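The serial situation above can be sketched in a few lines of Python; the `cook_pot` function is a made-up stand-in for any fixed-length task:

```python
# One cook, one pot at a time: purely serial execution.
# cook_pot is a hypothetical stand-in for any eight-minute task.
def cook_pot(pot_id, minutes=8):
    return f"pot {pot_id} done after {minutes} min"

# The four pots are handled one after the other, so the total
# wall-clock time is 4 * 8 = 32 minutes, not 8.
results = [cook_pot(i) for i in range(4)]
total_minutes = 4 * 8
print(results, total_minutes)
```

Four burners are physically available, but this loop only ever uses one of them.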
Unless your code specifically knows how to use multiple CPUs, it will not get done faster. And it's actually not so easy to learn how to use multiple pots at once — at least on the computing side, because your code commonly does not do multithreading out of the box. Unless you're doing a genuinely different task — say, cooking some sauce that you can let rest for a bit while you take care of the pasta — you're stuck; I don't know if I would actually be able to cook four pots of pasta at once. So let's assume that I go to cooking school and learn this new technique, which is OpenMP — in this case, "open multi-pot". In reality, OpenMP is the API, the system, that is used to have multiple threads running. So now I can cook four pots of pasta at once, because I've learned this — which means I have rewritten my code so that it can actually see the four CPUs and run four computations independently. We just have to be a little more careful that our threads don't get in each other's way: if I'm stirring one pot, I should be careful not to forget the other one. So now I can actually cook pasta for 16 people in the same time, because it's all done in parallel. In eight minutes, the pasta for 16 people will be ready, thanks to the parallelization. This is a fairly trivial, simple parallelization: the four pots of pasta don't really need to talk with each other. As long as I, the process, remember that one pot started maybe a few seconds later than another, it's quite a simple workload to scale. But now things get a little more complicated. Maybe I'm not really able to manage the four pots in parallel myself. Instead, I could hire three more Italians to cook pasta with me.
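OpenMP itself is an API for C, C++, and Fortran, but the "multi-pot" idea can be sketched with threads from the Python standard library (again, `cook_pot` is a made-up placeholder for the real work):

```python
from concurrent.futures import ThreadPoolExecutor

def cook_pot(pot_id):
    # Stand-in for the real eight-minute computation on one pot of data.
    return f"pot {pot_id} done"

# Four workers, four pots: they all "cook" at the same time, so the
# wall-clock time stays at ~8 minutes instead of 4 * 8.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(cook_pot, range(4)))
print(results)
```

Note that the code had to be rewritten — the pool, the worker function — before any of the extra burners were used; moving the serial loop onto a four-core machine would not have helped by itself.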
So then I need to talk with these three other Italians. They're sharing the same kitchen, of course, so they can talk in the kitchen — you can imagine the noise of four Italians cooking. Thomas: is this a good place for MPI? What was MPI again — the Message Passing Interface? Enrico: yes, the Message Passing Interface. Because now, suddenly, we need to talk with other processes. And what is interesting here is that if we really want to scale things out, they don't even need to be in the same kitchen. They can actually be in the kitchen upstairs, in some other kitchen in the same building. We can just pass messages — "okay, now it's time for the pasta" — and everyone throws the pasta in. So by adding more processes, more people, we can once again get more things done at once. But things in general don't scale linearly. It's not that by hiring 1000 chefs I will be able to produce, in a synchronized way, 1000 times 500 grams of pasta, because the overhead of the messaging needed to synchronize all 1000 chefs will eat into the time — some of the cooking time goes into coordination between the different parts. Think of Enrico calling all the different chefs and having a short chat with each. Maybe it's easy if one person broadcasts a message to many others, but if it has to be interactive and someone says "hey, my water is not boiling, please everyone wait" — that is the overhead of message passing. But now let's assume that I go to a computing cluster, meaning I live in an apartment building that has 25 apartments. Can I scale this pasta 25 times, because there are 25 apartments and each apartment has a stove where I could cook my pasta? Will I again be able to scale just like that?
Well, no. Yes, the cluster has, for example, 300 nodes, and every node has 40 CPUs, but I can't just throw my code at the cluster and hope that suddenly everything uses the 300 nodes and all those CPUs. It's not going to happen unless I specifically do things to make it happen. So, for example, it could be that I know about the remote nodes — not just my kitchen but someone else's kitchen — and I could start offloading some of the pasta. I would call my friend upstairs: "can I use your four burners?", and then the next flat, and so on. That way, yes, I could run pots of pasta in parallel. But sometimes the kitchen upstairs might be busy, so if I'm asking for four burners from a friend upstairs, it won't always work — most of the time someone is already cooking pasta there. So then I go to the house manager, which is SLURM — the workload manager of the cluster; you'll learn a lot about SLURM in the coming days. SLURM knows everything about the kitchens. So I can ask SLURM: "is there any free kitchen with four burners that I can use immediately?" And I might need to wait — SLURM knows that the kitchen upstairs on the last floor will be done soon: "if you wait 10 minutes, you can then have your four burners." On a cluster there are so many computers that it would be impossible for you to keep track of them all yourself, to know which one is available and which one is not. So enter the workload manager, SLURM, which does this for us: you request the amount of resources you need — in this case one node, one kitchen, with four CPUs, four burners — you may have to wait a little if nothing is available, and once you get it, you can start your process. Then there are also special jobs, special tools, that a burner is simply not made for.
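Such a request to the "house manager" is usually written as a SLURM batch script; a minimal sketch (the job name, time limit, and script name are placeholder examples, and real clusters typically also require a partition or account line):

```
#!/bin/bash
# Ask SLURM for one kitchen (node) with four burners (CPUs).
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --time=00:10:00
#SBATCH --job-name=cook-pasta

# Run the (hypothetical) pasta-cooking script on the granted resources.
srun python cook_pasta.py
```

You submit this with `sbatch`, SLURM queues it until a suitable kitchen is free, and then starts it for you.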
So now, let's consider the case of the cheese grater, which is a great example of hardware whose very architecture is parallel. A cheese grater can grate many things at once — but it doesn't work for just any slicing problem. Of course, the metaphor here is that the cheese grater is the GPU, the graphics processing unit, which is somewhat similar to the CPU. The difference is that its architecture — the actual hardware, the actual wiring inside — is built for parallelization, which means that for the right problems it can work much faster than ten CPUs, or one CPU, could in parallel. But of course, the code, the program, has to be written specifically for GPUs. So for those tasks — say I want to grate a lot of cheese — you might want to use the GPU, because if I don't have a grater, I will need to break the cheese by hand or spend my life slicing with a CPU knife, and it would take me ages to grate the whole block. With the cheese grater, the GPU grater, suddenly I can run the whole block along it and get all the cheese grated. So a GPU can do very many things at once, but only a very specialized kind of job: you need specific programming skills and a suitable problem to use it. You probably also need to adapt your cheese a bit to the grater — if you have one of those cheese-grating mills, you will need to cut the cheese small enough that it actually fits in there, which is essentially adapting your code and data to work on the GPU. It's actually an excellent metaphor, because blocks of Parmigiano generally come as huge wheels — really huge wheels, weighing many, many kilos. There's no way you can fit a wheel into a small grater. You need to break the Parmigiano into smaller pieces so that you can actually hold it and grate it. And the same goes for the GPU, with the kinds of data sizes it can take in at one time.
All right, so that was the fun part. I'm sure this metaphor, this joke, can be improved over the years. But in general, there are some five take-home messages from this introduction to parallelization. First: parallelization can speed up your tasks, but never beyond the slowest serial part. If the slowest serial part always takes eight minutes, you're not going to go faster than eight minutes; there's no way to make the pasta cook faster. For example, in our case with the pasta: if you load data, what's commonly restricting you is actually loading the data from the hard drives, and that can't be sped up. Thomas: you can potentially boil the water faster — that's more CPU power — but the pasta cooking, those eight minutes, in this example that's the data loading. Your hard disk is not faster; it can't deliver the data faster. Enrico: exactly. So it's important to think about your workflow. If you feel that your processing is slow, you need to ask: what is the slowest serial part of the reading and processing? Because if that determines the length of the whole process, maybe there's nothing you can do — the pasta will always take eight minutes. But maybe there are other things that can be run in parallel, things that don't depend on that one step. That's how you start understanding where you can apply parallelization — and where you can just stick with serial execution of the code. The second message is that the benefit from parallelization comes only after modifying your code. Very often — most of the time — you cannot just take your code and move it to a stove, to a kitchen, that has multiple burners, multiple nodes. You need to modify the code to use multiple threads, multiple processes — and you also need to test that it's actually using them.
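This first take-home message is essentially Amdahl's law: the serial fraction of a job caps the achievable speedup. A quick back-of-the-envelope calculation (the 20% serial fraction is a made-up illustration value):

```python
def amdahl_speedup(serial_fraction, n_workers):
    # Amdahl's law: total speedup with n_workers when serial_fraction
    # of the job cannot be parallelized.
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_workers)

# Say 20% of the job is unavoidably serial (the "eight minutes of cooking").
for n in (1, 4, 16, 1000):
    print(n, round(amdahl_speedup(0.2, n), 2))
# Even with 1000 workers the speedup only approaches 1/0.2 = 5x, never more.
```

With four workers you get 2.5x, with sixteen 4x, and with a thousand still just under 5x — the serial eight minutes always win in the end.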
And this connects messages two and three: there are ways, for example on the HPC cluster, to check what the performance of my code was — was it really using the four CPUs that I requested, or the 40 CPUs that I requested, or was it in practice only using one? And then point number three is that sometimes we have special hardware that is built for parallelization — GPUs — but once again, it requires extra work from us. Not all libraries, not all systems, automatically translate your code so that it will run on the GPU. And sometimes it's not enough to just change the processing step; sometimes you have to change the whole data loading. As I wrote there, sometimes we actually need to break the spaghetti so that it fits into the smaller memory: if your GPU only has a limited amount of memory, then yeah, you need to divide your spaghetti into smaller pieces — that's how it is. So there is also a compromise to consider: if I need to process this data only once, do I really want to spend all my time rearranging the data loading and rewriting everything for the GPU? Or should I just wait — instead of getting it done in one day, maybe I wait one week, and that's it. This kind of thinking about the resources you need has to come from you; it is specific to your case. And then the fourth point is that requesting resources from a big cluster, through SLURM, will inevitably require you to queue. If you are happy to request just one stove from one kitchen, very often there is a CPU available and you can start right away; you barely notice the queue. But sometimes you really have a special cooking setup that requires 40 burners in the same kitchen — 40 CPUs in the same node — for whatever reason, because of the software system you're using.
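"Breaking the spaghetti" to fit into limited device memory looks, in spirit, like processing data in chunks. A plain-Python sketch (the chunk size and the doubling function are arbitrary examples; in real GPU code each chunk would be copied to the device, processed, and copied back):

```python
def process_in_chunks(data, chunk_size, fn):
    # Break the data ("spaghetti") into pieces small enough to fit
    # into the limited memory of the accelerator, process each piece,
    # and stitch the results back together.
    out = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]  # fits in device memory
        out.extend(fn(chunk))                   # e.g. send to the GPU here
    return out

data = list(range(10))
result = process_in_chunks(data, chunk_size=4, fn=lambda c: [x * 2 for x in c])
print(result)  # same answer as processing everything at once
```

The answer is identical to processing the whole array at once; only the data movement had to be restructured — which is exactly the extra work the third take-home message warns about.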
And then, yes, those resources, those nodes, are available, but you might need to queue a little bit longer. And once again: is it better to queue for two days and get something done in 10 minutes, or is it better to queue for one minute and get something done in hours? Again, there is no single answer to this question; it requires time and understanding of what the time and the cost are in your case. And finally — the fifth message — always remember that we might not even need parallelization. Not just because we need to do something that is inherently serial, but also because sometimes the effort, as I was just saying, of rewriting everything to be parallel just doesn't pay off. Still, it's beneficial to move the work to a remote HPC cluster. Because, as Thomas was saying earlier, why do I need to keep my laptop running for 24 hours for a 24-hour processing job, when there's already a machine in the data center that is already switched on, with CPUs already running, just waiting for something to do? It's much more efficient for you and for the environment: the machine is already on, already running, you can just send your script there, and most likely the data is also already there, so you don't need to shuttle the data around and forget which version of the data is the most recent one. Let the remote machine do the computing, and then you can get the results back to your local machine, make the figures, write the paper. And you can even keep working on your own computer without it being blocked by "oh, all the memory is used by my computation job" or the like — your laptop stays responsive while the remote machine works at full power. So I think we're done with this somewhat funny introduction to parallelization, and we're exactly on time. Let me check if there's anything in the questions; let me exit presentation mode. Yeah, there was a question.
There were a couple of extra slides that I didn't include, but you can look at them. Some people might think: "well, you know what, I'm going to buy a mega-pot." With the mega-pot I can make pasta for 100 people in just a single pot. Yes, it's true, you can do that — but how often do you actually have to make pasta for 100 people? This very expensive hardware is really useful only if, rather than buying it just for yourself, you're sharing it with others, because only occasionally do you need to make pasta for 100 people. And in general, I think this metaphor can be taken further. Maybe you, the people listening and learning these things right now, can think of other metaphors by the end of the three days that help you understand. Yeah — there's a question: how do you figure out what form of parallelism a code uses? Thomas: I'm not sure if the question is about analyzing code. If it is, well, you essentially have to look at it. If it's using some libraries, it's very likely that it is some kind of multithreading within a certain framework, inside some lower-level library. If you wrote it yourself, or the top level of your code does it, then it's very likely that you can see directly from the code itself what kind it's using. But if the question is more about what you should use, then that depends on the problem; there is no single answer. Enrico: I would also add that most of us use already existing programs — who wants to rewrite something that is available to everybody for free and made by somebody who's usually better at coding than you are? Personally, I reuse everything: if somebody has already created a tool that does what I need it to do, I will happily use it, because then I don't have to waste my time writing it myself and most likely doing it worse than the original author.
So I usually look for certain key buzzwords, like the ones already mentioned: multithreading, multiprocessing, MPI. If you see mentions of a number of jobs, number of threads, or number of workers, it usually means multithreading or multiprocessing; and if you see MPI mentioned somewhere, it means MPI. I usually look for those buzzwords. If I see GPU or CUDA mentioned, it usually means that it is, or can utilize, GPUs. So just checking for those can help you find out whether your program has the possibility of utilizing parallelism. And many machine-learning frameworks, for example, often have some sort of parallelism that you can use just by adding one parameter somewhere that makes them use multiple processes instead of one. And about which framework is best: as Thomas said, you really cannot tell — there are so many different problems and so many different ways of solving them that you need to pick the one that fits the situation. Enrico: I didn't use the term "embarrassingly parallel" yet — it will come up in the coming days — but in general, something that is embarrassingly parallel is when you need to run the same process while just changing one parameter. Say you want to sweep over the parameters: you want to test the script with parameter one, with two, with three. That's where you should start: if you know that you have a problem that can be split into this embarrassingly parallel workflow, that's the first thing to think about. And also, in the analogy: if you can add more kitchens with more Italians — it can feel stupid to hire more people to cook pasta instead of teaching one person to use multiple pasta pots, but if it produces the same results, it's fine. So adding more cooks works, as long as every cook does its own independent job.
If it gets the same result, it's fine — it doesn't need to be pretty if it works. Thomas: well, I think the parallelism works out if you can hire cooks on a temporary basis; then it works quite well. If you would actually have to hire them long-term, or buy new equipment, then it gets difficult. But since we can hire people for certain tasks, yeah, it should be fine. Okay, should we go to our break? We can keep answering questions in the HackMD, even during the break. Sounds like a plan. Okay — so keep asking and answering, and we will see you back soon.