this page. All right. If there's nothing else to mention from the HackMD, I can now switch into this wider mode, Richard, for the next part. And Thomas will join me. Hello. Can I resize my... Yes, go ahead. Yes, I'll drop away. Have a nice one. Thanks. All right. Let's see how it looks in the stream. Yes, looks good. So this is again something light; we're still getting you inspired. What's happening right now is that an Italian and a German are cooking pasta together, which sounds like the beginning of a very funny joke. And you can already guess that there are differences between how Italians cook pasta, which is the right way, and how Germans might want to cook pasta. But the discussion here is not about cooking pasta. The idea is that there's a recipe, and all of this is maybe something you are very familiar with. For the usual pasta for four people, you need half a kilo of pasta. Of course, if you have access to the amazing sauce that my mom makes, that would be really good, but if you don't, the sauce from any shop is fine. You need some water and you need some salt. You need the tools: specifically a pot, a kitchen where your pasta will physically be cooked, and a stove with at least one burner so that you can actually make this pasta. And then the algorithm for the pasta. This is important, because so many people put the pasta in the cold water and then boil it. That is wrong. You don't do it like that. First you boil the water, then you add salt, then you add the pasta, and then you stir every now and then. So the actual pasta part, the actual computing of the pasta, is basically eight minutes. It's always a bit unfunny to explain the joke, but Thomas, do you want to explain it? Yeah, essentially it's not really a joke.
But yeah, in this metaphor you have your cook, the person, which is essentially the processor, the process that does the computation. You have your pot, which you can think of as one thread on your processor: in one pot you can boil a certain amount of pasta. You have your kitchen, which is your computer, which can have multiple pots, multiple plates, and so on. You have your stove, where you have one or more CPUs that can essentially process information, and your burner on the stove, which is one processor, one CPU; there can be more than one in a node, in one computer. The water as memory is somewhat tricky, but you need a sufficient amount of water to boil your pasta, and it's not really consumed: while boiling the pasta there will be some loss of water, but the amount stays roughly the same. So let's assume the water is the memory that you need. You have fridges and the pantry, all the places where you store your pasta, which would be the hard disk, where you store the data that you want to process. The pasta is the data that you take from the pantry and put into your local node. You could imagine that in a restaurant you essentially have multiple cooks, multiples of all of the above, except for the kitchen, which is just a bigger kitchen: this would be somewhat of a cluster. And in a restaurant you also have some ordering of the orders placed by the guests, which essentially says: okay, I now have a table that needs these things, so I'll put that into work now. The seating manager, and to some extent the kitchen manager, is kind of the workload manager on the cluster. So this is the analogy, the metaphor, that we are using here to explain how the cluster works and how parallel computing works.
So I'm the chef: I'm the process, your Python process, our process, whatever it is. And now my task is to actually cook the pasta. It will take me eight minutes in practice to do the actual cooking, so that with the water (memory) and the spaghetti (data) I can process them, because the pot, the thread, is sitting on top of a stove, on one CPU. Now a question for Thomas: if my stove has four burners, can I make the pasta in eight divided by four, so can I do the same amount of pasta in only two minutes? No. Well, normally not, because your pasta needs eight minutes to be processed, and each individual piece would take eight minutes to process. So no, you can't. What you can do is cook more pasta at the same time. Exactly. And this seems like a trivial thing when we put it like this with pasta, but it's not trivial anymore when you look at your code: you have your code running on your laptop, it takes ten minutes, or it takes one hour, you move it to the cluster, and it doesn't magically take one divided by four hours, if four is the scaling factor. This is important. But as Thomas already said, those eight minutes we can't escape; that's how long the pasta takes. But of course, I have four burners, so I could at least make four times 500 grams of spaghetti. I could make pasta for, how much is that, 16 people. But poor me, the chef, I still don't know how to manage multiple pots of pasta, because I'm a complete beginner. I only know how to take care of one pot at a time.
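To make the arithmetic in this exchange concrete, here is a small sketch (not from the course slides; the numbers are the ones from the metaphor). More burners don't shorten the time of one pot, they only raise how many pots, and therefore people served, you get per eight minutes:

```python
# Latency vs. throughput, with the numbers from the pasta metaphor.
cook_time_min = 8        # one pot always takes 8 minutes (latency)
burners = 4
servings_per_pot = 4     # half a kilo of pasta feeds four people

# One pot is never faster than 8 minutes, no matter how many burners:
time_for_one_pot = cook_time_min          # still 8, not 8 / burners

# But throughput scales: four pots at once feed 16 people in those 8 minutes.
people_fed = burners * servings_per_pot

print(time_for_one_pot, "minutes per pot,", people_fed, "people fed")
```

That is the whole point of the "no, you can't do it in two minutes" answer: parallel resources improve throughput, not the latency of one serial task.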
So there's absolutely no way for me to start using the four pots, which in practice means that I will have to cook one pot, give it to the first four people, then cook another pot, give it to the next four people, and you can understand that my guests will be very disappointed, because some people will eat their pasta first and some will have to wait. I guess you, Thomas, know what this is a metaphor for. Well, essentially you're talking about a non-parallel program, a program that cannot use multiple CPUs. If you had something that could use them, yes, you could do four times the amount. And it's a bit of a problem that the individual pasta boiling, your individual pasta cooking, is something that cannot be parallelized, while cooking pasta multiple times can be parallelized. And this is where the cluster becomes quite useful, because there you have a lot of machines, a lot of CPUs that you can use, so you can potentially run similar things in parallel without each individual computation being parallelized. Okay, now let's assume that I'm a little more skilled as a cook. I've actually learned a new technique, which is OpenMP, Open Multi-Pot, so to speak. It's not like having four chefs, each taking care of a pot on the stove; it's still just me. But because the pots are all in my kitchen and all in front of me, when I stir one, immediately afterwards I remember to stir the next one. So now I'm able to cook four pots of pasta in my kitchen, and more or less the 16 people will all have their pasta at the same time. Do you want to add something, Thomas: what do we mean by multi-threading? Multi-threading means that you run independent things, mostly.
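The "one chef tending four pots" idea can be sketched in Python (an illustrative stand-in, not course material; real OpenMP lives in C/C++/Fortran). Here the eight minutes are scaled down to a fraction of a second, and one process runs four cookings concurrently with a thread pool, because the boiling is mostly waiting:

```python
import time
from concurrent.futures import ThreadPoolExecutor

COOK_TIME = 0.2  # stand-in for the 8 minutes of boiling


def cook_pot(pot_id):
    """One pot of pasta: the boiling time itself cannot be shortened."""
    time.sleep(COOK_TIME)  # the chef mostly waits while this pot boils
    return f"pot {pot_id} done"


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as chef:  # one chef, four pots
    results = list(chef.map(cook_pot, range(4)))
elapsed = time.perf_counter() - start

print(results)
print(f"4 pots in {elapsed:.2f}s (one after the other would be ~{4 * COOK_TIME:.1f}s)")
```

Each pot still takes its full `COOK_TIME`, but all four finish in roughly the time of one, which is exactly the multi-threading point made above.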
The individual computations are independent, or the individual cookings are independent. The final products might depend on each other again, because at some point you will serve them to a table, but the individual ones are pretty independent here. Exactly. Because in practice each pot doesn't need to know what happens in the other pots. But let's say that I want to scale things further, and I'm going to hire three more Italians, each of whom can take care of more pasta. And of course, three is just a number; I could keep increasing it. Now each Italian will work independently, but sometimes we will need to talk with each other. So the new technique that I'll learn is MPI, the Message Pot Interface, or really the Message Passing Interface, so that I can agree with all the other chefs: let's all start at the same time, let's all finish at the same time. Do you want to comment on this message passing? Essentially the only thing I would comment here is that now you are getting into actual parallel processing, because you start communicating between the different chefs. Exactly. And of course, things can scale even more. Right now we're still in my kitchen and each chef is taking care of part of the pots. But we might want to scale further, because I really need to cook pasta for the whole apartment building where I'm living. So let's imagine that in my apartment building there are 25 apartments. Again a question for Thomas: do you think we can prepare the pasta in the time divided by 25? No. Well, it depends; with pasta, definitely no, but with other things you could divide it further. It depends on what kind of problem you're running. Yeah, in practice we can't escape those eight minutes. They'll always be needed.
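The chef-to-chef coordination described above would really be done with MPI (for example via `mpi4py`), but as a self-contained sketch it can be mimicked with separate Python processes passing messages over pipes. Everything here, the chef names and the "report back" pattern, is illustrative; it assumes a Unix-like system where the `fork` start method is available:

```python
import multiprocessing as mp


def chef(conn, name):
    # Each chef works independently, then reports back: message passing.
    pots_cooked = 2
    conn.send((name, pots_cooked))  # like an MPI send to the head chef
    conn.close()


ctx = mp.get_context("fork")  # assumes a Unix-like system
pipes, workers = [], []
for name in ["Anna", "Luca", "Marco"]:  # hypothetical hired chefs
    parent_end, child_end = ctx.Pipe()
    p = ctx.Process(target=chef, args=(child_end, name))
    p.start()
    pipes.append(parent_end)
    workers.append(p)

# The head chef collects every report, like an MPI gather.
reports = [conn.recv() for conn in pipes]
for p in workers:
    p.join()

total = sum(n for _, n in reports)
print(reports, "->", total, "pots in total")
```

The shape is the important part: independent workers, explicit messages, and one rank collecting the results, which is exactly what "let's all start and finish together" requires.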
But of course, what can happen now is that I can use 25 times four: if there are 25 kitchens and each kitchen has four burners, I could potentially cook 100 pots of pasta, for many, many people. And then, of course, you understand that I can't myself be in every kitchen at once. I need to start hiring other chefs in other kitchens and, as we wrote there in red, start using other nodes. There might be some extra time walking up and down to the neighbors, synchronizing with the other chefs. Or maybe we don't walk; maybe we just call each other or join the same Zoom. But in the end, we can make huge amounts of pasta all at the same time. More simultaneously, yes, but it will probably take a little longer than a single pasta-cooking run, because you need coordination between the different kitchens. And coordination is very important, because, okay, we have this great plan of using multiple kitchens in my apartment building, but it might be that the upstairs neighbor's kitchen is actually busy, so it's not ready for us to start cooking the pasta. So I need to talk with Sir Lurm, also known as Slurm, the housing company manager, to basically say: can you tell all the flats that we really need a kitchen? Can you find me an empty kitchen where I can get the extra four burners that I need for my task? And Slurm is something that we will see a lot in the next few days. Do you want to add a comment on that, Thomas? Not really. But in practice, whenever you start needing more resources, you need to request those resources, and sometimes they're not available and you need to wait a little bit. And how to do that is what we are going to go over in the next few days.
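As a preview of what "talking to Slurm" looks like in practice, here is a hypothetical batch script. The job name, program name, and resource numbers are illustrative only, and the actual limits and partitions depend on the cluster; the `#SBATCH` directives themselves are standard Slurm:

```shell
#!/bin/bash
# Hypothetical Slurm request, in the language of the metaphor.
#SBATCH --job-name=cook-pasta
#SBATCH --nodes=2              # two kitchens
#SBATCH --ntasks-per-node=4    # four chefs (processes) per kitchen
#SBATCH --cpus-per-task=1      # one burner per chef
#SBATCH --mem=4G               # enough "water" (memory) per node
#SBATCH --time=00:10:00        # ask only for the time you need

# Slurm finds free kitchens for you; you only describe what you need.
srun ./cook_pasta              # hypothetical program, run once per task
```

You would submit this with `sbatch`, and Slurm queues it until matching kitchens are free, which is exactly the waiting discussed above.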
And then finally, the pasta is ready. But there are some types of problems where, for example, you need to grate cheese on top of the pasta. Yes, I could go with a knife and slowly, slowly cut little pieces of cheese. But I could acquire some faster machine, some better hardware that is able to do many things in parallel: in this case the grater, which can produce lots of shreds of Parmesan cheese at the same time. And this is an example where things actually can be parallelized, because the grater does more in parallel. If you have only a knife, it would take ages to get your cheese grated, but if you have a grater, it's fast. And of course, and this is a situation that you will often see in computing, it could be that the kitchen where I'm sitting has no grater. I could just do it manually with the knife, one little piece of cheese at a time. But I might know that another flat in the apartment building has this amazing piece of hardware, the grater. So I could request the grater, in this case borrow it for a few seconds, do the fast shredding of the cheese, and everyone is happy, because everyone gets their pasta. So much for this funny motivational talk. I hope we are good with the timing; we didn't take too long. In practice, the take-home messages are: parallelization can speed up your task, but never below the time of the smallest serial task. Pasta will always take eight minutes. And remember to read on the pasta packet what the recommended cooking time is. The benefits from parallelization, of course, come only after you modify your code to use multiple threads.
As for the code: there are some libraries that can already see that they have more pots or, what's the name, more burners, but often you need to modify your code yourself. Then the special hardware, the GPUs that we talked about: it might require further adaptation of your code, because your code doesn't magically see the GPU and doesn't magically run on top of it. And sometimes it's not enough to modify the code; sometimes you also need to adapt the data to the GPU, which, as I wrote there, is like breaking the spaghetti, something you should never do. But in this funny metaphor, that is sometimes what you need to do when you work with GPUs. Then, when working on a cluster, the resources are not sitting there ready for you; you need to request them, and there will be some queuing time. Sometimes, if you just need one CPU, the waiting time will be really short. But if you're requesting 100 CPUs, it might be that you need to wait a lot. It also depends on the task: do you need the 100 CPUs in parallel, because they all need to talk with each other at the same time, or can some processes get started earlier? We will look at this in more detail tomorrow and on Thursday. And finally, sometimes you might not need parallelization at all, but it would still be beneficial to move your workflow to a remote HPC cluster, because, as I was mentioning earlier, from a sustainability point of view you don't need to run things locally. You don't need to leave your computer on all night when you only need one CPU; you can let your task happen on a remote HPC cluster. The rest of these slides we will not really cover here, but they are left there for you.
There are further metaphors that go into these limitations: parallelization doesn't scale linearly. At the beginning, yes, it scales linearly, but the more computational resources you add, the more communication between resources, queuing, and other overheads you get. So in practice, parallelization does not scale linearly. And one final thing: we still have five minutes, so we can check if there's something on the HackMD. The rest of the slides are further metaphors that you can check later if you need more inspiration and more details. So I leave the rest for you. But did anyone see anything interesting that we should comment on? Let's see. People like the metaphor. "How much does 'wait a lot' mean when requesting resources?" Well, that's difficult. I would say the maximum at the moment is probably somewhere around five days, so one full job length. But if you wanted the whole cluster, you would probably have to wait quite a bit longer. I don't think anyone actually ever needs that, though. But this is a good point, because sometimes, let's say that I really want to request 100 CPUs, because I want to run my pasta-making machine really, really fast. But maybe I end up waiting in the queue many times longer than it would have taken to just request one CPU and go slowly, slowly. So of course it's a tradeoff. There's no magic number that you will immediately find, but you can start playing with the number of CPUs, or cores, sorry, or the amount of RAM, or the other resources that you can ask for, and there you can find the tradeoff between how long you wait in the queue and how long your computation will actually last.
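The non-linear scaling mentioned above is usually stated as Amdahl's law: if a fraction of the work is serial (the eight minutes of boiling), speedup is bounded no matter how many kitchens you hire. A small sketch with an illustrative 10% serial fraction:

```python
def speedup(serial_fraction, workers):
    """Amdahl's law: overall speedup when a fixed fraction stays serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


# With 10% serial work, adding workers flattens out quickly:
for n in (1, 4, 16, 64, 1_000_000):
    print(f"{n:>9} workers -> speedup {speedup(0.10, n):6.2f}")
# The limit is 1 / 0.10 = 10x, however many chefs and kitchens you add.
```

This is also why the queue tradeoff matters: past a certain point, the extra CPUs you wait for barely shorten the run.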
And I would say that this is a good bridge to the next topic, which we'll be covering after the break: how do you determine how big your problem is? Basically, how much pasta do you have, what is the cooking time, and how big a pot do you need? Yeah, basically this kind of thing: how many burners or how many pots. So we'll be talking about this after the break. And like Enrico said, it's very hard to say beforehand, but there are tricks you can use to find all of this information out. Yeah, I forgot to mention this, but of course I could take a gigantic pot that fits, I don't know, 100 liters of water, when I only need to cook 500 grams of pasta. Then I would need to queue for this huge gigantic pot and the 100 liters of water, when in practice I'd be happy with a small pot. This type of practical issue you will inevitably face when you start working with shared resources like HPC clusters, and in this course we will look at how to request them efficiently. I don't see any other questions in the HackMD. We have exactly one minute left before the break. Maybe I pass the word to Richard to wrap this up and confirm when the break will be over. Well, I don't really have much more to say other than, like I said, this is the preparation for tomorrow, getting you to where you can think about splitting up the problems and using these things. And we'll be continuing this cooking metaphor in the next days. Anyway, let's go off to the break. Remember to actually go and walk around some; don't just stay sitting, and see you at the top of the hour here. You can always keep asking questions or commenting on things here while we're gone. Okay, see you later. Bye. Thanks.