Hello. We're back, and hopefully you can hear us well. So, the last session of the day: Parallel. Should we just get straight to it? I think so. To my screen here, Parallel. Yeah, so Thor is the leader here. So why are we doing this? What is parallel and why do we care about it? Right. Well, we care about it because of performance, which is not always great for Python. Sometimes running things on one processor is not enough; it takes too long. But as we'll come to, I think there are a number of things you should do before you actually try to parallelize your code. But parallelization is increasingly important. I mean, you don't only have one single core on your laptop or your desktop computer. You have eight or maybe more, and then you have all these HPC, high-performance computing, clusters around the world, which have tens of thousands, hundreds of thousands, or even millions of processors. Yeah. So how much do you expect, or should, one of the watchers of this course know, let's say, at the end of the course? Well, I think for everyone the take-home message will be that there are these options for parallelizing Python. It's good to know about them. You might need to use them sooner or later. We will of course not have time to delve into any details. But it's good to know what's there, what you can do. You can come back and learn more in the future if you need it. Okay. Yeah, that sounds like a good thing. So what do we need to know about the modes of parallelism? Yeah. So there are different modes, but I think we should first focus on these five steps, which one should consider when something is taking too long. So have you noticed sometimes, Richard, that you're running something and it just takes too long? Like the data set is too big or the computer keeps churning and you don't get the result. Back when I did more stuff with data, yeah, sometimes. Yeah. So the first step is what? Well, measure.
So it's a very common mistake to start parallelizing, to start optimizing, before you actually measure. And there is this famous quote by a famous computer scientist, Donald Knuth. How is it now? Premature optimization is the root of all evil. Right. Yeah, that's it. It's a bit drastic, but yeah. So the point of that is that you should first measure, and there are these profiling tools in Python, and in any other programming language, where you get information on where time is being spent. And you would use that to actually see where the slow spots are. Maybe there's a function that gets called a million times, and that's where 90% of the total execution time is spent. Yeah. And that basically matches what I've seen. When my code's slow, it's usually not something radical like needing parallelization. It's realizing there's something that was just written quickly and doesn't make sense, and with some low-effort changes I can make it work well enough. Yes, exactly. So parallelization brings in another layer of complexity. So check whether you get adequate speedup just from looking at that bottleneck and improving it. I mean, it can be about changing the algorithm. Maybe you don't need to use a for loop; you can maybe use broadcasting in NumPy. I mean, bring in these really fast libraries like NumPy and Pandas and so on. And if that's still not enough, there are these packages for pre-compiling: you add some extra decorators and some extra code to actually make your Python functions get compiled ahead of time or just in time. Numba and Cython are two very well-known packages. We're not looking into that here, but there are links at the bottom of this page with tutorials on Cython and Numba. And after that, you can start parallelizing, right? Or thinking about it. So what are the main categories of parallelization we should think about? Yeah, I guess that's up to, I mean, there are sort of different definitions.
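The "measure first" step the speakers describe can be sketched with Python's built-in cProfile module. The function names here are made up for illustration; the point is that the profile report points you at the hot spot:

```python
import cProfile
import io
import pstats

def slow_square_sum(n):
    # Deliberately slow: a Python-level loop that a profiler will flag.
    total = 0
    for i in range(n):
        total += i * i
    return total

def main():
    # Call the slow function many times, as in the "called a million times" scenario.
    return sum(slow_square_sum(10_000) for _ in range(100))

# Profile main() and print the five entries where most cumulative time is spent.
profiler = cProfile.Profile()
result = profiler.runcall(main)
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

In the printed report, `slow_square_sum` dominates the cumulative time, which tells you where to optimize before reaching for parallelism.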
But one clear distinction you can make: the first category would be embarrassingly parallel. It sounds a bit negative, but what it refers to is simply when you have to run some code a thousand times with a thousand different parameters. That's an embarrassingly parallel problem, because you don't need any communication between those thousand runs. They can all run at the same time. And that's usually pretty easy to do. Yeah, and there are some tools to do that, like the ones shown below. Dask we will probably not cover here, but you can look at it later; you can do embarrassingly parallel things there. But there are also workflow managers and so on that enable you to automate these tasks and run them maybe all at once on a cluster or something. And then beyond that, what comes next? Yeah, then there are the, I guess, somewhat more complicated approaches: multithreading on one hand, and on the other hand multiprocessing, slash message passing, slash distributed computing. And the difference is that multithreading is when you have many threads, which may be running on different CPU cores, but share memory. So you can have, say, a loop where the different iterations are handled by separate threads, but they're operating on data in the same memory. While the last category there, multiprocessing or message passing: you can run that on a single computer, with so-called shared memory, too, but you can also run it on different computers, or on different servers, whatever. Yeah. So there's a question in HackMD: is my code automatically going to use all CPUs on my machine? Oh, excellent question. And this actually leads us to something we wanted to show you. Yeah. So I'll talk about this big elephant in the room when it comes to Python and parallelism. Yeah. So the global interpreter lock. What is it?
So it's a design choice that the developers of Python made a long time ago: only one single thread in a Python process can actually run Python code. And this sounds like it makes parallelism completely impossible. Under the hood, technically, this is called the global interpreter lock. Yeah. So we have a little demo of it here. So I guess the main point is that things like NumPy aren't bound by this, because they're running stuff in C code. Exactly. So that's another reason you want to use these libraries. So I have a quick demo. I'll do this quite fast, so don't necessarily try to do it yourself. But let's see. So here's my Jupyter. I'm making a new file; I'm going to make a new Python file, and I will copy and paste this into it. So this is something that uses NumPy. And remember that NumPy is written in C, mostly. Yes. So I'm saving it. Now I'm going to make a new terminal. Actually, I have my old terminal here. python numpy-test.py. Should we do it like this first? Yeah. Sure. Okay. And I'll use this time built-in. You don't need to; the code actually spits out the time. It doesn't? Okay. Then good. I need to save it. Hopefully. Okay. So it took five seconds. Five seconds. And while you're explaining that, I'll do the next part. Yeah. So there are some environment variables that control how many threads are being used. And this particular one that Richard just set, OMP_NUM_THREADS, has to do with OpenMP. Okay. It doesn't make much of a difference on my machine, because most of my processors are being used for streaming right now. That's probably it. So when I run this on my computer, a laptop, a Mac, it's four times slower. And that's because when you just run NumPy code without doing anything, it's automatically parallelizing it. It's using threads under the hood, so-called OpenMP threads. Yeah.
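A script along the lines of the demo might look like this (the exact code used on screen isn't shown, so this is a sketch; matrix sizes are our own). NumPy's linear algebra calls out to compiled BLAS code, where the GIL does not apply, so it may use several threads automatically:

```python
import time
import numpy as np

# NumPy's matrix multiply runs in compiled BLAS code, which may use several
# OpenMP (or BLAS-specific) threads under the hood. To compare, run the
# script twice, e.g.:
#   OMP_NUM_THREADS=1 python numpy-test.py
#   OMP_NUM_THREADS=4 python numpy-test.py
# (depending on which BLAS your NumPy links against, the variable may instead
# be MKL_NUM_THREADS or OPENBLAS_NUM_THREADS).
rng = np.random.default_rng(0)
a = rng.random((1000, 1000))

start = time.perf_counter()
for _ in range(5):
    a = a @ a               # the heavy, parallelizable part
    a /= np.abs(a).max()    # rescale so values don't overflow
elapsed = time.perf_counter() - start
print(f"5 matrix multiplications took {elapsed:.2f} s")
```

With the thread count capped to 1, the same script typically runs several times slower on a multi-core machine, which is what the demo shows.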
But I guess the main point here is that this is a demonstration that NumPy actually is using multiple processors. Yeah. So that's the take-home message here: a lot of things come for free. But we'll also show you how you move beyond this and actually parallelize things yourself. Yeah. And next up is multiprocessing. Yeah. I think this is the go-to library for many Python developers if they need to parallelize. So which paradigm would you say it uses? Yeah. So this is the message-passing, the multiprocessing, paradigm. You can run this paradigm on different servers, on different nodes of a supercomputer; you can also run it on just one machine. Yeah. And you can also classify it in a different way: it uses this map-reduce paradigm. That's where you map a problem over multiple processes, and then you reduce the results from the different processes into one. Yeah. Which looks like this here. So we have a function that does some computing: it squares a number. Yeah. And we have here a list of input values, and the function, and map. So here we've squared each of these numbers. Maybe you could do this, of course, with a for loop, for i in [1, 2, 3, 4, 5, 6], but this is just a different way: you can use this map function. Yeah. And once you have the map function, then you can do cool stuff. If we look down below here, we've got a multiprocessing Pool. And instead of using Python's built-in map function, we do the Pool's .map. And it will detect how many processors I have, so it can use all of them and run all of these at the same time. Yeah. Or if you only have four processors, four will be computing one thing at a time, and then the next batch will be shipped, you know; there's some automatic distribution of work to those pools of workers. So if you can structure your code so that you have some code and variable input data, then this can do things very easily.
Do we go straight to the exercise then? Yeah, but there's this big caveat there, in the warning box; you have to scroll up a little. Yes. This might not work for you. It does not work for me. And according to the documentation of multiprocessing, this should not work. But we found out that it worked for you, Richard. Yeah. What doesn't work, or is perhaps not supposed to work, is running multiprocessing in an interactive environment. So Jupyter, or the Python command line, you know, the REPL, or IPython. Yeah, for technical reasons. What is supposed to always work is that you put your code into a file, into a script, and you run the script. So that's the non-interactive way, but clearly in some cases it can work interactively too. Yeah. But there is a workaround. There is this other package called multiprocess, not multiprocessing, but multiprocess, that one. It's a fork; it's almost the same code, just modified a little bit, and you can install it with this pip install command. And in the import statement, instead of importing from multiprocessing, you do from multiprocess. That's the big caveat. So you can try both: if you're working in Jupyter, you can try multiprocessing first; it might crash. Yeah, or it might not work, it might give you an error. And it's pretty cool how you've written it one way, and by simply changing an import you can change the way it's running. Yeah. But okay, so how long should we give for the exercise? I think 15 minutes. But shall we describe the exercise a little bit? Yeah, okay. One minute, just to get everyone on the same page. So what do we do in here? It's a toy example: we're computing pi. We have this function sample, which takes in a number, and it checks if that random number between zero and one is inside... no, wait. Yeah, so it takes in a number of iterations.
And then it computes random numbers x and y and checks if the sum of their squares is below one. Yeah, the unit circle. And then it increments a counter, so it's sort of a way to compute pi. Yeah, it's like putting random points on a square and counting how many of them are inside a circle inscribed in the square, and from that you can estimate pi. Yeah, and the idea here is that you can call it with n equal to a million and run it on one processor; it might take some seconds. The task will be to use multiprocessing and this Pool construct. Yeah. And split up the work between different processors. Yeah. So sample takes an argument for how many times to do it, and we use multiprocessing to call sample, say, 10 times with 100,000 each, or something like that. Okay. Yeah. And there's a solution as a hint, if you'd like. Okay, well, I guess we will send you to it. Yeah, and keep the questions coming in. Yes. Okay. Talk to you soon. Hello, we're back. All right. It seems that some of you had some issues. Yes. But I think they will be addressed in HackMD and in the document, so it's understandable that it doesn't work for everyone. I mean, things can go wrong. Yeah. I hope it will be useful at least to have a look at the solutions, to see how it's supposed to be done. Yeah. And once you get to the parallel stuff, debugging can be quite tricky. I mean, here we're not even dealing with the parallel code itself, but yeah, don't get discouraged, keep working at it, and see our examples. And if the examples don't work, let us know and we'll fix them. Yeah. But for the next half hour or 15 minutes, we have two other topics to go to. See, I'll switch back to my screen. So there's MPI. And I believe this is a demo. Yes, it's mostly a demo. If anyone really wants to type along, you can do that, but we cannot guarantee that everything will work, because MPI...
MPI... well, installing mpi4py was part of the software installation instructions for the environment. But that's probably the most error-prone package among the packages we asked you to install. Yeah. So there could be an issue, but let's see; you can try. Yeah. So MPI, the lesson says, is the Message Passing Interface. So what does that mean exactly? It's an old standard, right? I mean, it's been around for almost 30 years, maybe more, I'm not sure. And it's still a standard workhorse of HPC. It's used, it's installed and available, on every single supercomputer on the planet. And it's a very different model from what you're used to. I mean, if you have written mainly serial programs, you just thought about running things on one core. You really have to take a different perspective; you have to think differently when you use MPI. That's the bad news. The good news: it's really powerful. You can do a lot of stuff, and you can parallelize a lot of different problems. And maybe let's just walk through those items quickly. So you talk about tasks, or ranks. So those are the parallel processes that are being run at the same time. And they each have an index; they have a number: zero, one, two, three, and so on. And they manage their own memory, so this is a distributed-memory paradigm. They communicate explicitly. When you use multiprocessing, you use this parallel map, the Pool.map function, to automatically send out work to some workers, who will then communicate back their computed result. A lot of stuff happens automatically there. With MPI you really have to do it manually. You send messages manually. And the main gotcha, the main thing that you have to keep in mind, is that all the tasks, also known as ranks, run the entire program. So it's not like you saw with multiprocessing, where there was only one line of code where you actually submitted jobs to the remote workers.
Okay, the case instead is that every rank runs through the entire code, at the same time, independently. And I think we have to look at an example to make this more concrete. Yeah. Should we scroll down? Yeah. Okay. So this is an MPI example. First there is from mpi4py import MPI, which I guess handles everything; it imports everything you need. And the code is familiar, right? It's the same code you saw for multiprocessing. So yeah, okay, sample takes a number of iterations and returns two values. Okay, yeah. The rows that are marked in yellow are the MPI-specific ones. The first three, comm, size, and rank, are absolutely standard; you do it every time, every single MPI code has this. First, the comm thing is the communicator. It's like the world in which your ranks live. Every rank, every process that's inside a communicator, can talk to the others. And then size and rank: you always call these Get_size and Get_rank methods. The size is the number of ranks you have, and the rank is the rank of the process that is currently running this line. As I told you, if you run this with four ranks, all the ranks are running the same code, and they will each get a unique value of this rank. So basically process zero can say, okay, I'm the leader. Yeah, typically. Usually you have one master rank, one master process. You will see how. Okay. So scrolling on down we see n, which tells us we do 10 million iterations. Then: if size is greater than one. So I guess that checks if we're actually parallel at all. Yeah. You'll often see conditional statements like this in MPI: if my rank is zero, do this; if my rank is something else, do something else. In this case: if we're running on more than one core, then split up this number n by the size. So yeah, you divide by the number of parallel processes.
Yeah, otherwise n_task just becomes the value itself. Okay, and then we have, this is about timing here. Well, timing, and then it actually runs it with the number of tasks we need. Yeah. And this is important to keep in mind: each rank will run this line independently of the others. They will each get their own value of this n_inside_circle variable. Each will go into the sample function, generate the random numbers, everything that's inside that function, and return a unique value of this n_inside_circle. Okay, yeah. Separate processes, different values of n_inside_circle, and then they're collected, or gathered in MPI terminology, here. Yes. So this is one out of many methods that are common in MPI. This is the gather one. Maybe the most fundamental ones are the send and receive methods; that's to explicitly send a message between one rank and another rank. That's called point-to-point communication: two ranks explicitly communicating with each other. But this gather function here belongs to the collective communication methods. And in this case, what happens? What do you think happens, Richard? Well, n_inside_circle and root=0. Does this mean that rank zero gets the values of everyone else? Yeah. And does it add them together automatically? I don't see a sum here. No, it's not a reduction. There is another collective communication method called reduce, but this just gathers the data. Okay, so this becomes a list. It becomes a list of the inside-circle counts. Okay. Okay. Yeah. So down here we see: if rank equals zero, then we do a sum of that list. Yeah. And the final calculation formula. Yeah. Okay. And let's just demo this, right? Yeah. Shall we do that? Okay. So I'll be typing somewhat fast here, so maybe just watch. I copy all the code, come back to Jupyter, and make a new Python file. I paste it all in.
I will rename it to pi-test.py. I will remember to save it this time. I will go back to the terminal. And then, to actually run it, what do I do? Yeah, you don't run it the normal way. Well, you can run it the normal way, with python; actually, try doing that, just for good measure. This works. Yeah. Okay. It's not crashing; it's running. Yeah. Which I guess is good, because then you don't even need MPI to run it serially. No, exactly. Yeah. Okay. The MPI way. Yeah. The MPI way of running things is not to just run it like this, but to prepend to the command mpiexec, so, MPI execute. And then you specify the number of ranks, the number of processes. So let's try four. Okay. Yeah. And then python: Python in parallel. Well, 1.1 seconds. So that was actually four times faster. Yeah. Four times faster. Like the multiprocessing example I did... or no, like the MPI example. Let's try. Let's have a look at the output. Yeah, you can run this too. Okay. Yeah. That was a bit faster. Okay. Yeah. So the output: there's a print there before gather and after gather. Yeah. So it shows all the ranks, and for most of them n_inside_circle becomes None. But for rank zero, it got this list of everything. Yeah. So after the gather, rank zero has all the results from all the other ranks. Yeah. And then it computes the average and prints it. That wasn't too hard. No. Well, it's that these collective communication methods are sort of high level. Many algorithms can be mapped onto that kind of, you know, collective communication. There's a lot. I mean, this is a whole science; there are week-long workshops on MPI. Yeah. You learn it there. So, yeah. But if it's interesting to you, if you think this could help you in your project, whoever is watching this, then look closer into it. There are some links inside this lesson for further reading. Yeah. Okay. So what comes next? We've got a few minutes left. So. Yeah.
Coupling to other languages. We mentioned it in the library ecosystem section. So the basic idea is that if you already have some code in another language, you can usually pretty easily use it from Python. And this is probably one of the big reasons why Python became so common in science: it was good for doing general work and good for connecting to things, so you could glue stuff together really well. Should we talk about Dask and dask.delayed? Let's just mention it briefly. Again, this is another huge topic. You can do so many things with Dask; maybe some of you have heard about it before. You can do this embarrassingly parallel stuff: if you import Dask and use it, you can identify some independent steps and specify that they should be run on different processes. But another very cool thing about Dask is these distributed data structures. So there's an analogy to the NumPy array called a Dask array. Okay. There's an analogy. Yeah. You see it down there. Yeah. And also the same thing with a data frame; there are these distributed data frames. Yeah, go ahead. If I have a data frame with, say, 100 million rows, I could use Dask for that, and it sort of automatically parallelizes it for me. Yeah, you can split your data frame over any number of computers, any number of... not ranks, tasks. Okay. Nice. That's specified down there, right? So, with the array example, you're creating chunks: you're creating a two-dimensional random matrix. Okay. And then you're saying that you want to chunk up the matrix. So it's a 10,000 by 10,000 matrix, and you create 1,000 by 1,000 chunks. And Dask will actually put those chunks on separate CPUs, separate processors. And then you have all the other common NumPy stuff, like transpose, minus mean, or whatever. Yeah. Okay. So is that the end? dask.delayed, is there anything to say there, or? That has to do with the embarrassingly parallel use case of Dask.
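The chunked-array example being described can be sketched like this (a sketch assuming the dask package is installed; the particular slicing and reduction are our own choices, the 10,000 by 10,000 matrix with 1,000 by 1,000 chunks matches the description):

```python
import dask.array as da

# A 10,000 x 10,000 random matrix split into 1,000 x 1,000 chunks.
# Operations on the chunks can run on separate cores (or, with a
# distributed scheduler, separate machines).
x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))

# The familiar NumPy-style operations still work, chunk by chunk.
y = x + x.T
z = y[::2, 5000:].mean(axis=1)

# Nothing is actually computed until .compute() is called; until then,
# Dask only builds a task graph.
result = z.compute()
print(result.shape)
```

The lazy `.compute()` step is the key design difference from NumPy: Dask first builds a graph of chunk-level tasks and only then schedules them across the available workers.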
But I mean, it's such a big topic that people will just have to look at these links. There are links down there to other tutorials. Yeah. Okay. So that's the day. What was the summary of parallel? Oh, sorry, what is the summary? The summary of the parallel lesson: I guess three modes, profile first, use existing libraries, and if you need more, it's available even in Python. Yeah. Okay. But ask any questions on HackMD. Yeah. Are there any high-level general questions? Let's see. Or technical nitty-gritty questions. We need feedback on the day. Oh, yeah. I think the day worked pretty well. Yeah. I hope everyone found it to be interesting, and not too fast, not too slow. Yeah. Thank you a lot for these positive comments; very nice to see. I have a feeling one of the keys to a good workshop is: don't let me teach too much, unless I'm actually focusing on it. The more we have other people teaching, the better it is. I think your teaching has been awesome today. But today is a day I haven't been teaching much; yesterday, not so much either. Right, I dumped the Pandas stuff on you. I mean, I just was too busy. Yeah. But teaching in this collaborative way, I can just tell everyone listening that it's so much fun. Yeah. And it's efficient. Of course it takes more manpower, but less effort for every individual person. Yeah. Yeah. If you want to join us: the good way is the CodeRefinery chat. There's anyway lots of interesting discussion there about scientific computing and technical topics. If you would like to teach with us, well, hang out there and let us know you're interested in being a co-instructor. If you're in an organization that's not offering this, suggest that they point us to them, or point them to us, and say, look, if you advertise this for us, you can run your own breakout room, you can have a co-instructor. This is an international partnership. Among, is it only two countries this time? No, three, because we have an instructor from Norway here tomorrow.
And we can take in other countries. I mean, there's no limit on how many people can watch Twitch. So, you know, the more people the better, and we can even offer more of these other courses. Yeah. Maybe we should ask at the end of tomorrow what people would like to learn next. I mean, is there any new lesson that people would want to see that's not already out there, that is not being taught? Yeah. In a few weeks, in February, we're planning a workflows course, where we'll go into more detail about specific ways people put all of these tools together: how one person actually takes parallel stuff and array jobs and so on and puts it on a cluster. So if you're interested, that might be a thing to contribute to. There are several more follow-up courses we've been mentioning, but the CodeRefinery course is really important, because it teaches the software development side of things, like version control, automated testing, and so on. Well, I have to use the opportunity to actually advertise the ENCCS workshop on Python. It takes the next step after this one, with a focus on high-performance computing and high-performance data analytics. When is it offered? Initially we were hoping to give it already in February, but February is a bit packed, so it will be moved to April. Should we do a live stream? I think we should opt for that. Okay. Well, that would be nice to do. Let's do that. Let's try. So maybe I can even throw out some dates here. Yeah. We have tentative dates set. Anyway, the lesson follows the same format as what we're looking at now for this Python for Scientific Computing. We are thinking... no, actually, 18-19 May. Hmm. But yeah, block your calendars: 18-19 May, Python for HPC and high-performance data analytics. There's a lot more on Dask there, there is more on MPI, and there are a number of sections on profiling and GPU computing. Yeah. So tomorrow, what do we have?
There is... so, we aren't continuing parallel tomorrow, because we finished it today. So yeah, I agree with the comment: there's so much more to do, but, well, it's basically multiple week-long courses. So tomorrow it's not so much about the Python programming itself, but: once you have the program, how do you use it? One topic is dependencies: how you keep all of these other things you're using organized. Then we talk about Binder, which is a quick introduction to how you can use this service called mybinder.org to let other people run your code easily in the cloud. And then packaging, which is basically taking your code and making it installable; even for myself, I've been happier once I've made stuff installable. And then there's a panel discussion. So we basically take all the instructors, and anyone else who'd like to participate, and we go around and answer any questions you may have. And you can see how much we agree or disagree, and stuff like that. Then there's a quick outro, and then I'll paste a Zoom link into the Twitch chat and HackMD, and anyone can join us and sort of talk about what we're doing together. You know, you can talk interactively that way, if you'd like; sort of an after-party. So be prepared for that and come with questions. And then: will we do a similar course on R? I guess we don't have enough R experts to do this, but I'm sure there are some out there. I can flag that there is a workshop being planned as a collaboration between several Swedish HPC centers to teach, over three days, Python and HPC, R and HPC, Julia and HPC. HPC-focused, so it doesn't perhaps exactly match what this person was hoping for, but I just thought I'd flag that. Yeah. If you look around, there are countless courses on all these topics. As far as I know, we're the only massively open live-stream course using our kind of strategies, but we hope there will be more later. Yeah. So should we get going? Yeah. Yeah.
So I guess we'll leave the feedback open for a bit more and see you tomorrow. Yeah. See you tomorrow. Thanks for today. Thanks a lot. Bye. Bye.