We are back. Hello. Okay, so in the next section we'll talk about parallel programming. So what does this even mean? There are a couple of objectives for this, but I guess understanding what parallel programming is is probably the only real objective here, and then we can try it out using a couple of different libraries. Of course, those are not all the options available.

So what does parallel programming mean? The short of it is that you are running a program on multiple computers, or really on multiple processors on one or more computers. These days essentially all laptops, essentially all computers, have more than one processor available at any given time, and by default you will be running on only one. So this can make your code faster. The idea is that a processor does one thing at a time, it can do one calculation at any given moment, but in some cases you may be able to do several things at the same time and therefore be, say, twice as fast. But that really depends on the problem you're trying to solve.

So I guess the first thing to talk about is: do you need, or can you even have, a parallel version of what you're doing? When is it useful? When you start thinking about that, you probably have a code that is slower than you would want, or you would want it to do more than it's doing now, to do something faster. Then the question is: is it slow because it's doing everything it can as quickly as possible and it's just not possible to run any faster, or is it slow because you are using something less efficient than is possible?

So the first thing you should really do, if you haven't done it yet, is to profile your code and find out where it's actually spending its time. There are many options, but for example cProfile for Python is really useful.
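As a minimal sketch of that profiling step (the `slow_function` here is just a hypothetical stand-in for your own code, not something from the lesson):

```python
import cProfile

def slow_function():
    # Hypothetical stand-in for the part of your program you suspect is slow
    total = 0
    for i in range(10_000_000):
        total += i * i
    return total

# Print a report of where time is spent, sorted by cumulative time
cProfile.run("slow_function()", sort="cumulative")
```

For a whole script, `python -m cProfile -s cumulative script.py` does the same thing from the command line.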
So you identify those areas and then you think: can you make them somehow better, can you make them faster? Is there a library you can use to replace some of that Python code? Can you use a better library? Can you use a parallel library, perhaps, so that something else does the parallelism for you? If somebody's done it for you already, then there's no reason for you to do it yourself. Are there any other low-effort optimizations?

I have a feeling that these days I've very rarely done parallelism in my work other than embarrassingly parallel, and it's mostly been other libraries that already support it. There is this one case of a really big physics simulation where I actually wrote parallel code on my own, but otherwise I've essentially never done it. Okay, but after that you think about writing parallel code. So next we have a quick introduction to the different paradigms of parallel code, of parallelizing your program.

The first one is called embarrassingly parallel, and there's nothing really embarrassing about it. Sometimes you can just run two copies, or many copies, of your code, and it will be that many times faster. Sometimes you just want to do the same thing for a hundred thousand files, and the things don't depend on each other in any way, and then you can just run a hundred thousand copies of the program, and it's equally good, so it's that much faster. And how do we do that? The command line, I guess: a command line script, that kind of thing. You can also use a Python library for it if you're not familiar with command line methods for doing that, but yeah, probably the command line.

Okay, but then there are somewhat more complicated situations. There's shared memory parallelism. That's when the different things you want to do, to those different files for example, do need to communicate, they depend on each other in some way, so they need some information from the other processes. If that happens, maybe you can still parallelize it, but you need to really think about how it would work. Shared memory means the processes have access to the same variables and they are running on the same computer, which means it's fast for them to talk with each other. And this is mainly a different way of thinking about programming, compared to message passing. In message passing, you write a program that runs as a single copy of your parallel application, and you explicitly write what this process will send to that process, and what that process will send back. In shared memory, they all just have access to the same variables in the code.

So is shared memory easier to write, if you're getting started? I started with message passing, and it's just more natural for me to think in that way. But especially if you're starting from a code that already works, it's much easier to parallelize a small section of it in shared memory, because then you don't need to touch most of the code.

But there's one important detail: Python is not really meant for multiprocessing. There is something called the global interpreter lock, which basically means that one Python process can only have one interpreter running at once; you can't have two threads interpreting Python code at the same time within one process. So it kind of means that there is no shared memory parallelism in pure Python. But it's not as big a deal as it sounds, because you can always use libraries that run C, or other languages, in the background, and you can use libraries in Python that are built to get around this problem, like multiprocessing.

Are there any questions before we move on? So, to check: in Python you can have shared memory, but you can only do one thing at a time with that shared memory? Yeah, I guess, but then why have shared memory; I mean, it's one process running one thing. But you can always install a parallel version of whatever NumPy is calling in the back end. Things like NumPy and SciPy can be parallel and use shared memory, because they're not running in Python, they're running in C.
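To make the global interpreter lock concrete, here is a minimal sketch (the `cpu_bound` function is a made-up example, not from the lesson): CPU-bound pure-Python work in threads does not get faster, because the threads share one interpreter.

```python
import time
from threading import Thread

def cpu_bound(n):
    # Pure-Python loop: holds the GIL the whole time it runs
    total = 0
    for i in range(n):
        total += i * i
    return total

start = time.perf_counter()
threads = [Thread(target=cpu_bound, args=(2_000_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Despite four threads, only one can run Python bytecode at any moment,
# so this takes about as long as four serial calls would.
print(f"elapsed: {time.perf_counter() - start:.2f} s")
```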
So anything that runs outside Python can use shared memory parallelism in a relatively straightforward way, which is one of the reasons for using these other numerical libraries. So use that: export your work to them, and then they can run in parallel. It's also that running anything that's actually interpreted, anything that's actual Python code, is much slower than something that's written in C, like we already saw in the NumPy section. So if you're calling NumPy, SciPy or pandas, that will be faster in any case.

Okay, so should we try it out? Let's say we are at a point where we've determined that we want, or we need, to use parallelism, and we want to do it in Python. Okay, I'll switch to my screen, since I believe I'm doing the demo here.

So multiprocessing is a Python library that gets around the global interpreter lock. Do you want to know how it does that, or should we just try it out? Can you say it in one sentence? Basically, it creates multiple Python processes, and it's actually message passing: they actually send messages to each other. But it feels very much like shared memory when you're using it.

Okay, so let's define a function. We will want to apply this function to a huge bunch of different data. There is this built-in function called map, and it will do exactly that: we can use it to apply the square function to a list of numbers. So we can try it and see what comes out. This is a very common case, actually: you have a function and you want to apply it to a huge bunch of data. It returns this map object; you have to actually ask for an element before it does any calculation. It's very lazy.

I think this concept, defining a function and applying it to multiple pieces of data, is probably one of the most common ways of doing parallelism. So when you're thinking about parallelizing something, think of this mapping kind of pattern and you'll get very far. This is called either split-apply-combine or map-reduce, depending on who you ask; they're kind of the same thing.

Okay, so we are only applying it to six numbers, but we can still parallelize it, and if we were doing it for a much larger set of numbers, it would be really fast. So let's just do that. That makes sense: we're defining the data first. So let's import multiprocessing, and then let's actually import the Pool class from multiprocessing. In this case it wants a pool of workers, which means a pool of Python processes. You get a new pool of workers by creating an instance of the Pool class, and Pool has map. So just like Python has its own map function, pool.map is the multiprocessing implementation of that. You can run the same thing as with the built-in map function, but in this case it will run on multiple processes. I guess we need to save the output, and we do want to print it somehow. Okay, it worked.

There is an exercise where we do this for a much bigger set of data; I think we'll do that at the same time as the MPI exercise.
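Put together, the demo above looks roughly like this (a reconstruction, not a verbatim copy of what was typed; the `__main__` guard matters when you run it as a script rather than in a notebook):

```python
from multiprocessing import Pool

def square(x):
    return x * x

data = [1, 2, 3, 4, 5, 6]

if __name__ == "__main__":
    # Serial version: the built-in map is lazy, so force it with list()
    print(list(map(square, data)))

    # Parallel version: a pool of worker processes, with the same map interface
    with Pool() as pool:
        result = pool.map(square, data)
    print(result)
```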
Yeah, sorry, there's a HackMD question: so multiprocessing can call several instances of Python and combine the results at the end? Basically, exactly. It starts multiple interpreters, sends the data there, the work runs, and then the results are sent back.

There is actually an important limitation that comes from that. Multiprocessing needs to be able to take the function and send it to the other processes. The other processes are running Python, but they're not running this Python notebook; they're just running this multiprocessing worker, somehow listening for commands. So only we have the function, and the worker processes don't. That can sometimes cause problems; we'll see if we run into any. Essentially, the function needs to be in the current namespace, and it needs to be possible to serialize it and send it.

Okay, so we had this exercise that we'll do at the same time as the next one, I guess.

All right, so next we have a quick introduction to MPI. MPI is the Message Passing Interface. I just mentioned the message passing paradigm, and that's what MPI does. And I already very quickly mentioned that you think about writing MPI code in quite a different way: in MPI you write code for a single process, and you have the processes send messages to each other. So when you start an MPI program, it will run multiple processes. Each will get a number, and you can use those numbers to figure out what each process is supposed to do and what data it needs to send to the others. They all have their own memory, so they cannot share data directly. Here in the multiprocessing example we were actually referring to the same data, but in MPI each process has its own data; you can't refer to another process's data unless you send it first.

Okay, so I guess that's mostly what you need. One warning about installation: if you install mpi4py from the conda environment, it probably installs an MPI library as well. If you just pip install mpi4py, you will also need to install an MPI library on your system. So we're saying you almost certainly already have it in the conda environment, and it's better to install it from conda, at least for now. If you're working on a package that uses MPI, then you'll need to take that into account.

So it's probably best to just go through an example. This example is very commonly used; you may have seen it before if you've taken any parallel processing courses. It's one where we estimate pi by throwing darts at a dartboard: basically randomly throwing points into a square and seeing how many land inside a circle. From that we can figure out the value of pi. That's what the sample function does; I guess we don't need to go into much more detail about it.
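A sample function along these lines could look like this (a sketch of the idea; the exact lesson code may differ): points land uniformly in the unit square, and the fraction falling inside the quarter circle of radius 1 approaches pi/4.

```python
import random

def sample(n):
    """Throw n random points into the unit square; count hits inside the quarter circle."""
    inside = 0
    for _ in range(n):
        x, y = random.random(), random.random()
        if x * x + y * y < 1.0:
            inside += 1
    return inside

n = 1_000_000
print("pi is about", 4 * sample(n) / n)
```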
And then we have these yellow lines. You were about to say something? Oh no, go ahead.

Okay, so these yellow lines here are the MPI parts. The first one is here: you have to import MPI to be able to use it. Once you've done that, you will have access to this COMM_WORLD. I'm not going to go too much into it; it's something MPI needs. It defines what processes exist, it tells MPI what processes you have started. And the size is how many processes there are, and the rank is the number of this particular process, so it's kind of the name of this process.

So it's like MPI starts, and then there's a size number of different workers which are communicating, and you're one of them? You are one of them, and you can identify yourself by this rank, yeah.

So what we do here is check whether there is more than one process, and we divide up the work: overall we call this sample function n times, and n_task is how many times each worker calls it. If you have more than one worker, each one only does its share.

Well, then we just call it, and we collect the data on this process. At the end we need to communicate: now each process has some data, but at this point it doesn't really know anything about the other processes, so they all need to send their data to some place, to collect it in one place. Here we decide to send it to process number zero. And we're printing out what happens, so that you can easily see it when you run this. Then process number zero actually does the estimating, calculating pi and printing out the result. But I guess the important bit is really here: everybody does only a part of the work, so they each have only a part of the result, and they need to communicate it to one single place.

So I guess it's a very careful balance. Dividing up the work is sometimes easy, but then there's the sending and receiving. If there's a slight mismatch there, like someone sends some data and it's not received by the others, then I guess that's really bad? Yeah, actually, if you send or receive data in MPI and the other process doesn't know about it, it will just hang there and wait until you kill the program. And this is, I think, something that happens often in parallel programming: it's really easy to get it messed up, and then stuff doesn't run, or runs wrong, and it's super hard to debug.
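Put together, the example described here looks roughly like this (a sketch in the spirit of the lesson code, not a verbatim copy; it uses plain send/recv to collect the results on rank 0, while the actual lesson code may collect them differently). You would run it with something like `mpirun -n 4 python pi.py`:

```python
import random
from mpi4py import MPI

def sample(n):
    inside = 0
    for _ in range(n):
        x, y = random.random(), random.random()
        if x * x + y * y < 1.0:
            inside += 1
    return inside

comm = MPI.COMM_WORLD
size = comm.Get_size()  # how many processes were started
rank = comm.Get_rank()  # the "name" of this process: 0, 1, ..., size - 1

n = 10_000_000
n_task = n // size          # each process does only its share of the work
my_hits = sample(n_task)
print(f"rank {rank} counted {my_hits} hits")

if rank == 0:
    # Process 0 collects the partial results from all the others
    hits = my_hits
    for source in range(1, size):
        hits += comm.recv(source=source)
    print("pi is about", 4 * hits / (n_task * size))
else:
    comm.send(my_hits, dest=0)
```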
Okay, so should we give some time to do the exercises? You can pick the order. Should we give maybe even 20 or 25 minutes? I think these are going to be really interesting exercises to do. So 25 minutes, and then we will want to go to the Dask section if we have time. So 25 minutes, that's until 40 past. And I'd really encourage you to try these exercises, because they are quite good and something quite common, if you haven't done this before, especially the multiprocessing one. Okay, good luck then. See you later; we'll chat via HackMD.

So we're back. This was certainly an interesting exercise session: plenty of problems getting things to run, but also sometimes it worked well.

So I was wondering, could some of the problems with the multiprocessing Pool on some Windows machines be that, if it's a university-managed computer, there's some sort of security policy that prevents it from spawning new processes? It's possible. You wouldn't think it would affect security that much, since you're still running things, but it is a program automatically spawning processes rather than threads. But some people were using ThreadPool, and that did work on Windows, so at least there is a solution for some cases. If it's not working on Windows, I think there's not much we can do right now; this is something to maybe work on with your colleagues and try to debug. We can try to debug it over the next few weeks also; who knows if we'll find something.
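For reference, the ThreadPool workaround mentioned above looks like this: it has the same map interface as Pool, but uses threads instead of processes, so nothing new gets spawned. Because of the GIL it only speeds up functions that release the GIL (I/O, NumPy and friends), not pure-Python CPU work, but it does run.

```python
from multiprocessing.pool import ThreadPool

def square(x):
    return x * x

# Same interface as Pool, but backed by threads within one process
with ThreadPool(4) as pool:
    print(pool.map(square, [1, 2, 3, 4, 5, 6]))
```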
So, let me quickly demonstrate what you could call a better way of doing it. Maybe not always a better way, but Dask is a very useful library, and it is worth showing here at the end. Essentially, what Dask does is: if you're using NumPy, it makes parallelization easy for you, at least for many familiar NumPy operations. Dask has its own array class, but when you do operations on those Dask arrays, the operations get done in parallel. It is automatically distributing the data, doing the operations, and then, when you need the result, returning it to the right place. I already said that you can use parallel backends for NumPy and so on; when there is a library that does the thing you want, and does it well in parallel, then you should of course use it rather than try to write something yourself. This is a good example of doing things in parallel using the NumPy interface, with everything done automatically for you, so you don't have to worry about it.

So should we just do the demonstration? There is also an optional exercise that we will not have time for, but you can take a look. Should I do the demonstration, or do you want to? Maybe it's good if you do it. Okay, I'm not actually 100% sure I have Dask installed, but I can of course always run this pip command. It says it's there, so we have Dask. It's nice that you can run pip directly from Jupyter.

Okay, so what we do is import dask.array, and we call it da here. This is very similar to NumPy. So, for example, to create a random array we do da.random.random, which is the way it happens in NumPy too; there is a random.random function there. We give it the size, so let's say 10,000 times 10,000. Yeah, that's about right. And we should provide an additional parameter, or rather it makes sense to provide it, which is chunks. Did I spell it right? No, wait, it's correct: chunks. So this will tell it to split the array into sections, into chunks, in both directions. We'll do chunks of size 1,000. Do you know if these are the chunk size or the number of chunks? That's a good question; I'd hope it's the chunk size, but "chunks" to me kind of suggests the number of chunks. We'll see.

So we can do operations that work in NumPy. For example, x plus x works, we can take the transpose, and let's also subtract the mean; the mean function works, and we'll take the mean across the first axis. And this should run. Ooh, this is interesting. Okay, let me demonstrate this first. When you display a Dask array, it will show you the size of the array in bytes, the size of each chunk, and a nice graphical representation of the chunks and the whole array. Its type is float64. By the way, on a command line you wouldn't get graphics like this. So this is 100 chunks: we have 10 times 10 chunks, and the size of each chunk is 1,000 times 1,000, so that's what you expected. And 100 tasks: when you do an operation on an array like this, it will get split into 100 separate tasks that will be run on the processors you have available. I only have four, so it will of course run many tasks on each processor.

Okay, now we can run this operation, and let's put it in another variable; let's just call it result. If we try to print result, it fortunately doesn't print all the numbers; there would be a huge amount of numbers here. It just prints some information about the array. There are numbers in there, though, so let's look at element 0,0; I think the syntax is the same as in NumPy. Okay, and it shows an array of 8 bytes, because it's a single number. I shouldn't go off script, because I don't remember the exact number; when you have three minutes left, that's not the right time to go off script.

Okay, so, is this the conclusion of the day, the summary? Yes, this is the conclusion. There is a short section on task queues that you can read, but the key points are these. Pure Python is not great for highly parallel code, but you can do it. There are interfaces to libraries that do things really well, so you can use them, and usually that's enough. And you can combine vectorized functions, so NumPy, SciPy and pandas functions, with the existing parallel strategies in different ways. Dask is a great way, or using a proper parallel backend library is a great way, and that will get you very far. But there are options if you need to go further. I think those are the most important things. And I guess parallel computing is this huge field, which has many different tools and techniques and all kinds of things, so if you start needing parallel things, this is just the smallest introduction to get you started; there's plenty more you will need to learn yourself.
I found a way to get the numbers from the Dask array, by the way. Do you want to show us? Yeah, if you switch to my screen. I had this result object, and it didn't actually compute anything yet. This is something I definitely should have mentioned; I'm not sure if it's in our lesson notes. You have to call the compute function to get the numbers. So until you call compute, it's not actually computing anything; it's just constructing, essentially, the instructions to compute it. And you can slice it before you compute it, and that will make it much faster to compute, compared to computing the whole thing. Yeah.
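Reconstructed from this demo, the whole Dask example is roughly the following (a sketch; and to answer the question from earlier, chunks does specify the chunk size, so 1,000 by 1,000 gives the 10 times 10 = 100 chunks we saw):

```python
import dask.array as da

# 10,000 x 10,000 random array, split into 1,000 x 1,000 chunks (10 x 10 = 100 chunks)
x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))

# NumPy-style operations: nothing is computed yet, Dask only builds up a task graph
result = x + x.T - x.mean(axis=0)

# compute() actually runs the tasks in parallel and returns concrete numbers
print(result[0, 0].compute())   # slicing first means only the needed chunks get computed
print(result.mean().compute())  # reductions work the same way
```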