So welcome back everyone after the break and after the exercises. We will now have the NumPy lesson, where we will learn about NumPy, a library for numerical calculations with Python that has become a cornerstone of numerical computing in Python. We have encountered Python lists and seen how flexible these objects are. It turns out, though, that they are not always so fast for numerical calculations. So instead, NumPy introduces the concept of arrays. So Richard, what are these arrays? Yeah, maybe it's best to discuss what an array is compared to what a Python list is. An array is compact, meaning that it's just raw data; there's no per-object overhead. And second, it's of uniform type, so everything in an array has the same data type. This means you can do operations on them much faster; it's basically the same as what C or Fortran or any other kind of numerical code would do. Can you scroll down a bit more? Yes. Yeah, there we go. So here, or a bit above, we see an example of a Python list with all these different data types in it, and to run any kind of calculation on it, the Python interpreter has to do a different operation on each of them. But with NumPy it's different: it's like a raw grid. If we scroll down, its main properties are the data type, which would be something like float or integer or datetime, also including the size: is it a 32-bit floating-point number, 64-bit, and so on? Then there's the shape, so basically what it looks like. And then there's the raw data, which is just the raw memory on the hardware. So NumPy is basically a wrapper around C or Fortran routines, which do these raw calculations. Yeah. So should we do a quick performance test of these two? That's a good idea. So we will compare pure Python and NumPy working with 10,000 elements. We will start with Python.
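A minimal sketch of the list-versus-array difference described above (the variable names are illustrative, not from the lesson material):

```python
import numpy as np

# A Python list can mix types; each element is a full Python object.
mixed = [1, 2.5, "three"]

# A NumPy array is uniform: one dtype, one shape, one raw memory buffer.
a = np.array([1.0, 2.0, 3.0])
print(a.dtype)   # float64 (the default float dtype on mainstream platforms)
print(a.shape)   # (3,) -- one dimension, three elements
print(a.nbytes)  # 24 -- 3 elements * 8 bytes of raw data, no per-element overhead
```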
So I have here a Jupyter notebook where I create a list. This is a regular Python list. And then I create the list B, where all the elements start with the value zero. Okay, so we've made it. Now we're going to square every number in A, store it in B, and use the %timeit magic function, which we started exploring in the previous lesson. So here we do it with a for loop: for i in range of the length of list A, and on each iteration of the loop we calculate the square. And this for loop is basically exactly what we want to avoid: any time you're telling Python to loop over things yourself, it's going to be slow. And here we see it took about four and a half milliseconds, which seems fast, but let's see how fast it is with NumPy. So NumPy is a Python library, so you import it as usual. Indeed, and by convention we import it as np. Yeah, it saves some typing. So let's see. Then we make the two arrays. We're going to learn more about these functions later, but they do basically what you'd think: arange is like the range function but returns an array, and zeros makes an array of zeros. Let's time it. Notice here we have only one operation: B equals A squared. Basically Python says, oh, I'm using a power operation, sends that into the NumPy code, and NumPy does it all in fast C. And let's see how long it takes. Oh, this is much better: nine microseconds. That's something like 500 times faster than the Python version. So this is a good speedup. And the notation is also more compact here, writing simply B equals A squared. That's also an advantage. Yeah. So what are the ways we can make these arrays? We saw zeros and arange, but what if you want to make an array out of certain values? I see here that we can use the array function, and it converts some other Python object into an array.
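The timing comparison above can be sketched as a standalone script. This uses `time.perf_counter` instead of the `%timeit` magic so it runs outside a notebook; a single run is noisier than `%timeit`, and the exact speedup varies by machine:

```python
import time
import numpy as np

n = 10_000
a_list = list(range(n))

# Pure Python: an explicit loop, interpreted element by element.
t0 = time.perf_counter()
b_list = [0] * n
for i in range(len(a_list)):
    b_list[i] = a_list[i] ** 2
t_python = time.perf_counter() - t0

# NumPy: one vectorized operation; the loop runs in compiled code.
a = np.arange(n)
t0 = time.perf_counter()
b = a ** 2
t_numpy = time.perf_counter() - t0

print(f"Python: {t_python:.6f} s, NumPy: {t_numpy:.6f} s")
# Both paths compute the same values.
assert (b == np.array(b_list)).all()
```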
So for example, this first one takes a list and turns it into an array. It will look at it and infer the data type: here it will probably say it's integers, see that it's one-dimensional with three elements, and then create that. And yep, there we go. Yeah. Then we move to a two-dimensional array. The syntax is similar, but we need additional brackets. Yes, these extra brackets make it a nested list object, and that's how it knows it should be a two-dimensional array. We need to get them exactly right. That looks good, at least I think it looks good. Can we look at it? Yeah. So it has two rows and three columns. And we will work with this natural numbering of the elements because it's convenient for seeing where the elements end up. Yeah. So let's look at the shape and size, I guess. The shape is always the dimensionality and the length along each axis, and size is the total number of elements. There's a good question in the HackMD: appending elements to NumPy arrays is slow. So yeah, with NumPy you basically don't want to be changing the size of arrays that often. Part of the point is that they're compact: you allocate the exact amount of memory you need, and changing that memory size takes a long time. It's the same in C or any similar program: you allocate all the space you need and then go filling it in. If you're slowly accumulating things, then, like the question says, I would use a list to collect the objects and then convert it to an array. Okay, so how else can we make arrays? Yes, there are plenty of options. You can see here in the lesson np.zeros, np.ones, np.linspace. And just to illustrate, I'll do np.full. Yeah. I'll go for a square array and set all the elements to seven from the beginning. Yeah. Okay. So I guess the point here is that for almost any kind of array you might ever need, NumPy probably has a function that creates it for you.
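The creation functions mentioned above, sketched together (exact values chosen for illustration):

```python
import numpy as np

a = np.array([1, 2, 3])              # dtype inferred as integer
b = np.array([[1, 2, 3],
              [4, 5, 6]])            # nested lists -> two-dimensional array
print(b.shape)   # (2, 3): two rows, three columns
print(b.size)    # 6: total number of elements

z = np.zeros((3, 3))                 # 3x3 array of 0.0
s = np.full((3, 3), 7)               # 3x3 array filled with 7
r = np.linspace(0.0, 1.0, 5)         # 5 evenly spaced values from 0 to 1

# Accumulating incrementally? Collect in a list, convert once at the end:
collected = [i ** 2 for i in range(5)]
arr = np.array(collected)
```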
Or you can compose it out of these different building blocks. You can save and load NumPy arrays. Yes, and this is very core functionality. Okay, so I will here use np.save with the file name x.npy and store the variable a. Then I read it back into another variable named x with the load command. I guess it comes back. It comes back with the same values, if you trust me: we declared a as 1, 4, 7 above. Yeah. Okay. Should we... Yeah, there's the conversion. So this dtype attribute tells you the type of an array. Yes, so I'll work on a: a.dtype. Yeah, it's an integer. And we can convert it to a different type by using the astype function. Let's go for a float. So earlier d was declared as a boolean, and here for a we converted it into real numbers, floats. Yeah, and we can see it's float because there is a period after each number. So we will come to these exercises shortly, but we'll do them together with the next section. We have a quick section on array math. This is what we used in the example: arrays will do these operations in a vectorized form. So while Johan's doing this, there was a good question in the HackMD: why is the NumPy calculation so much faster? That's basically because with the non-NumPy one we had the for loop, which means Python is looking at every element and doing something. But here there's just this one operation, and Python sends it all to the C code of NumPy, which can use all of the fancy processor features: it says, I have these ten megabytes of numbers, I need all of them added, and it will just go and do it in whatever optimized way. Yeah. And you see these kinds of functions, there are many different ones here, for basically all the basic things you might need.
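The save/load, dtype conversion, and vectorized-math demos above can be sketched as follows (this writes x.npy into the current directory, as in the live demo):

```python
import numpy as np

a = np.array([1, 4, 7])
np.save("x.npy", a)          # writes the binary .npy file
x = np.load("x.npy")
assert (x == a).all()        # comes back with the same values

print(a.dtype)               # an integer dtype
f = a.astype(float)          # astype returns a converted copy
print(f)                     # [1. 4. 7.] -- the trailing periods mark floats

# Array math is vectorized: operator and functional forms agree.
b = np.array([10, 20, 30])
c = a + b
d = np.add(a, b)
assert (c == d).all()
```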
And there are both syntax forms, with things like the asterisk, the minus sign, and the division sign, and also functional forms, where you call them as functions. Are we ready to go to exercises? Yes, more or less. I just want to show here that with A and B declared, and with C and D both being additions, the result actually came out the same. Yeah, so there were two ways of performing this addition. I see. Yeah. Okay. So now we will scroll back to exercise one, which is here, where you will play around with the data types. And then further below we have exercise two. And for this you will be working for, I think we said, 15 minutes. 15 minutes, yeah. So that means we're going to 10, well, to 32. Yes. Okay, and we'll put the info in the HackMD. So explore these things. It's probably not too difficult, but you can check some of the other functions and the documentation, and read the HackMD, which has a lot of interesting questions. Okay, thanks a lot. See you in 15 minutes. Hello, we're back. Let's take a look at what we've got here. There are some questions about the exercises. I'd propose we continue to the next exercises and then do a big Q&A round at the end. What do you think, Johan? Yes, let's proceed. Okay, so you've got your screen again. So now we've got indexing and slicing. Once you have arrays, one of the main things you might want to do is select things out of them. So basically, instead of using a for loop where we would go and select the first element, the second element, and so on, you can do things like select whole rows or columns, or even more advanced things. And by combining these selections with the different functions, you can really do most, well, maybe we shouldn't say most things, but many things you can do in Python at C speed. And this is really the key to everything, so to say. So Johan, would you like to give some of these examples? Yes, I will.
So we start off with a two-dimensional array of size 4 by 4. As you can see here, I use arange(16) and then reshape it to a 4-by-4 form. Yes. So if we print this, it's, yeah, 16 elements. Okay. So you'll try to select the first row. Yes, and this is with one argument: get the first row. Using a leading colon, then a comma, and then a zero, I will instead get the first column. Yeah. And in that syntax, the comma separates the different dimensions, and a single colon means everything along that axis. Okay. And can we combine these two somehow? Yes, we can. So I will now cut out the middle 2-by-2 array by doing 1 colon 3 comma 1 colon 3. Okay. So this is selecting from both dimensions: the second and third rows and columns, because indexing starts at zero and the end point is excluded. So, okay, we've just selected a little subarray. What about this next one we see? This starts getting sort of interesting, doesn't it? Yeah, so let's see what happens. I simply type it. First there's 0, 1, which looks like first row, second column, and then 1, 1, which looks like second row, second column. And what do we get? One and five. One and five. So I'll bring up the whole of A again so you can see. Okay, interesting. So we can see here, it's this one, this five. So it looks like a column, but it's only two elements of that column. Yeah. So it was like a list of coordinates: you select this element and this element and so on. That's sort of interesting, and you can combine it in some powerful ways later. What about this boolean indexing? Yeah, this is really interesting. So I create idx equals, and then we have a condition here, A greater than zero. And what is idx? It's like a bitmap kind of thing: it shows true or false in an array of the same shape, depending on whether the condition holds. And now we can slice by this, I believe. Yeah. So I wanted to highlight that we can do this; here I do greater than five. Yeah. Okay.
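The indexing walkthrough above, gathered into one sketch:

```python
import numpy as np

a = np.arange(16).reshape(4, 4)   # 0..15 laid out as a 4x4 grid

print(a[0, :])        # first row:    [0 1 2 3]
print(a[:, 0])        # first column: [ 0  4  8 12]
print(a[1:3, 1:3])    # middle 2x2 block: rows 1-2, columns 1-2

# Fancy indexing: lists of coordinates, picking (0, 1) and (1, 1).
print(a[[0, 1], [1, 1]])   # [1 5]

# Boolean indexing: a True/False mask of the same shape as a.
idx = a > 5
print(a[idx])         # all elements greater than 5
```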
Let's see what we get. Oh, we get false for the first elements here because they were not greater than five. Makes sense. Yep. So we need to come to the exercises. So we move on now to the universal functions, which are important. Yeah. So this section is meant to get you familiar with the idea that these exist. There's actually a lot that's very uniform here; there's this generalized way of writing these things. What am I trying to say? Once you get into NumPy, you find there's this beautiful consistency behind how everything works, and it's really nice. And part of that is these universal functions. They take one, two, or three input arguments; for example, add takes two input arguments. And they have an optional output argument: instead of making a brand new array to store the output, allocating the memory, you can store the result of the operation into an existing array. This avoids copying data excessively and can give you even further speedups. And there's this full reference here you can see. Yeah, I think, in the interest of time, we will just highlight one cool feature. So I have the arrays A and B, and let's see what happens when we execute this. Yeah, so here we see there's a two-dimensional array and a one-dimensional array, and you can add them together, which you might think doesn't make sense: how can you add two-dimensional and one-dimensional? The shapes aren't equal; you can't do it element by element. But this is something called broadcasting. Basically NumPy says, okay, this array is missing an axis, so we expand it along that axis, duplicating it to make the shapes match. Yeah, okay. So this is something we just really don't have enough time to go into in more detail.
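The two universal-function features just mentioned, broadcasting and the optional output argument, can be sketched like this (array contents chosen for illustration):

```python
import numpy as np

a = np.arange(16).reshape(4, 4)   # two-dimensional
b = np.arange(4)                  # one-dimensional

# Broadcasting: b is expanded along the missing axis, as if it
# were duplicated into four identical rows, then added elementwise.
c = a + b
print(c[0])   # [0 2 4 6]

# The optional out= argument of a ufunc stores the result in an
# existing array instead of allocating a new one.
out = np.empty_like(a)
np.add(a, b, out=out)
assert (out == c).all()
```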
But I'd recommend, during this next session if you have time, reading the manual page on broadcasting and playing with some of the different things you see there. It's really amazing. So array methods is the last thing. Yes. Here we have some rather important functions. So I typed the whole code here. Okay, so the output was from the very last statement here. So what did we do here? We see X; can you show us X? Yes, I'll show you X. X is an array with three rows and four columns. And the last statement I used was this one. Yeah. So we calculated the maximum element if we go along axis equals 1. And what is axis equals 1? Let's see. Yeah, if you look here, we see that we're going along the rows, because what we got was three, seven, and eleven. Had we written axis equals 0, it would instead have gone along the columns to fetch the maximum elements. Yeah. Okay. And you'll see the same axis arguments for sum, mean, standard deviation, all kinds of different things. Yeah. We can also highlight, just to announce it, that further below in the lesson we have linear algebra and other advanced math, with many links and some nice exercises. So please have a look at them at a later point. And now, for the coming 15 minutes, the idea is that you get going with exercises three and four. 15 or 10 minutes? Yeah, I think we keep it to 15 minutes and then have the concluding discussion after that. Okay, I was going to propose making it a bit shorter and having more Q&A after. Well, let's do 15; we can come back early if people seem to want to. Okay. So exercises three and four. Someone will add it to the HackMD, and keep asking questions; we will answer them when we get back. See you later. Hello, we're back. So we'll have a quick Q&A session and wrap up before going to the last part of the course. Let's see.
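The axis= behavior from the array-methods demo above can be sketched as follows (x has the same three-rows-by-four-columns shape as in the demo):

```python
import numpy as np

x = np.arange(12).reshape(3, 4)   # three rows, four columns

print(x.max(axis=1))   # max along each row:    [ 3  7 11]
print(x.max(axis=0))   # max along each column: [ 8  9 10 11]

# The same axis= convention applies to sum, mean, std, and friends.
print(x.sum(axis=0))   # column sums
print(x.mean(axis=1))  # row means
```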
So there are lots of good questions in the HackMD, most of which are already answered pretty well. There were several we wanted to highlight. Let's see. Johan, do you remember what the first one was? Yes, there's a question here: are there any rival libraries to NumPy? Oh, yes. Yeah, that's a good question. So one doesn't have to use only the NumPy operations; sometimes NumPy is used just for storing the data buffers, which can then be passed on to other packages that use them. So even things which are rivals, or which extend it or do other numerical work, often use NumPy in the background as the thing that stores the data, and then they operate on it. There are more such libraries, and probably more appearing now, but NumPy is really this very central thing, and the concepts you see there extend to everything. It will be interesting to see how things develop in the future. Let's see what our next question was. So, a question here: we have a two-dimensional array and we select a column, and what is printed looks like a one-dimensional list. What one can say here is that in NumPy we don't really have the concepts of columns and rows. So when you do this operation, you reduce from a two-dimensional array to something one-dimensional. Yeah, in MATLAB everything is a matrix and there's no concept of a one-dimensional thing: it's always somehow either a row or a column. But NumPy has this very expansive idea: an array can be three-dimensional, four-dimensional, whatever, and each operation defines how it affects the dimensionality. So when you select along one of the axes with a single index, it reduces the dimensionality by one, and so on. Which I believe is similar to how it works in Mathematica. We also have the notebook workflow that some of you might have been using. Okay, we've got about a minute left before our scheduled end time for this episode.
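The dimensionality point in that answer can be made concrete: an integer index drops an axis, while a slice keeps it (the array here is just an example):

```python
import numpy as np

a = np.arange(16).reshape(4, 4)

# Selecting a column with an integer index drops that axis:
col = a[:, 0]
print(col.shape)    # (4,) -- one-dimensional, not a "column vector"

# Slicing with a range keeps the axis, so the result stays 2-D:
col2d = a[:, 0:1]
print(col2d.shape)  # (4, 1)
```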
Was there anything else that looked important to answer? Yes, there's a question on hardware here. The question is: I noticed that NumPy is way slower on AMD processors than on Intel; has anybody tested NumPy with the recent Apple ARM M1 processors? That's a very good question. It's open-ended, and there's a need for exploring this, which we cannot do now in the session. But I would like to make the observation that it's an interesting time on the CPU side, because we have Intel, we have the new range of AMD processors, and ARM. And working at a supercomputer center, I'm very aware that AMD, for instance, are working hard on getting the same performance as Intel gets on their hardware. So there's a very nice competition here. Yeah, I guess a good thing about using these big libraries is that it's not your problem: when something is slower, there are plenty of other people who will work on improving NumPy, or whatever it is, to make it as fast as necessary, and then you get those benefits eventually. I started looking into this, and maybe I can share what I found, which is not much yet. Basically, the important thing is that NumPy uses some existing big numerical libraries in the back end. It might be using a library called MKL, which is heavily optimized for Intel CPUs. There is also another one called OpenBLAS, and if it's using that, then I would expect a much smaller difference between Intel and AMD. But yeah, I'm not sure what the person who asked the question has installed, and depending on the operating system, it might be easy or complicated to change. Yeah. Okay. Well, thanks for that; I'm glad someone had time to research that. So I guess we're coming to the end of the course. And the main point for NumPy here is basically that this is one of the core things across many of the packages. And like the concepts, there aren't many concepts here.
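One way to check which back end your own NumPy installation uses is `np.show_config()`, which prints the build configuration including the BLAS/LAPACK libraries it was linked against (the exact output format differs between NumPy versions):

```python
import numpy as np

# Prints build information, including whether this NumPy is linked
# against MKL, OpenBLAS, or some other BLAS/LAPACK implementation.
np.show_config()
```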
There's slicing, there are the functions, there are the data types and shapes and so on. And you see these across many, many different things. And there's plenty more to read: you can look at the different manuals that are linked and see more. So I guess we should go to our break, and we'll resume at 12 past the hour. Any other comments, Johan? Yes: stretch your legs and fill up your coffee cup or tea cup, and see you then. Yes. Okay, see you then. Bye.