So, yesterday we went over quite a few of the basics of Python, covering classes and lists and dicts and the standard data structures that are available in Python. Today we're going to shift gears and focus much more on the scientific aspects of Python. There are several layers to this, and the first one is NumPy. NumPy is a library that allows you to do vector operations on arrays in Python. There are a few other layers that we'll talk about. There's SciPy, which sits on top of NumPy and is a set of algorithms for things like ordinary differential equations, signal processing, image processing, optimization, things like that. So you can think of NumPy as the data structure and the basic capability to do math operations, and SciPy as the domain-specific toolkits that you might use in an application. All right, so NumPy, which is the base for all of this, is available at numpy.scipy.org. A number of you said you had used MATLAB; NumPy is a library that offers MATLAB-ish capabilities within Python. As you look at legacy applications written in Python, you may notice that some of them are still dependent on two older packages, one called Numeric and one called numarray. Numeric was written by Jim Hugunin, who came out of MIT and is now at Microsoft. He also wrote IronPython and Jython, and worked on AspectJ. Numeric was a nice tool, but the code base eventually, after a lot of people had worked on it, got fairly crufty, and it was very hard to do some of the things the community wanted to do. One of those was to be able to subclass from arrays. And there were a few other things that the people at the Space Telescope Science Institute specifically wanted to do: to deal with very large arrays and with what they call record arrays. I'll talk about those in a minute. That effort became numarray.
They wrote that sometime around 2000. The problem was that it worked well for the very large arrays they were dealing with, but it wasn't very fast for small arrays. So Travis Oliphant, starting around 2004, maybe early 2005, began working on an attempt to unify the two: to get the capabilities of numarray but the small-array speed of Numeric. He was successful, and he released NumPy; late last year was the 1.0 release. It's been very successful. It really has unified us back into having one package to do these basic things, which is very nice, and that's what we'll be talking about today. It's a robust community. Travis is definitely the main committer and continues to be on the NumPy tool set, but there's also a large set of other people who contribute around the edges. It's also quite well used. These numbers are fairly old, probably six months old, but there were about 16,000 downloads per month on SourceForge six months ago, and that doesn't count the downloads of these other tools. So it's a widely distributed, widely used tool. So how do you use it? These slides are set up so that if you have a laptop with IPython on it, you can work along with me here. You import numpy from the command line, and note that I use the standard `>>>` prompt here as if we're at a standard Python prompt; you can replace that, as the note down here says, with whatever IPython shows on its command prompt. The demos I'll be showing are from version 1.0.2. I'm not sure if that's the latest release or a dev version, and we may be at 1.0.3 by now, but the 1.0.2 release won't behave any differently than what I show here. In my examples, a lot of the time we'll use `from numpy import *`. That's what I use on the command line when I'm working on things.
If I'm writing a script, though, I use the approach where you do `from numpy import ...` and list out the items you're going to import. It's really critical for allowing other people to understand your code, knowing where functions are coming from. So use that approach as much as possible when you're writing scripts. On the command line, you can use `ipython -pylab` to start up IPython. Prabhu talked yesterday about having event loops when you have a plot or a GUI up and you still want to work at the interpreter: while the plot's up, you don't want it to block your typing. With pylab, they've worked very hard to make sure these two event loops can coexist peacefully; they don't stomp on each other. The other thing pylab does is import quite a few functions that are familiar to the MATLAB community, things like plot and commands like that which are very standard. It also goes ahead and does the `from numpy import *` as we see up here, so that all of those things are just available at the command line. So if I just go over to an IPython prompt here: I haven't imported anything, but functions like arange, which we'll learn about, and linspace are already there for you. You don't have to go through the import process, which is handy when you just want to pop something up and start working on it. All right, so what does NumPy do? Here's the first example: simple array math. Remember from yesterday, the way you create an array here is that we have `array` and we're just passing in a list. It's almost like a cast operation, or a function that converts a list to an array. Yesterday when we looked at adding two lists, you'll remember that we got a concatenation of the two lists: it would just add one list onto the end of the other. That's not what happens with arrays. Arrays do a different sort of operation: an element-by-element math operation.
And it's always an element-by-element math operation with NumPy arrays, except when you get to special objects like matrices, which I'll show you later. Remember that unlike other tools such as MATLAB, where a multi-dimensional array multiplied by a vector gives you a matrix-vector multiplication, NumPy doesn't do that. With the basic operations, it always does an element-by-element add or multiply. So here you can see each pair of elements has been added together. We can just come over here: a is array([1, 2, 3, 4]). There's a, and if I just add it to itself, there you go. Nothing too fancy there. If you want to create an array, there's a handy function called arange. You remember yesterday we saw range, where we could give it a value and get back a list: range(10) gives us 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. Well, arange works in a very similar way. You hand it a value; here we've handed it 11. And it gives us an array from 0 to 10. The one difference to note is that I wrote it as "11." — the trailing dot means it's a floating point number to Python, and arange has a little more information it can use: it uses that type to determine what kind of array to build. In a few minutes we'll talk about the fact that arrays aren't just a single data type; they're not just floats or just ints. There's a whole wide range of types available to you when you make your array. In this case, arange is going to give us an array of floats, double-precision floating point values. pi is also available via `from numpy import pi`; that's a constant. You also have e; those are the two constants available in numpy. So we can say 2 times pi divided by 10. That becomes a scalar value. Now we can multiply a times x, and when we do that, each element is multiplied by that scalar.
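Here's a small self-contained sketch of exactly what we just walked through. I'm spelling out the `import numpy as np` form rather than the `from numpy import *` we use on the slides, so it's clear where everything comes from:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = a + a                   # element-by-element add, NOT list concatenation
# b is array([2, 4, 6, 8])

x = np.arange(11.)          # the trailing dot makes this a float array: 0.0 .. 10.0
c = (2 * np.pi / 10) * x    # scalar times array: every element is scaled
```

Compare that `a + a` with `[1, 2, 3, 4] + [1, 2, 3, 4]` on plain lists, which gives you the eight-element concatenated list instead.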
So each value in the resulting array is the original value multiplied by that scalar. So what did I have? I have a equal to, oops, we'll do x. So a is equal to 2 times pi divided by 10. Now a times x. So that's the array that comes out of that: each element in x was multiplied by the scalar. There's also the option to do an in-place operation here. When we multiplied a times x, we created a new array; x was not modified. Sometimes you can get away with actually modifying x — you don't care about keeping it around; it's not important to you in the future. For that there's the *= operator, and there are also +=, -=, all the operators you would expect for in-place operations. Here we end up with the same values, but they're stored in x. There's also a host of other functions, and we'll get to know a whole lot of them later on, but sine is an obvious one. There's cosine and arcsine and those. If you pass an array to one of those, the function is applied to every element in that array. So here, if we do sin(a * x), it's the sine of each of the individual elements. In IPython, when you've started with pylab, the plot commands are imported from matplotlib. As I said, Prabhu will talk more about it, but if you are following along, I just wanted you to have these commands. So now we can do plot(x, a * x). Oops, what have I done? Not a very interesting plot; slightly more interesting now. You can plot those, you can kill these plots and bring up new ones, you can overlay things on the plot. So there's quite a bit of capability for inspecting your data as you're manipulating it. There's also a similar set of functions available in the Chaco shell — actually a subset; it's not as full-featured — and that set of functions is growing over time to be very similar to the matplotlib set.
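A quick sketch of the in-place operators and the element-wise functions we just mentioned (leaving the plotting aside, since that needs the pylab session):

```python
import numpy as np

x = np.arange(11.)          # 0.0 .. 10.0 as floats
a = 2 * np.pi / 10          # a scalar

y = a * x                   # creates a NEW array; x is untouched
x *= a                      # in-place multiply: x itself now holds the scaled values
                            # (+=, -=, /= work the same way)

s = np.sin(x)               # ufuncs like sin apply to every element at once
```

After the `x *= a` line, `x` and `y` hold the same values, but no new array was allocated for `x`.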
Arrays have a whole lot more capability besides just being able to add them or call functions on them. So let's start looking at some of the attributes an array has. Here we say a = array(...); we've created our array, and now we want to check what the type of it is. Well, the type of that object is ndarray; that's what we created. However, that array also has a type that it specifies each of its elements to have. So we can ask: what is your dtype, or data type? The dtype here is int32, and int32 corresponds to the integer type that Python uses to store its integers on this platform. So whenever you create an array from Python values and you don't override the type NumPy should use, it's going to default to whatever the standard Python type is. You also have itemsize, so you can ask how long each of the elements is in memory. Here it tells you each element is four bytes long. And you can ask for the shape. You can ask for the shape as an attribute: a.shape returns the shape to you as a tuple. So if we have our x here, x.shape returns that there are 11 elements in this array. It's a tuple because we handle multi-dimensional arrays; we happen to have a one-dimensional array right now, so it returns just one element. If you want to know the size of the array, size tells you the number of elements in the array. In a one-dimensional array like this one, it's obvious: it's just the length of the array, right? With multi-dimensional arrays, you could ask for the shape and take the product of all the elements of the shape — all the rows and columns — to find out how many elements there are, but size is a really handy attribute for getting that directly.
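Those attributes, in one runnable snippet. Note that on a modern machine the default integer dtype may be int64 rather than the int32 on the lecture's platform, so I don't hard-code it here:

```python
import numpy as np

a = np.array([0, 1, 2, 3])

print(type(a))       # <class 'numpy.ndarray'>
print(a.dtype)       # the per-element type; defaults to the platform's int
print(a.itemsize)    # bytes per element (4 for int32, 8 for int64)
print(a.shape)       # (4,) -- a tuple, one entry per dimension
print(a.size)        # total number of elements, across all dimensions
```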
Oh, one other comment: there are two forms of this. You can either ask for the attribute, a.shape, or you can call the function shape(a). The function form is there somewhat for historical reasons; in the past, that was one of the main ways of asking for the shape of an array. But it's also useful in that shape() will handle any kind of sequence that comes in. If you hand it a list of nested lists, and each of those lists is the same size — say you have a list of four four-element lists — then it will return the shape as four by four. So if you want to handle different data types, that may be handy. The number of bytes used by an array is also important if you're trying to manage your memory or know how much memory you're using. So you can ask your array: what is the number of bytes you're using? This is just the itemsize multiplied by the size. Maybe we'll touch later on the fact that, if you actually looked in memory, there's a bit more memory than that allocated for a NumPy array, because there's a small header that explains how the memory is laid out, and then the data. nbytes only returns the size of your data — how much memory it's taking. That overhead is small, especially if you have gigabyte arrays or other large arrays. The number of dimensions is ndim; in this case, we have one. We can ask for a copy of the array by calling a.copy(), and this will create an entirely new array that you can manipulate independently. Then there are some methods for converting to lists. a.tolist() will convert our array back into a list. You can also call list(a). tolist() is actually more featureful when you get to multi-dimensional arrays, because it will convert a into an actual list of lists, whereas list(a) will not: it takes each row, and the rows will still be arrays, so you end up with a list of arrays.
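A sketch of those last few pieces — nbytes, ndim, copy, and the tolist() versus list() distinction on a 2D array:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

print(a.nbytes)      # itemsize * size: just the data, not the small header
print(a.ndim)        # 2 -- number of dimensions

b = a.copy()         # a fully independent array
b[0, 0] = 99         # does not touch a

print(a.tolist())    # nested Python lists: [[1, 2], [3, 4]]
print(list(a))       # a list whose elements are still arrays (the rows)
```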
All right, so arrays, like lists, can be indexed, and they use the same operation: the square brackets. So a[0] is going to pull off the first element in the array, at index 0, and we can see that that's a 0. We can also set a value in the same way; here we've assigned 10 to a[0], and that got assigned into our array. Nothing too exciting there. There's also a fill method that's sometimes handy. You can say: listen, I don't want to go through this whole array and set every value; I just want to fill it with a single value. So we call a.fill(0), and a is filled with zeros. You can also use, you remember, slice assignment: if we use an open bracket and don't specify the beginning element or the end, that means a slice of the whole array. We have that slice, we assign in 1, and all the elements in the array are now equal to 1. There are a few gotchas here that you have to pay attention to, because NumPy supports a lot of data types. The first one is this: imagine we have our array with dtype int32, and now we come in and try to assign 10.6 into it. Well, we've specified that this is an array of int values, so NumPy has to make a decision here. One decision it could make is to raise an error, so that you simply can't assign a float in here. Another option would be to convert the entire array to 8-byte floating point numbers — but then, all of a sudden, you've changed the data type and doubled your memory. A third approach is to have the array win, so to speak: when you assign a scalar into the array, the array keeps its data type and just casts the value to its type. That's what NumPy does. Here we have 10.6, a floating point value, assigned into our integer array, and the first value gets truncated to 10 instead of being 10.6. So be careful about this; you're going to run into it at some point.
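Here's that whole sequence — element assignment, fill, slice assignment, and the truncation gotcha — as one small session:

```python
import numpy as np

a = np.array([0, 1, 2, 3])   # integer dtype
a[0] = 10                    # set a single element
a.fill(0)                    # every element set to 0
a[:] = 1                     # slice assignment: every element set to 1

a[0] = 10.6                  # the array "wins": the float is cast to int
print(a)                     # [10  1  1  1] -- note the truncation to 10
```

The same truncation happens with `a.fill(10.6)`: the decimal part is silently dropped because the array's dtype stays integer.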
If you're using this heavily, this will be one of the things that gets you, so just know that it's there. The same thing happens if you try to fill() with a float: it can't store it, so it truncates the decimal part and puts that value into your array. That covers 1D arrays; let's look at multi-dimensional arrays. This is one of the really powerful capabilities that NumPy adds to Python. Here a is equal to an array where we've handed in a nested set of lists — a list of lists. The first list is 0, 1, 2, 3; that's going to be the first row of the array. The second row is going to be 10, 11, 12, 13. Now that we have that array, we can ask for its shape, and its shape is going to return a tuple (2, 4), which says there are two rows and four columns. You can also ask for the shape with the function form in the same way. You now see that the size is equal to the product of the shape values: 2 times 4 gives you a size of 8. And ndim confirms we do indeed have a two-dimensional array. You can get and set elements using indexing. This is different from before: now you index with what is actually a tuple. If at the command line I type 1, 3, that creates a tuple, right? So the reality of what's happening here is that 1, 3 creates a tuple which is handed into the indexing method of the array. It sees: ah, this is a tuple, so this must be multi-dimensional indexing, and it pulls the individual elements out. Here we're asking for the row at index 1 and the column at index 3. And now we can set that value: at a[1, 3] we retrieved the 13; now we set it to negative 1, and we have a negative 1 down here in the corner. If we index a multi-dimensional array with only a single element, NumPy doesn't throw an error. Instead, it says: ah, you must want, in this case, the row at index 1.
So if you leave off the second index — we haven't specified a ", 3" to pull out a specific element in a column; we've only handed in the index of a row that we want — it pulls out that whole row for you. This is a really useful feature if you have, say, a time-domain series: you've taken 1D data sets at multiple times, and you put the first one on the first row, the second on the second row, all the way down. Then if you want to pull out the signal at index 30, you just say a[30], and that pulls the whole signal out so you can operate on it. It's a very handy way of doing that, and you can extrapolate it to larger data sets. If you have a set of images that you've taken in a series, you can arrange them the same way. It's really nice. Maybe you want to take a slice through the time domain, or see how one pixel varies over time — you can pull that out easily. Or you can say: listen, I just want this image, now I want the next image, now the next one, with these slice operations. Slicing is one of the other killer features available in NumPy, and it's actually been enhanced significantly in NumPy versus what Numeric had; we'll talk about that in a few slides anyway. This slide shows the standard indexing operations. The color over here denotes which part of the array a slice chops out: if you have the orange slice here, you can look over here and see what it extracts from your two-dimensional array. So here we're asking for the row at index 0, and we're asking for columns 3 through 5 — inclusive on the lower end, exclusive on the upper end. So we take index 3 and index 4, and don't include the item at index 5. The way that I think about this is that you have these index positions: the 0 index is here, and the 1 index is actually in between these items.
And so if you think about it that way, what you're saying is: I want the elements between position 3 and position 5. We have 0, 1, 2, 3 at this line, and 4, 5 at this line, so we're asking for the elements between the 3 and the 5 boundaries — the barriers between these elements. That's sometimes helpful if the exclusivity on the top end confuses you. Again, we can use the implicit indexing NumPy has: if you leave out the upper element here, we grab from index 4 through to the end. We start at 0, 1, 2, 3, 4 — this line — and go to the end because we didn't specify an endpoint. The same goes for the columns: we're specifying 4 through the end on the columns, so we end up with this blue area down at the bottom. And we've also shown chopping out a column exactly as you've seen before: no low, no high, so that gets all the rows in the column at index 2. Now there's another capability: providing a stride. You can do this on lists as well. a = [1, 2, 3, 4, 5, 6], and I ask for a[0:7:2] — I'll use 7 just to get past the end. This is saying: give me the elements between 0 and 7, and give me every other one. That last value, the 2, is the stride: how far to step between each of the elements you're asking for. So it gives you the first one, skips one, gives you the third one, skips one, gives you the fifth one. All right. You can do this without providing the upper and lower bounds — a[::2] — and that says: slice the whole array, and I want every other element. So this is a very easy way, if you have arange(10), to get the even numbers by slicing. If you want the odd numbers, you just offset the start to 1 instead of 0 and skip the upper value; that gives you all the values offset by 1 at the lower end. We can use those stride operations here as well.
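The stride examples we just talked through, in one snippet:

```python
import numpy as np

a = np.arange(10)     # [0 1 2 3 4 5 6 7 8 9]

print(a[0:7:2])       # start 0, stop before 7, stride 2 -> [0 2 4 6]
print(a[::2])         # no bounds: the whole array, every other one (evens)
print(a[1::2])        # start at 1, same stride: the odds
```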
Here what we're saying is: start on the row at index 2, go to the end of the rows, and skip every other row. On the columns: start at the beginning, go to the end, and give me every other item. So we've clipped out these rows — we specified start at this row and pull out this one as well — and taken every other column all the way across. Now, there's one difference between arrays and lists that can be a gotcha. Say I have the list a = [1, 2, 3, 4, 5]. I create b by slicing out the first three elements of a. Now if I come in and change the first element of b, a is not changed at all. However, if I do the same thing where a is an array — b = a[:3], slicing off the first part — and I set b[0] to 20, then a has been changed too. Slices on arrays are views into the array, references into the original memory, instead of copies. This is a gotcha — you have to pay attention to it — but it's also a feature, and it comes in handy in two places. One is if you're slicing arrays a lot in an operation, maybe a differencing algorithm or something like that, where you're trying to subtract some offset values from some other values. You take the same array and slice it two or three times within one expression: this slice plus this slice plus this slice. If you made a copy every time you did that, it would be very expensive. There are ways to get around that — copy-on-write is one methodology for handling it, but it's more complex. The other nicety is that you can refer to a piece of an array with a new name, a different name.
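The list-versus-array slicing gotcha, side by side:

```python
import numpy as np

# Lists: a slice is a COPY
a_list = [1, 2, 3, 4, 5]
b_list = a_list[:3]
b_list[0] = 20            # a_list is unchanged

# Arrays: a slice is a VIEW into the same memory
a = np.array([1, 2, 3, 4, 5])
b = a[:3]
b[0] = 20                 # this changes a as well
print(a)                  # [20  2  3  4  5]
```

If you do want an independent copy of an array slice, `a[:3].copy()` gives you one explicitly.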
And so imagine you're doing some finite-difference time-domain scheme where you have an array of values that's evolving over time. You have this large array, but you can't store out the whole array for the million time steps you're going to take; it would take forever, or it would fill your whole hard disk. So what you can do is describe a view of a center section of it, hand that off as a probe or something — name that view — and then, after each time step, save the probe to disk or to an array. You've chopped out a little area, and it refers to the original array; you don't even have to keep a handle to the original array to do this. So there are a lot of places, as you play with things, where that becomes a nice capability. Just be careful with it. So, fancy indexing. This was something added by numarray — it wasn't in Numeric — and now it's in NumPy. If we look at this, we have an array a = arange(0, 80, 10), and we have a picture of our array down here. What if we want to pull out the elements at index 1 and 2, and the element third from the end? Well, there used to be this function called take, down here, that you could use. It's still there and available: you can say, I want to take the elements at indices 1, 2, and negative 3 out of the array and assign them into y, and it will indeed do that. But now there's this really nice capability in NumPy where you can just index using another sequence. It actually can't be a tuple, because tuples are interpreted, as we talked about a few minutes ago, as the rows-and-columns style of indexing. But if you pass in a list or another array — and that's what we're doing here — a is indexed with [1, 2, -3].
That's going to pull out those indices and put them in y. Now, I just told you that slicing into arrays creates views, right? Well, here's the exception to the rule: if you're using fancy indexing, you don't create views, you create a new array. The reason for this is that — remember the little header of information that describes the layout of your memory — that header has information about the byte offsets between rows and between columns in your array, but it only stores one stride value per dimension for the whole array. Even though we think about a 2D array as a 2D block, you can always think of memory as a sequential list of things, so you really have this 2D block strung out, and if there are 40 bytes between each row, a single value specifies that. Now look at this case: specifying the distance between each of the values is easy when it's a contiguous array with just four bytes between each element. But if you come in with this fancy-indexed selection, you're in trouble if you want to refer to the same memory block with a non-uniform distribution of elements: you'd have to store, in fact, an array holding the offset of every individual element. NumPy doesn't do that. So in this case, you get a copy, because that's the only way it can handle it. y is a copy; we can print y here. Now, this is the behavior when the list or array you're passing in is of integer type. There's a special case, and we'll get into this a little more in a few minutes: you can specify that the index array is not of type integer but Boolean. Boolean arrays are treated differently: they're treated as a mask.
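The take-versus-fancy-indexing comparison, plus a check that fancy indexing really does give you a copy rather than a view:

```python
import numpy as np

a = np.arange(0, 80, 10)       # [ 0 10 20 30 40 50 60 70]

y = np.take(a, [1, 2, -3])     # the older take() spelling
z = a[[1, 2, -3]]              # fancy indexing with a list: same result

z[0] = 99                      # fancy indexing returned a COPY...
print(a[1])                    # ...so a is untouched: still 10
```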
So a mask has to be the same length as your original array, and when you index a with the mask, it chooses the elements out of the array where the mask has a value of 1 (True). So this mask will pull out the individual elements just like the other form. Fancy indexing can be used in two dimensions as well, and again, the different colors highlight which sequence is picked out of the array over here. Here we've said: I want rows 0, 1, 2, 3, 4, and I want columns 1, 2, 3, 4, 5. That's actually just pulling out a diagonal, because the indices are taken pairwise: the 0 goes with the 1, the 1 goes with the 2, and so on. So row 0, column 1; then row 1, column 2 gets the 12; and on down. Those are the elements it will pull out, and it's a nice way of being able to grab a diagonal from a matrix, or from whatever you're working with. You can also mix indexing schemes. Here we have a scheme where we're using the normal slicing operation — I want the rows from index 3 to the end — combined with fancy indexing for the columns: I want columns 0, 2, and 5. And you can use a mask in the same way: I want this mask for the rows, and the column at index 2, and that's going to pull out the 2, 22, and 52 for you. So it's fairly handy. Fancy slicing can do quite a few interesting things, and I encourage you to play with it as you work on your problems. It can take things that would be fairly verbose if you had to loop over them and condense them into a single line. Sometimes that's a bad thing — you've probably all seen the Obfuscated C Contest, where they take algorithms that should be a thousand lines and condense them into a single line that's impossible for anyone but the writer to read, and the writer probably can't read it a day after writing it either. So you don't always want short; that's not always the goal.
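A sketch of the Boolean mask and the pairwise 2D fancy indexing. The 6-by-6 array here is a stand-in I've made with `arange(36).reshape(6, 6)`, not the exact array on the slide:

```python
import numpy as np

a = np.arange(0, 80, 10)                 # [ 0 10 20 30 40 50 60 70]
mask = np.array([0, 1, 1, 0, 0, 1, 0, 0], dtype=bool)   # same length as a
print(a[mask])                           # elements where the mask is True

b = np.arange(36).reshape(6, 6)          # a 6x6 array for the 2D case
print(b[[0, 1, 2], [1, 2, 3]])           # pairwise (row, col): (0,1), (1,2), (2,3)
```

The pairwise form is what lets you walk a diagonal: row i is matched with column i+1 here.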
However, if it's still readable — and I think once you've learned what's going on here, this is still fairly understandable — it's a nice feature to have an expressive language that is also readable. That covers the basics of indexing. There's also the capability, if you have an array, to change its dimensionality. So here we have a = array([0, 1, 2]). That's a one-dimensional array: if I ask for its shape, it's just (3,). What if I wanted it to be a row for some reason — a 1 by 3 array? There are various reasons you might want to do this; we'll come to a few in just a second. The way you do it is with the None value in an indexing operation. None is defined in Python, and it's kind of like a null in C or C++: it means there is nothing here. Python treats None specially when it appears in indexing. What we're saying in this statement is: I want a single row, and I want all the elements of a spread along the columns. So here we've injected a new dimension. It doesn't look any different, as far as the picture goes, but if you ask for the shape of the original array, it returns (3,), while the shape of this new array says: I have one row and three columns. If instead you want one column — you don't want to spread the values along the columns, you want one column — then you use the None as the second item, saying: I want one column, and I want all my values laid down the rows of the array. So now we have shape (3, 1). And you can do this as many times as you want: if you add more Nones, you're saying I want my values along the 0th dimension and then an extra two dimensions, and we end up with a (3, 1, 1) array here.
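The None-in-indexing trick as a runnable snippet (None is also available under the name `np.newaxis`, which reads a little more clearly in scripts):

```python
import numpy as np

a = np.array([0, 1, 2])
print(a.shape)                 # (3,) -- one-dimensional

row = a[None, :]               # inject a new axis in front: one row
print(row.shape)               # (1, 3)

col = a[:, None]               # new axis second: one column
print(col.shape)               # (3, 1)

deep = a[:, None, None]        # stack on as many new axes as you like
print(deep.shape)              # (3, 1, 1)
```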
Just to point out that the slicing operations work in three dimensions, you can do exactly the same things we've been doing. Here we have an array a that's three-dimensional; you can think of these as images, or whatever you'd like. What I've said is: give me all of the rows and all of the columns — though it's hard to say "rows and columns" anymore; you now have a depth. These are the rows and columns, and this is the depth along the array. I want all the values along the 0th index, all the values along the first index, and then just the image — the slice at index 2, and then the one 2 from the back. That pulls out these two arrays. You can also use take; that's kind of old style, but if you're old school, you can use it. All right, flattening arrays. Here's a trick that's kind of handy that I'll use in a few places. a is an array; let's make it a slightly larger one, 20 elements. So a is a fairly large array here. If I want to make this multi-dimensional, I can just assign to the shape. The shape of a right now is (20,); I want it to be four rows with five columns, so I assign that, and it's now laid out in that form for you. A fairly slick capability. The one requirement is that the product of these shape values equals the total number of elements; if that's not true, it's not going to work for you. So if we have an array like this — well, I've just made work for myself, because I'm about to flatten this back out — a lot of times you have a two-dimensional array and you actually do want to flatten it out and treat it like a one-dimensional array. For that there's the a.flatten() method; there are actually a whole lot of methods here. You just call a.flatten(), and that will flatten the array out.
And flatten is a method that always creates a copy. So you're not referring to the original memory; it creates a new array, and changes to that array aren't going to affect the original. There's also an a.flat attribute that you can access on an array, and if you modify the values through it, then you are modifying the original: it's a view into the original array. It's actually what's called an iterator. a.flat doesn't really return a new array; it returns an object that allows you to access elements of the array. It largely acts like an array, but it's not really an array, it's an iterator. So sometimes you want to flatten an array out and be able to change values in the new thing, and this will allow you to do that. There's another method, so we have three ways to do sort of the same thing for different problem sets. ravel tries to do the same thing as flatten, but give you a view. So I guess it tries to give you what .flat gives you, but as an actual array instead of an iterator. If it can't do that safely, it's going to go through and make a copy for you. So ravel returns a new array to you: it's not an iterator, it's actually an array. If it can refer to the data safely as a view, then it will; if not, it gives you a copy. And the issue here is discontiguous arrays: ravel tries to be as efficient with memory as possible for you, but even when it can't be, it always returns you something. Whereas if you ask to reshape an array whose memory can't be viewed that way, it's not going to do it for you; it'll give you an exception. One of the issues here is what happens if you transpose an array in NumPy. The transpose of a two-dimensional array takes the elements from the upper triangle and puts them in the lower triangle, and vice versa, flipping everything across the main diagonal.
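The three flattening options can be compared side by side; this is a minimal sketch of copy-versus-view behavior on a small contiguous array:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# flatten() always copies: writing to the result leaves a untouched.
f = a.flatten()
f[0] = 99
print(a[0, 0])        # still 0

# .flat is an iterator over the original data: writes go through.
a.flat[0] = 42
print(a[0, 0])        # 42

# ravel() returns a view when the memory layout allows it, else a copy.
r = a.ravel()
r[0] = 7
print(a[0, 0])        # 7 here, because a is contiguous
```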
Well, there are a couple of ways to do that. The naive way is to say, OK, somebody wants a transpose: I'm going to go through this n-by-n array and move all the elements in memory. Well, that's fairly time-consuming, right? So NumPy, remember we talked about that little header of information at the top of an array that has stride information about how far apart values are between rows and columns? Well, it just swaps those stride values appropriately. So instead of having to move hundreds of thousands of elements, it moves two numbers. It's much faster. But what happens when you do that is you end up with a discontiguous array. You no longer have memory laid out as one row, then the next row, then the next row; walking the memory gives you one element of a row, then an element of another row, then another. And so if you're trying to refer to those elements as a 1D array now instead of a multidimensional array, you don't have this nice indexing; the elements may be spaced in strange ways because of this process. So you can't use .flat. And so with ravel, as I show here, if you've done something like a transpose and you have a discontiguous array, when you try to change an element of b, it's not going to affect a, because ravel had to give you a copy. Reordering dimensions. If we have an array, from a couple of slides back, and look at its shape, then ask for the transpose of the array, as I mentioned, this isn't slow. It's not going to do the memory transpose on each of the elements; it's just going to swap the row and column stride values. There's also a little attribute called a.T, and a.T will return a transposed array to you. That's sometimes handy when you're trying to write short, concise mathematical operations in a sequence of math operations, plus, multiply, that sort of thing.
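The stride-swap behavior is observable: the transpose shares memory with the original, while ravel of the (now discontiguous) transpose has to copy. A small sketch:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# transpose() just swaps strides, so it returns a view: no data moves.
t = a.transpose()     # equivalent to a.T
t[0, 0] = 99
print(a[0, 0])        # 99: the view shares memory with a

# ravel() on the transpose can't express it as a flat view in row-major
# order, so it falls back to a copy.
b = a.T.ravel()
b[0] = -1
print(a[0, 0])        # still 99: changing b did not touch a
```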
Now, again, the transpose is going to return a view to you, so if we change a value in b, it also alters the corresponding a value. There's a method called squeeze, and squeeze will remove extra dimensions, meaning dimensions of length 1 that don't add any elements. So here we originally have an a.shape of (2, 3), and we've assigned a new shape, but we stuck in a dimension of 1 as the first dimension. So when we print that out, it looks a little bit different than what we would normally see. But if we want to get rid of that guy, squeeze it back down so that all the remaining dimensions are greater than 1, then squeeze will just pull that out. Diagonals: we've seen that fancy indexing allows you to get to a diagonal. You can also use the method diagonal to refer to a specific diagonal within the array. In this case, if we want the main diagonal of a two-dimensional array, a.diagonal() will give us those values. You can also specify an offset to the diagonal: offset=1 or offset=-1 to move to a diagonal on either side of the main diagonal. That's one way of doing this. The problem with diagonal is that it returns a copy of the values. Because, remember the memory layout: that's kind of a strange indexing scheme, and it can't give you a view back into it. So there's no way to assign to a diagonal that way if you want to modify the original array. But now that we have fancy indexing, you can do the exact same thing with index arrays, and they're both read and write. So if you want to set all the values along the diagonal to be 2, you just set them: here we've said i = [0, 1, 2], and we can index with it to grab those values out, or we can set them. And then if you want the upper diagonal, you can ask for that or set it, and likewise the lower diagonal. So it's a handy way of doing the same thing.
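Both approaches to diagonals can be sketched on a small 3 by 3 array; the fancy-indexing version is the one that lets you write:

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

# diagonal() reads a diagonal, but the result is a copy, not a view.
print(a.diagonal())            # [0 4 8]
print(a.diagonal(offset=1))    # [1 5]

# Fancy indexing reaches the same elements and is writable.
i = np.arange(3)
a[i, i] = 2                    # set the main diagonal
i2 = np.arange(2)
a[i2, i2 + 1] = -1             # set the diagonal just above the main one
print(a)
```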
It may not be quite as readable by somebody else, but it does allow you to write to those values, and once other people have seen these idioms, they're fairly readable after a while. So we've seen this a little bit already: array construction. If we're creating an array and we put a decimal point in, then instead of being an integer array, it becomes a float64 array. So I'll just go over it. If I take a = array([1, 2, 3, 4]), we have our array, and we ask what the dtype is: they're integers, right? That's because we gave it a set of Python integers. If instead I come in and make one of those guys a floating point value, then now all of them are going to be floating point, since that's the narrowest type that can hold everything. And the floating point type, since it comes from a Python float, you remember I said is 64-bit, so NumPy uses float64. Well, what if you don't care about that precision? Maybe you've read your data in on a 16-bit A-to-D converter or something like that. You don't have all this precision, you don't need 64 bits to do your calculation on your data set, and so you don't want to waste the memory, because you have a whole lot of values. If that's the case, then you can come along here and specify dtype=float32. And now that you've done that, you see the dtype is float32, and instead of 32 bytes for your array, a.nbytes is down to 16. You just did a factor-of-two memory savings for yourself. So that's nice. There are also a lot of other types. Say it's an integer, but you don't want it to be 32 bits; you want it instead to be an unsigned 8-bit integer. Then you'd say dtype=uint8, that's "u" for unsigned, "int" for integer, and the bit width. And that's the type that we have, so a.nbytes is equal to 4 now. NumPy supports a wide range of types. There's also, when you're creating arrays, a set of other flags that you can look at.
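The dtype promotion and the memory savings are easy to check; this sketch uses the same four-element array as the discussion above:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
print(a.dtype)                 # an integer dtype (width is platform-dependent)

b = np.array([1, 2, 3, 4.0])   # one float promotes the whole array
print(b.dtype, b.nbytes)       # float64 32

c = np.array([1, 2, 3, 4], dtype=np.float32)
print(c.nbytes)                # 16: half the memory of float64

d = np.array([1, 2, 3, 4], dtype=np.uint8)
print(d.nbytes)                # 4: one byte per element
```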
And oh, so you can see them here: you can specify the type code. That looks suspicious? Yeah, the IPython on my machine, when you import PyLab, grabs the old numeric versions of these methods. So you just have to do from numpy import star, and you get the new versions of all these array operations. Here we have the copy flag. There are also some other flags that you can specify here. The order: NumPy is very flexible about how the memory is laid out. You remember I mentioned that the C style of allocation, if you have a contiguous bit of memory and you make a multi-dimensional array, puts the first row in the first segment, the second row in the second segment, the third row in the next segment. If you do it in FORTRAN, it does exactly the opposite: column-major order, so the first column is in the first segment, the second column in the next. Well, by specifying the order, you can tell NumPy: I want to lay out my memory the FORTRAN way instead of the C way. If you're calling a whole lot of FORTRAN routines, this may save you some speed, because you don't have to convert the memory order of your arrays before you go into those routines. There are a lot of data types; I think there are 21 or 23, something like that. So we've already seen a few of these. We've seen the Boolean type; that's what's going to be used for mask arrays and that sort of thing. We've seen a lot of integer types. Note that if you specify just int as the dtype, instead of int32, int8, or int16, it's going to use whatever Python uses for that data type. So for integers here, that's going to be int32. For floating point, if you use float as the dtype, then that's going to be float64, and complex is going to be complex128. You can also specify other data types; you don't have to use numeric data types, actually.
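The order flag can be demonstrated on a small array; this is a minimal sketch showing that the two layouts hold the same logical values but walk memory differently:

```python
import numpy as np

# C order (the default) stores rows contiguously; FORTRAN order stores columns.
a = np.array([[1, 2, 3], [4, 5, 6]], order='C')
f = np.array([[1, 2, 3], [4, 5, 6]], order='F')

print(a.flags['C_CONTIGUOUS'])   # True
print(f.flags['F_CONTIGUOUS'])   # True

# Walking the raw memory (order='K') shows the difference in layout.
print(a.ravel(order='K'))        # [1 2 3 4 5 6]: row-major
print(f.ravel(order='K'))        # [1 4 2 5 3 6]: column-major
```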
This notion of data types is actually fairly handy for a lot of other things; it's a nice capability to be able to use the arrays this way. You can specify that you have an array of objects, that you want to store just plain Python objects. If you want to do that, then you just specify the data type as object, and now it's standard Python objects that are stored in that array. Here's one place you might want that. Imagine: all of the operations I've shown so far are floating point operations using the IEEE standard floating point units on the machines, so they're not arbitrary precision, right? They have round-off errors associated with them. Well, if you wrote a class that defined add and multiply and all these operations for an arbitrary-precision number, then you could create an object array and put those values in there. And if you do a + b, it does an element-by-element add of all of those elements, and so it's going to do arbitrary-precision math on those guys. So it's a way that you can extend NumPy, or use your own data types in places. There's also the ability to put strings and unicode values into your arrays. So, if you need to cast the type of an array: imagine we have an array a with dtype float32, and we need to upcast that to another data type. We can use the asarray function and say, well, I want to take a and upcast it to float64. And asarray, if a originally was a float64, just returns to you that same array. If, however, it needs to convert it in the casting process, then it returns to you a new array of type float64. And you can also downcast. There's also a method, astype, which always returns a new array of the requested type. There's a set of functions that are available for doing common operations on arrays. The first one of those that we'll show is sum: imagine we have an array here of type float, and we ask for the sum of a.
Yeah, so this slide is slightly dated; make a note right here that it shows the old behavior from numeric, not the new NumPy. In the new NumPy, if you ask for the sum of an array and you don't provide any extra arguments, it just does a sum of everything: it's as if you called sum on the flattened array, summing up all the elements. However, you can always specify an axis to say which axis you want it to sum along. So if you do sum(a, axis=0), summing along the zeroth axis, the zeroth axis is the rows, and so it collapses the rows into a single row. It's a reduction operation: it reduces the dimensionality by one. If you want to sum along the columns, you can supply axis=-1. There's also the method a.sum(), which returns the sum of that array, and you can supply the axis argument there too. All right. And there's also prod, for products. Are there any others added on here? Well, we'll see mean and standard deviation and things like those. Yeah, there they are; there are quite a few extras. We'll stop here though. Take a tea break.
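A quick sketch of sum and prod with the axis argument, using a small 2 by 3 float array:

```python
import numpy as np

a = np.array([[1., 2., 3.],
              [4., 5., 6.]])

print(a.sum())          # 21.0: no axis given, sums every element
print(a.sum(axis=0))    # [5. 7. 9.]: collapses the rows into one row
print(a.sum(axis=-1))   # [ 6. 15.]: sums along the columns of each row
print(a.prod(axis=0))   # [ 4. 10. 18.]: element-wise product down the rows
```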