So, all these operators that we've been talking about, add, multiply, divide, and also the logical and/or functions, anything that is a binary operation taking two operands, are what are called universal functions in NumPy, and they have an extra set of methods on them. If we look at add, it's what's called a ufunc. We have our array a here, and add(a, a) adds those two values together. That's exactly the same thing that happens if you take a + a. But if we look here, there are also all these strange little methods, and I'm going to talk about four of them: accumulate, reduce, reduceat (we won't talk about that one too much), and outer. Those four methods are special methods that are available on these operations, and here's what they do.

Take the reduce operation. Add is an easy one to think about. For add.reduce, if you like math, you can look at the formula here and see what it does: it's just a summation of all the elements in the array a. So add takes two arrays and adds them together, or it can do a reduction on a single array by adding all of its elements together. add.reduce(a) adds those together. It works on lists as well. These universal functions work on any sequence, and they're going to try to do a reduction of those elements by performing the add operation. So here we have strings. a is a list of strings, and if we do the reduce, you remember that adding strings together is a concatenation operation, so we're taking our 'ab', 'cd', 'ef' and squeezing them together into a single string. There are also logical operations, so you can do a logical_and reduction, and that's the same thing as doing an all operation, right? All values being one.
And logical_or.reduce is going to be the same thing as the any function that we saw earlier. If you define your own ufunc that takes a binary operation, then you get these methods as well, and they'll work on individual arrays doing the reduction. It's called a reduction because it takes an array of N dimensions and returns an array of N minus one dimensions. It's a reduction along some dimension.

All right, let me create an array here: a = arange(20), and we'll reshape this guy so that a.shape is (4, 5). If we do add.reduce(a), the default behavior, as you see, is to sum up each of the columns. It's a reduction along the row axis, axis 0. If we specify axis=-1, the last axis, which is the column axis, we reduce along that instead, and now each row is summed: 0, 1, 2, 3, 4 added together is ten. Visually, you can look at it this way: either you're summing up the columns, which is a reduction along the rows, reducing all the rows to one row, or you're reducing all the columns to a single column.

There's an accumulate operation as well, and it's very similar. The only difference is this: what we just did is a reduction operation, right? We're reducing the number of values. It's as if we're going along our array here: one plus two is equal to three, but then we immediately add three to that three, that's six, and we throw away the fact that one plus two is equal to three. Well, a lot of times you want to keep up with that. You want to keep a running sum of what happens if I just add the one to the two. That's what accumulation does. Accumulation is not a reduction. Your result is exactly the same size as the original array.
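A minimal sketch of the reductions just described, assuming NumPy is imported as np (the talk uses bare names such as add.reduce). The array mirrors the 4-by-5 arange example from the talk; the variable names and the small flags array are only for illustration.

```python
import numpy as np

a = np.arange(20).reshape(4, 5)

# A ufunc called with two operands is the same as the operator form.
same = np.array_equal(np.add(a, a), a + a)

# Default reduction is along axis 0: the rows collapse, giving column sums.
col_sums = np.add.reduce(a)

# axis=-1 reduces along the last (column) axis, giving one sum per row.
row_sums = np.add.reduce(a, axis=-1)

# Logical reductions correspond to all() and any().
flags = np.array([True, True, False])
all_true = np.logical_and.reduce(flags)   # same answer as flags.all()
any_true = np.logical_or.reduce(flags)    # same answer as flags.any()

print(col_sums)   # [30 34 38 42 46]
print(row_sums)   # [10 35 60 85]
```

Each reduction returns an array with one fewer dimension than its input, which is the N to N minus one behavior mentioned above.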
And so here we have accumulate. One by itself, I mean one plus nothing, which is what you originally had, is one. Then one plus two is three, and that three plus this three is six. So we're accumulating and keeping those values in the location where the element was before. You can do add.accumulate with strings just as well. You add those up and you get the accumulation: 'ab' by itself is the first value, 'ab' accumulated with 'cd' is that second string, and then you accumulate them all and that becomes your third string here. The 'O' in the output, that's an artifact from these slides originally being developed for Numeric. We can sit here and actually try it. Thank you for catching that. So a is 'ab', 'cd', 'ef'. Now if I do add.accumulate(a)... what's that? Oh, maybe I need to... I would have thought that worked. So that's interesting. It has changed from Numeric to NumPy. That actually looks like a bug to me. You can see what it's doing: it's accumulating from the other direction, right? Instead of pre-adding, it's post-adding.

The logical ops work the same way. It's just doing a logical accumulation along the way of ands or ors, as the case may be.

If you use op.outer, well, we usually think about outer products, right? But this is the outer operation for any op. So here, if we do add.outer(a, b), you can see what happens. It's almost as if a gets turned down this axis and b runs along this axis. Then we add the element of a for this row to each element in b, so we have the zeroth element of a added to b[0], then to b[1], then to b[2], and the same for the next row. And you'll note that with outer operations order matters, right? Here we passed a, b.
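The accumulate and outer methods described above can be sketched as follows. This is an illustration under assumptions: NumPy is imported as np, and the arrays x, flags, a, and b are made-up values rather than the ones on the slides. The string case is left out because its behavior differs between Numeric and NumPy, as the speaker found.

```python
import numpy as np

# accumulate keeps every partial result, so the output has the input's shape.
x = np.array([1, 2, 3])
running = np.add.accumulate(x)            # running sum: 1, 1+2, 1+2+3

# Logical accumulation: once a False is seen, every later "and" stays False.
flags = np.array([True, True, False, True])
running_and = np.logical_and.accumulate(flags)

# outer applies the operation to every pair (a[i], b[j]).
a = np.array([0, 10, 20, 30])             # hypothetical values
b = np.array([0, 1, 2])
ab = np.add.outer(a, b)                   # shape (4, 3), ab[i, j] == a[i] + b[j]
ba = np.add.outer(b, a)                   # shape (3, 4): argument order matters

print(running)        # [1 3 6]
print(ab.shape, ba.shape)
```

For a commutative operation such as add, swapping the arguments gives the transpose; for subtract or divide the values themselves would differ as well.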
And if we do b, a, then we're going to end up with a differently shaped array as the output, and different results, obviously, as well. So all of those methods are available and you can use them.

Okay, let's move on to a few more functions for doing selections out of an array. This is a slide where it's almost better if you just look at it and I don't say anything. I don't think that's quite true, but it seems that way. This is a really funny function. It's called choose. As you look at it, you may wonder why the heck they even bother to have this function, but I'll show you why in just a minute with an example.

So we have the choice array here. We have the function choose, and the first argument is this choice array. Then as a second argument we give a list of arrays that are basically what we're going to choose values out of. Here's how it works. In the choice array, two things are important: the value of an element and the position it's in. So the top row of the choice array is zero, zero, zero. This index is saying I want to choose from array c0, the zeroth element in this list of choices. So it's going to pull this zero right here and copy it down into y. Then we come to the next choice. We again want to choose from c0, but notice that when we copy a value down, we copy the value out of the same location in c0 as the position in the choice array. Same for the next one. So 0, 1, 2 go across the top row. Now the one is selected out of this c1 array. Again it picks the element at the matching position. These all have the same value, but it pulled that five and stuck it down here. A two pulls from c2, and you can also just stick a scalar in as a choice if you want to, and then position doesn't matter, right?
It's just saying that if you have a three over here, it's always going to choose nine and copy it into your output array. So now we have nines in this little section because of these threes. Does that make sense?

Now the question is, why do you care, right? Why would you ever do this? The answer is, we talked about clip a few minutes ago. So how do you implement clip? Well, you can do it in C, but you can also just write a simple function in Python. What would that function look like? We have our original array with a set of values, and then this less(a, 10) is the equivalent of saying a < 10. Let me show what my array looks like: a = arange(25)... that's good enough. All right, so I have my array here, and I can either do less(a, 4), which returns a mask marking the values that are less than four, or I can do a < 4, which is probably a little prettier mathematically. I've chosen to use less here, and I've created an array that's basically a mask, right? We're using ones here instead of True and False, but it serves the same basic purpose.

So what happens if I then use this mask that I just created as my choice array? These become indexes, right? Zero and one are my indexes into the list (a, 10). Anywhere there's a zero we're going to choose a value out of a, and anywhere there's a one we're just going to choose 10. If you look down here, look what happens: anywhere we were less than 10 the values are 10, and anywhere we're not less than 10 we just get our old values. Kind of a nice behavior. Now, if you want to expand this to actually handle clip, we've got to clip at the lower end and at the upper end, right?
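A small sketch of clip built from choose, assuming NumPy is imported as np. The bounds 10 and 15 follow the talk; the 5-by-5 shape and the variable names are assumptions for illustration. The mask is cast to integers so that it reads as the 0/1 index array described above. The last few lines show the two-sided version, which the speaker turns to next, checked against np.clip.

```python
import numpy as np

a = np.arange(25).reshape(5, 5)

# One-sided clip: index 0 picks from a, index 1 picks the scalar 10.
low = np.less(a, 10).astype(int)          # 1 where a < 10, else 0
clipped_low = np.choose(low, (a, 10))

# Two-sided clip: build an index array holding 0, 1, or 2.
high = np.greater(a, 15).astype(int)      # 1 where a > 15, else 0
index = low + 2 * high                    # 0: keep a, 1: use 10, 2: use 15
clipped = np.choose(index, (a, 10, 15))

print(clipped_low[0])   # first row is all 10
print(np.array_equal(clipped, np.clip(a, 10, 15)))
```

No element can be both below 10 and above 15, so the index never reaches 3 and the three choices are enough.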
So how do we do that? We're creating indices, so we have the less-than array that we had, and we want greater than 15, so we create another greater-than array. Then we need to do a little math, because these greater-thans are just going to be one/zero values. What we can do is multiply those by two and add the two arrays. This creates the array that you see here, where we have the same ones that we had from the less-than, but wherever there were greater-thans we now have twos, because we multiplied those by two, and then we have zeros in the middle. Now you use that as your index, and your choices are (a, 10, 15). You've clipped it at both the low end and the high end.

So you start getting a feeling that, if you're not a MATLAB user, which I know some of you are, but if you're used to writing Fortran and C, there's a bit of a change. You're not thinking about for loops anymore, right? Everything starts happening in these vector operations. You have to think: how do I create vectors that represent the operations I want to do, and use special indexing, or these choice operators, or these clipping tools, all of these different things, instead of thinking in for loops, which is how we think in the C and Fortran world. So when you go to the lab today, hint hint, when you sit down to do the exercises, starting with a for loop is probably not the answer. In all of the examples I want you to think about indexing.

All right, I mentioned earlier that there was a where function, version one, that I showed you. This is version two, and the slide is here almost just for the purpose of documentation, so you can look at it. There's really no difference between how this where works and the choose we just saw. It's just a simpler version that only allows you to have a false and a true value. So for your condition, if you have zeros and ones, the first array that you pass
in is used where the condition is true and the second where it is false (the reverse of the ordering choose uses), and otherwise you're getting the same kind of behavior as choose. I don't use this where very much; I always use choose, and whenever I use where, I'm using the first version that just takes a single argument and returns the indices where that argument was true. Remember, if we have a = arange(10) and I do where(a > 5), it just gives you back the indices. On the other hand, you can use it this way, where you pass in three arguments instead of the single argument I passed here, and you get this functional behavior.

We talked about concatenate, and we showed a few other methods that are maybe simpler. concatenate is there, and you can use it if you would like. If you want to concatenate two arrays together, it takes a list or a sequence of arrays as its argument. When you concatenate (x, y), it defaults to concatenating along the zeroth axis, which is the row axis, and so it stacks those together. If you want to concatenate along the first axis, the column axis in two dimensions, then you just specify axis=1 as the second argument. Now, if you want to stack these things in a third dimension, the way you do that is just using the standard array conversion function: if you put those arrays in a tuple or any kind of sequence, the way the nesting occurs stacks them along a new dimension. So all of those work.

All right, broadcasting. This is a really cool feature in NumPy. It takes a little bit of getting used to, and once you learn it, it also takes a little restraint not to use it too much. We'll point that out in a few minutes, but here's an example. We've already talked about what happens if I have two arrays that are the same shape. In this case we have a four-by-three plus a four-by-three, and we're just
adding the elements together. So this operation here is just a duplicate of that one; there's no difference. But what happens if you come along and you have a four-by-three and you add to it a one-dimensional array that has three elements? One of the choices would be that it fails and won't allow you to do that. NumPy actually does something slightly different. It's called a broadcast operation. What it does is say, ah, I notice that the last dimensions of these two arrays match, they're both three, so I'll take the elements here and add each of them to the elements on this row, as well as this row, as well as this row. It's as if it stretched these values down this dimension and then added them to each of the rows. This happens to be equivalent to the case up here, but you haven't had to duplicate all of these values, right?

Well, what if you come along and, instead of a four-by-three, you have a four-by-one matrix, and you add that to this three-element array? The last dimensions don't match, do they? So it doesn't quite follow this rule. But there's a second rule in broadcasting, and that is: if either of the dimensions is one, then the broadcast matches, and it stretches that one to be the same size as the other. So what happens here is you have two stretching operations, where you've duplicated these values across and you've duplicated these values down, and then you add them together and you end up with this array. And you'll notice, what operation is this? We had add.outer; it's the same thing. If you don't want to put add.outer in the middle of your equation, you can have arrays that are shaped like this and use broadcasting, and your math still looks kind of pretty and clean as you're writing out your equations. Also, it's kind of painful if you have to create all of these. In the end you just want this
array, right? This is 2D, and it might be big. It might be a zillion by a zillion in rows and columns. Just to do the operation of creating this array, I don't want to have to create two more arrays, because now you have three zillion-by-a-zillion matrices instead of one, when you could do it with just one vector here and one vector here. So it's much more efficient memory-wise to do this. That's a handy trick.

Now, NumPy doesn't try to be too smart about things. It doesn't go, ah, I notice you have a four-by-three, and I see a four-element vector, and I could take this vector and broadcast it down this way and that would work. There's a mismatch here: it's not going to try to match this four with that four. The trailing dimensions, this three and this four, don't match, and because of that it's going to return to you a "frames are not aligned" error. If you actually want to do this, it brings up something from earlier. You remember we talked about indexing with None, and that looked a little strange. Why would you ever care to do that? Well, this is why you might want to do that, right? You can take two arrays that you just create in the normal way, a and b, and then ask for a to have all of its values put along the rows and add an extra column axis, which basically turns that array on its side. Then you add that to your original b and you end up with this array. So that's a lesson in how to do broadcasting.

I'll show you an example of where you might do it, and then show you why this was a bad idea. So here's a problem. How many people are in information theory? So there are a few people, okay. Vector quantization is one of the typical algorithms that you might use to try to do classification on objects or whatever else. You might imagine that you have a set of attributes. Each of these elements here is a sample, and you have feature one and feature two, and they
could be any feature. I mean, if you're looking at basketball players, those features might be their height and their arm span or something like that, plotted against each other. And what you've learned is that people in this category, the ones near zero or three up here, tend to be good, and people down here tend to be bad. That doesn't quite work with the attributes that I gave, but you get the idea of having this large population and wanting to be able to separate it into two groups. With vector quantization, what you do is say that people who are close to zero and three are in one class and people who are close to one, two, and four are in another class. To be able to pull that off, you have to calculate the distance from each of these elements to all of those individual codes, as we call them, and find out which one is the minimum distance.

So let's look at this. Imagine you take one of your observations, one of your basketball players, one of your data points. If you want to know which code this item belongs to, you have to calculate its distance to all of the codes that you've defined. This one is the minimum distance, so this one falls in the category of code one. And then basically you have categories: one, two, and four are good, zero and three are bad, or one is one target and the other is another target, any of these kinds of ideas. That makes sense to people?

All right, so how would you do this? Say you have a new data set. You've created your codes and positioned them based on a training data set. Now, maybe basketball is the wrong example here, right? I should be using cricket, I guess. So you have a whole new set of cricket players come in, and you want to find out who's good and who's bad, so now you just need to classify
them based on these features. Well, how do you do it? You have two things: you have all these observations in here, these gray dots, and you have your codes. If you organize these as arrays, you might organize them in this way. You have your set of observations. Here I've done three dimensions instead of two, x, y, and z, and you can do this in N dimensions; it doesn't matter how many dimensions you have. Each of these rows is an observation, and each of these columns is a measurement that you've made on a specific observation. So observation zero has this x value, this y value, and this z value, and you have that repeated over and over again. You might have a thousand new players; in this case we have ten new players. And then you have your codes that you trained originally to determine which class somebody was in: c0, c1, c2, c3, or c4.

So how do you do the math of this? The first thing we need to do is take a difference, right? There are two arrays here, and we can turn one this way and one this way. That's done with this indexing with None and colons. For the observations, we've added one dimension, so the depth dimension has length one, and then we put the ten along the rows and the three along the columns. We've done the same sort of thing with the codes. Now when we subtract them, we end up with this cube in the middle, and each entry is the difference between an observation and a code. If you look here, this row right here is going to be the difference between observation zero and code zero, this is observation zero minus code one, then code two, all the way down. Once we've done that, what we really want is the distance between the vectors. So what we can do is square those values, and what we're going to want to do is compress it
along the depth axis here by summing, so that we get the distance value. Now this value is going to be the distance of observation zero from code zero, this is the distance of observation zero from code one, and on down the way. What we need to do then is find out which one is the minimum. So you do argmin along the zeroth axis, and it picks out which one of these is the lowest and gives us its index. So for observation zero this will say either zero, one, two, three, or four in that place. We've reduced this actually fairly complicated algorithm into a fairly short amount of code. So this is really cool, right? I'm going to show you how fantastic this is.

All right, so how fast does it run? Here's the comparison with the MATLAB version. MATLAB doesn't have this notion of being able to do broadcasting, so we ought to be better, right? Well, it doesn't look so good. We have a speed-up of 0.71. Usually you want speed-ups greater than one; a speed-up of 0.71 means you're slower, so that's not a good thing. If we go to single-precision floating point types we do a little bit better, since float32s are a little more efficient than float64s, but it's still not much of a speed-up. So what went wrong here? Here's the problem. If you look at this, I start with this data, and really what I want is this data. Well, in between, what have I done? I've created this massive 3D array as an intermediate step, and I'm just going to throw that thing away. Allocating this 3D array is just killing me. It's sucking up all the time. So "don't be too clever with broadcasting" is the moral of this story, because if you are, then you end up with arrays that are much larger than the original ones, and especially if you're reducing back down, you can kill yourself on speed. So I took a second approach to this. If you look at this, you can say, well, maybe I don't want to
broadcast and create this 3D array. I'll just do the broadcasting on an observation-by-observation basis. So instead of taking this whole set of rows, I'll take just this one observation and difference it against the codes here, so I'm just getting 2D slabs instead of this 3D block, and I walk through the array like this, right? I can do that, and that's a good idea. It solves some of our memory problem, and we almost get back up to the speed of the MATLAB version here. For single-precision values it's better still, but it's still not that great. And so this is kind of to whet your appetite for tomorrow. If you just go back and say, listen, I know how to write this little algorithm in C, it's only about 10 lines, I'm going to write it and see how fast it goes, well, it goes fairly fast. You get a factor of 25 speed-up over the MATLAB version. So the nice thing is you can write a quick algorithm in Python and get it working, and you can test all your code and do that sort of thing, and then when you need to, you can switch over and write your little algorithm in C and link that in. These algorithms, by the way, are now in the, I think there's a cluster module in SciPy, and these are the vector quantization methods that are in that cluster library. So if you're doing this in your information theory work, don't write your own, just use these. They're already blazingly fast.

Pickling. You didn't talk about pickling yesterday at all? Okay, quickly. There's this notion in many programming languages of serialization. If you have an object a and you need to store its state out on disk, or send it over the network through TCP/IP or whatever it is, then you need some representation of that object's state, right? Serialization is the common word for doing that. You're going to take this object's state, its memory, and serialize it, write it out in some
format that can then be interpreted in the future, or someplace at a different location, where you can reconstitute your object and manipulate it again. Python has a serialization method called pickling, and the reference here is that you're preserving it, right? They take it further: the module that you use if you're pickling things and storing them away is the shelve module, so you store your pickles on your shelves. There are multiple pickle formats. You can either pickle out to an ASCII format or to binary formats, and there are actually two binary formats, the old version and the new version. The point here is that when you're storing out your arrays, pay a little attention to what you're doing. The default is to store out in ASCII, and you can see what happens here. We have an array of zeros that's 40,000 bytes long. dumps just pickles the array to a string, so now we have this ASCII representation of our array, and look how long it is: a hundred and sixty thousand. You had a 4x ballooning, and that's because it has to write out the characters that represent each floating point number instead of just storing the bytes of the floating point number, in this case four bytes. It has to write out a lot of characters, you know, 8.4321e-16 or whatever it is, so it explodes in size. On the other hand, if you pickle in the binary format, you can see it's very compact. It's almost exactly the same size as the array; it just has the little extra header information we talked about, describing how the array is organized. So be careful about that.

So, controlling the output format. You'll notice that if I do a = array and I make this 0, 1, 2, 1e-6,
when I print that out, it prints out a whole lot of digits on these things. A lot of times it's pretty hard to read this: it takes up a lot of space, and you can't really tell if that's a zero or not, that sort of thing. So there's a set_printoptions function that you can use to control that. You can set the precision, which is how many digits it's going to show for floating point numbers. The other thing you can set is the threshold. If I have an array that's a hundred elements long and I ask to print it, it prints it for me. If I give it an array of, I think, a thousand elements, it will print all of those as well. But as soon as I put a thousand and one elements in my array, it summarizes. So there's a threshold that's been set, because at some point you can't watch all of that stuff streaming by and make any sense of it, right? You just want to see a representation of what the array is, especially if it's a gigabyte; otherwise you're waiting half the day for it to stream by. That actually used to be a problem: IPython would lock up as it sat there and streamed all million elements by you.

There's also the ability to suppress really small floating point values. If we come in and do a = array([0, 2, 3, 1e-15]), that's a pretty small value. If we do set_printoptions(suppress=True), now it doesn't print out that teeny, teeny value. It just says, ah, that must have been a zero. I'm not sure where the cutoff is set, but for double-precision floating point values it's probably down at 10 to the negative 14 or so. So that's kind of a handy feature, and you can tweak these however you want to get the output that you need. There are some examples here; I think we've walked through those.

There's also error handling, and I'll leave that for you to look at. This is just
explaining about floating point errors. When you do floating point operations, you can have overflow and underflow. How is that handled? By default NumPy warns about it, but you can set it so that it throws errors whenever it sees a NaN, or any of these different things. The reason I went through that quickly is that I do want to talk about composite data structures, if only in a very small way.

We've said that arrays have a dtype, and it's usually been float32 or float64 or something like that. But with a lot of instruments, things like the Hubble for example, the data that comes back isn't just an array of values. It may be an array where the first four bytes are a floating point value, the next byte is an unsigned integer value, and then the next three bytes are a string. It's a very complex data type that comes into memory. Well, NumPy allows you to define data types, or dtypes, that actually describe a block of memory in a more interesting way. You don't have to say it's all floating point. What you do in a dtype is describe the packing of an element. In this case, I've described a dtype where the first field is a four-byte floating point number and the second field is a four-byte floating point number, and both of them have names: we're naming the first one mass and the second one velocity. So I can create particles here equal to this array, and I've specified the data type to be the particle type. What happens here is that these values are the masses and these values are the velocities, per the description of how this is put together. And you don't have to do this through Python, right? This can be reading in a data structure that's a C data structure, and the dtype specifies how it will treat the individual bytes. So I'm going to... all right,
so we created a dtype. Cut and paste didn't work very well. Then I'll create a particle, and this is a list of records: (1, 1), and I'm going to make another particle here, (2, 1), and then a third one that's (2, 3), and then set my dtype to be the particle type. We'll make that our particles array. All right, I'm not sure why it doesn't take lists for the records, but it doesn't; it wants tuples. So there's our array, and now I can index into this. If I ask for particle zero, I grab particle zero and it gives me both of those elements back. But I can also do this, indexing with the field name 'mass', which is kind of cool, and that just gives me the masses out of that array. So it treats those as fields.

This becomes really, really powerful when you start thinking about a database. This is nothing more than a database table, but you have it in memory, you have all these slice operations, you have all these reduction methods, you have all of these capabilities. So instead of having to do all your operations in SQL on the server side, you can select out the data set that you want, put it in one of these tables, and then go to town doing all kinds of math and slicing and operations on it. Very, very handy, very, very fast. It's not used very much yet, actually, because it's brand new, but it's very slick. All right, so that does it for arrays.
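The particle example can be sketched like this, assuming NumPy is imported as np. The field names mass and velocity and the three records follow the talk; the name particle_dtype and the final mask query are assumptions added for illustration.

```python
import numpy as np

# Two named four-byte float fields packed into one 8-byte record.
particle_dtype = np.dtype([("mass", np.float32), ("velocity", np.float32)])

# Each record must be a tuple; a list of lists is not accepted here.
particles = np.array([(1, 1), (2, 1), (2, 3)], dtype=particle_dtype)

first = particles[0]              # one record holding both fields
masses = particles["mass"]        # just the mass column

# Field access combines with reductions and masks, much like a table query.
total_mass = np.add.reduce(masses)
fast = particles[particles["velocity"] > 1]

print(particles.dtype.itemsize)   # 8
print(masses)                     # [1. 2. 2.]
```

The mask on the last line plays the role of a WHERE clause: it selects whole records by testing a single field.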