Hi. Well, let me get started. My name is Catherine, and I'm a PyCharm developer. How many of you know about PyCharm? Oh, great. Well, this talk is not about PyCharm at all, so if you are interested in PyCharm, please visit our booth; we'll be really happy to see you there. What I'm going to be talking about is vectorizing your brain with NumPy. This is actually a lecture taken from my graduate machine learning course, which I'm currently teaching at a university in St. Petersburg.

How many of you are using NumPy in your everyday development? Well, I'm sorry then, I probably won't tell you anything new today. As I already mentioned, this talk is taken from my machine learning course, and you might wonder why such a talk was included in that course in the first place. The simplest answer is here. I started my course with the simplest algorithm one can imagine: k-nearest neighbors. You're probably familiar with it; it's used in classification tasks, and the idea is to assign to an object the label that is most frequent among its k nearest neighbors. The assignment for that lecture was to take this algorithm and apply it to a dataset, and this is typical of the code I got in reply. Not one of my students actually used NumPy, and, well, it was sad. This code ran for hours, and I can't wait that long just to check an assignment, so my teaching assistant and I decided to include an introductory NumPy lecture in the course. That is my motivation to speak about NumPy, which is the main tool used in all of data science. What I want to do today is talk about how to use NumPy efficiently for data-centric computing. It's relatively easy, but you have to think about your code in a somewhat different way in order to use NumPy efficiently.
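A minimal sketch of the k-nearest-neighbors step described above (the function name and the toy data are my own, not from the assignment), already written the vectorized way the rest of this talk advocates:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Label x with the most frequent label among its k nearest training points."""
    # Squared Euclidean distance from x to every training point, no Python loop.
    dists = ((X_train - x) ** 2).sum(axis=1)
    # Indices of the k closest training points.
    nearest = np.argsort(dists)[:k]
    # Most frequent label among those neighbors.
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

X_train = np.array([[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 5.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.2, 0.0])))  # → 0
```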
So I'm going to go through some ideas that may be helpful. Unfortunately, when I was preparing this talk, I found that I didn't have enough time to make a proper introduction to IPython, so I'll assume for now that you're all familiar with it, and I'll explain the specific features I use as we go.

Let's get back to Python, and let's talk about Python performance. The first thing a person learns about Python is that Python is fast — fast for developing and trying things out. But unfortunately, the second thing you learn about Python is that Python is slow to execute. Everybody knows that Python is slow, but do you know why? Anyone?

Let's write a simple function to calculate Euclidean distance. This is also taken from that first assignment, since we need Euclidean distance to find the nearest neighbors. On the first row we get the number of iterations needed, then we just accumulate the squared differences between the two points and return the accumulated sum. Nothing special. I'm going to be using the %timeit magic function included in IPython and the IPython notebook. It lets you measure runtime and quickly get benchmarks for simple functions like this; %timeit runs your code several times to make sure it reports the best result. If we use %timeit to call our Euclidean distance function, we find that it executes in 2.67 milliseconds per loop. And you might wonder: is that fast or slow? Well, let's compare it to something, and the best comparison is a compiled language. So let's implement this exact function in C — I used the Cython magic extension for IPython here to load C code directly into the notebook, so we can use the same %timeit functionality. It's pretty awesome, and if you haven't checked out IPython yet, please do. If we time this C function, we find that it completes in 28 microseconds per loop.
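The pure-Python distance function under discussion looks roughly like this (my reconstruction; the exact assignment code wasn't shown):

```python
import math

def euclidean_distance(x, y):
    """Pure-Python Euclidean distance: loop and accumulate squared differences."""
    dist = 0.0
    for i in range(len(x)):          # this interpreter-level loop is the slow part
        dist += (x[i] - y[i]) ** 2
    return math.sqrt(dist)

x = list(range(1000))
y = list(range(1000, 2000))
print(euclidean_distance(x, y))      # 1000 points, each coordinate offset by 1000
# In IPython:  %timeit euclidean_distance(x, y)  → milliseconds per loop
```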
So we see that the C code is 100 times faster than the Python code. I'm sorry, it's true: Python is slow for this kind of task. And what is the problem with this Python code? Nothing special or difficult is done here; we're just going through the array and doing some simple addition and multiplication. So let's take the next step and find the bottleneck: we want to learn which part of our code is so slow. I'll use the line_profiler package installed on my computer, which provides the nice %lprun magic command. %lprun shows how much time you spent on each line of code. Do you see anything strange here? It might be tricky if you haven't seen %lprun output before, but the strange thing is that we spent 38% of our time on the fifth line — on the loop itself. The question is why.

To answer that, we have to step back and look at the differences between languages. C, Java, and the like are compiled, statically typed languages: you write the code, and a compiler runs through it and decides how it's going to be executed. The downside is that the compiler needs to know variable types at compile time, which means you have to specify the types yourself. I really love C — it was my first language — but it's far more cumbersome: you have to write all this extra stuff, remember to declare variables, and so on. Python or Octave, on the other hand, are interpreted languages. They don't compile down to fast machine code, which means they execute more slowly, but there are advantages as well. Python has a dynamic type system, which makes programming so easy: you don't have to specify types yourself or write type annotations. My colleague Andrew is going to be talking about Python 3 annotations and when they become useful, so please visit his talk tomorrow — I believe it's going to be interesting.
So, back to the dynamic nature of Python. Each time you do a Python operation, there is a little overhead for things like type checking. When you do a + b in Python, the interpreter has to check the type of a, then check the type of b, then find the proper code to execute, and then return the result. There is also reference counting: the interpreter has to increment and decrement reference counters as you change the values of your variables. We like Python because, even though it's somewhat slow to run, it's very quick to write code in — that's why I use Python. So the question is: what do we do about the slowness?

That's where NumPy comes in. NumPy is basically designed to give us the best of both worlds: fast execution time, as in languages like C, and fast development time, as in Python. So here are some ideas for making Python faster when you're working with numerical data.

The first idea I'm going to talk about is ufuncs, and it's the simplest opportunity. Ufunc is short for "universal function". These are a special type of function, defined within the NumPy library, that operate element-wise on an array. The idea behind ufuncs is to combine the functionality and the loop together in one operation. Let me show an example. If you're a Python programmer who doesn't use NumPy and you want to do an element-wise operation on your array, this is probably how you do it: we have an array of integer values, and we want to add one to each of those values. As a good Python programmer, you probably use a list comprehension — [value + 1 for value in a] — and print out the result. That's the Pythonic way to do it. The NumPy way, which is a bit simpler, is to create a NumPy array with one of the special creation functions. And note that here we're not appending a one to the end of the array or anything like that.
What you do here is treat your array as if it were just a number. NumPy overloads the plus operator and actually produces the result element-wise. What the plus operator is doing beneath the surface is a binary ufunc, and a universal function combines the loop and the functionality. So when we write a + 1, we're telling NumPy: loop through all the elements of the array and add one to each of them. We have the same thing for multiplication and the other operators — and please note this is element-wise multiplication, not a matrix product. We'll get a nice operator for matrix product in Python 3.5, but not yet. Notice the difference here: we don't write any loop. With NumPy, the loop actually takes place in NumPy's internals.

Why do we care about this? Let's take a look at the speed of ufuncs. First we create a large array with a lot of values — %%timeit at the top of the cell means "time everything in the cell" — and we time creating the array and adding one to each element. With NumPy we get about 100 microseconds per loop. If we do the same in pure Python — create a list, loop through its length, and add one to each element by hand — again, we see a factor-of-100 speedup for NumPy. I should also point out that the NumPy code is much easier to type and to understand, and harder to get wrong, than the list comprehension. You might ask why NumPy is so much faster — what is the magic that happens under the hood? What it comes down to is that when you use NumPy's ufuncs, the loops happen in compiled code. NumPy is a big package written in C, with compiled functions for the common operations, and we access those compiled operations from Python through high-level expressions.
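The two versions compared above, side by side (timings will vary by machine; the roughly 100× ratio is the point, not the absolute numbers):

```python
import numpy as np

a_list = list(range(1_000_000))
a_arr = np.arange(1_000_000)

# Pure Python: an explicit loop in the interpreter.
result_list = [value + 1 for value in a_list]

# NumPy: the same loop, but executed inside compiled C code (a ufunc).
result_arr = a_arr + 1

print(result_arr[:3])  # → [1 2 3]
# In IPython, compare:  %timeit [v + 1 for v in a_list]   vs   %timeit a_arr + 1
```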
And that's why it's so much faster. Does it still make sense so far? These universal functions are really nice, and there are many of them built into NumPy: basically all the arithmetic operations, comparisons, and bitwise operations are overloaded in NumPy to act as universal functions, element-wise. And there are a bunch of other operators in NumPy as well.

The next thing we're going to talk about is slicing and indexing. If you're used to lists in Python, you know that you can index a list with an integer to get a single value, and you can index a list with a slice to get multiple values. You can do exactly the same with NumPy arrays. One interesting thing about NumPy slicing, though, is that there is no memory overhead, unlike with plain Python lists: NumPy returns just a view of the array. So if you assign a slice to a new variable and change one value in that new array, the value is changed in the initial array as well. Please be aware of this.

In a multi-dimensional array, we can access elements by row and column indexes. So if you pass 0, 1, you're asking for row 0 and column 1, and the value is 1. We can also use slicing on multi-dimensional arrays — in the last example here, we get a sub-matrix. And we can go further and combine slices and indexes together: here we're asking for row number 0 and all columns, which is exactly the same as writing x[0].

NumPy actually offers a lot of other fast and convenient ways to index more complicated chunks of data. One of those is index arrays. Index arrays are basically just passing a list of indexes to the array: if you want the second, zeroth, and first elements of the array, you put those indexes in a list, pass that list as the array index, and get back the values.
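The view-versus-copy behavior described above can be sketched like this (the array contents are my own toy values, not the slide's):

```python
import numpy as np

x = np.arange(10)

# A slice is a view: writing through it changes the original array.
view = x[2:5]
view[0] = 99
print(x[2])          # → 99, the original changed

# An index array (a list of indexes) returns a copy instead.
idx = x[[2, 0, 1]]   # → [99, 0, 1]
idx[0] = -1
print(x[2])          # still 99, the original is untouched

# Multi-dimensional indexing: row 0, column 1.
m = np.array([[0, 1, 2], [3, 4, 5]])
print(m[0, 1])       # → 1
print(m[0, :])       # row 0, all columns → [0 1 2]
```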
Again, we don't have to write any loop over these indexes; we just pass them all at once, and it's much quicker than writing a Python loop. But the important thing is that this doesn't return a view of the array as we saw before: in most cases, it returns a copy. You have to be aware of this. You can see here that the assignment didn't change the value of the initial array x, unlike what we saw before.

NumPy also allows you to use a Boolean mask for indexing. Instead of passing integers to choose values from x, you can pass a mask, and it will construct the array you are interested in. You might think: why would I need a thing like this? It becomes handy when you combine it with the simple ufuncs we saw earlier. If you look at the last example on the slide, we used x > 2 to construct a Boolean array, then we just passed that array as the index of x. I find myself using this technique mostly in the data preparation step. For instance, when we want to split data into test and train sets, a nice way to do it — besides using scikit-learn's built-in train_test_split — is to create a mask with the length of the array, apply the mask to the array, and apply its negation to the array. Compare that with how my students achieved the same thing: instead of writing a loop over the list — "for each element in the list, if some condition holds, append it to the result" — it happens automatically, in one line of code, and it's much, much quicker than the by-hand Python version.

The next idea I want to talk about is NumPy broadcasting. This is something very cool about NumPy: broadcasting is one of the things that really makes NumPy powerful and allows you to express very complicated operations very easily.
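A sketch of the masking technique just described, including the hand-rolled train/test split (the 80/20 threshold and the seeded generator are my own choices for reproducibility):

```python
import numpy as np

x = np.arange(10)

# Build a Boolean mask with a ufunc comparison, then use it as an index.
mask = x > 2
print(x[mask])            # → [3 4 5 6 7 8 9]

# Train/test split with a random mask and its negation —
# a hand-rolled alternative to scikit-learn's train_test_split.
rng = np.random.default_rng(0)
train_mask = rng.random(len(x)) < 0.8   # roughly 80% of entries are True
train, test = x[train_mask], x[~train_mask]
print(len(train) + len(test))           # → 10, every element lands in one set
```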
What broadcasting does is give you a set of rules by which ufuncs operate on arrays of different sizes and dimensions. This set of rules allows you to do things like add an integer to an array, add a row to a matrix, or even crazier things: you can add a row to a column, and it will expand into a two-dimensional matrix. The rules of broadcasting are pretty simple, but they're sometimes a little confusing, and it takes a while to wrap your mind around what's going on. But once you get it, you can do a huge number of operations really efficiently using these broadcasting rules.

The first rule is: if the array shapes differ, left-pad the smaller shape with ones. Then you compare the dimensions, and if any dimension doesn't match, you broadcast — kind of expand — the dimensions whose size equals one. And if the dimensions don't match and neither of them is equal to one, there is no way to match them together, and NumPy raises an error.

Here is a quick example of how it goes. We already saw adding a scalar to a vector when we spoke about ufuncs — we just didn't know back then that it was broadcasting. In this example, we have a two-by-three matrix, and we're adding a length-three vector. The first thing that happens is the vector's shape is left-padded with ones to make the number of dimensions match. Then we broadcast it up: we stretch the vector across the whole matrix, so we have two matching matrices. Then we just add them together, and we get a result of shape two by three. You can think of it as copying the vector across the array to match the dimensions, but there is no actual copying of memory — that's just an abstraction to help you think about it. There is no memory overhead; NumPy just acts as if this were happening under the hood.
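The two-by-three example just walked through, as runnable code (my own values; the ones matrix stands in for the slide's matrix):

```python
import numpy as np

m = np.ones((2, 3))        # shape (2, 3)
v = np.arange(3)           # shape (3,): left-padded to (1, 3), stretched to (2, 3)

result = m + v
print(result.shape)        # → (2, 3)
print(result)
# [[1. 2. 3.]
#  [1. 2. 3.]]

# Mismatched trailing dimensions raise an error instead of broadcasting:
try:
    np.ones((3, 2)) + np.arange(3)
except ValueError:
    print("shapes (3, 2) and (3,) do not broadcast")
```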
What this allows you to do is, rather than writing your own loops over two arrays in Python, express the operation with broadcasting syntax. You get much faster computations and also much cleaner code, and you don't have to worry about loops. I showed it here for addition, but it works for any binary ufunc.

Here's one more nice feature of NumPy you might not have seen before. We have a three-by-two matrix here; what will happen if we add these two together according to the broadcasting rules? We get a ValueError, because our shapes are three-by-two and a length-three array, and there is no way to match them: we can left-pad the vector's shape with ones, but then we just can't expand it to match the matrix shape. Here comes np.newaxis. What it basically does is take the array and add a new axis — and you can add the new axis wherever you want. It's very useful when you want to reorient your array somehow, to broadcast it the way you want. Does this still make sense? I ask because at my lecture in the university, most of my students were lost at this point. Okay, I see. Once again: broadcasting doesn't use additional memory; it doesn't actually allocate a new array.

The last idea I want to talk about today is NumPy aggregations. An aggregation is a function which summarizes the values of an array somehow. As an example, take the mean function; NumPy has a bunch of aggregations built in, like minimum, maximum, sum, and so on. Again, this is something that, if you were writing it out raw, you'd have to write a Python loop for: you would loop over the array and do it yourself, but it's much faster to use the built-in aggregations. And one more thing aggregations can do is work on multi-dimensional arrays.
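The shape mismatch and the np.newaxis fix described above can be sketched like this (array contents are my own toy values):

```python
import numpy as np

col = np.arange(3)          # shape (3,)
m = np.ones((3, 2))         # shape (3, 2)

# m + col raises ValueError: (3, 2) and (3,) don't broadcast.
# Inserting an axis turns col into a (3, 1) column, which does broadcast:
result = m + col[:, np.newaxis]
print(result.shape)         # → (3, 2)

# The "add a row to a column" trick from earlier, via the same mechanism:
outer = np.arange(3)[:, np.newaxis] + np.arange(4)   # (3, 1) + (4,) → (3, 4)
print(outer.shape)          # → (3, 4)
```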
If you want the mean value of the entire array, you call x.mean(), and if you want the mean values of the columns of the array, you pass the axis argument, and you get the mean of each column. There are a lot of aggregations available in NumPy, and you should get familiar with them if you're going to do any large-scale data analysis. The cool thing about them is that all of them have the same call signature, so you can pass the axis parameter to all of them.

So, a quick summary: Python is fast to write, but running it — loops in particular — is slow. If you are looping over a large dataset, the best way to handle it is to use the NumPy package and try some of these techniques.

The very last thing I want to show you is an example of how all this can be used to implement a meaningful algorithm. We'll use K-means here — I believe all of you know this algorithm; it's a clustering one. A quick reminder of how the algorithm goes: you select K points at random as cluster centers, you assign each object to its closest cluster center according to Euclidean distance, you calculate the centroid — the mean of all the objects in each cluster — and then you repeat steps two and three. Here we just generate some synthetic data to work with, and here is the visualization of this data: we have a bunch of points floating in space, and we want to compute a cluster for each point. Basically, we're going to compute Euclidean distances, and here is the vectorized version of it — oh, sorry, here it is: just five lines of code. The algorithm is implemented line by line, just as it was stated above: I took the verbal definition and managed to translate it line by line into Python. You couldn't write it this compactly in pure Python without NumPy. This makes me really excited, and here it is.
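A sketch of both ideas at once — aggregations with the axis argument, and one vectorized K-means assignment-plus-update step built from broadcasting, argmin, and mean (this is my own reconstruction, not the speaker's exact five lines; it assumes every cluster keeps at least one point):

```python
import numpy as np

# Aggregations share one call signature, with an optional axis argument.
x = np.array([[1.0, 2.0], [3.0, 4.0]])
print(x.mean())          # → 2.5, mean of the whole array
print(x.mean(axis=0))    # → [2. 3.], per-column means

def kmeans_step(points, centers):
    """One K-means iteration: assign points to centers, recompute centroids."""
    # All pairwise squared distances: (n, 1, d) - (k, d) broadcasts to (n, k, d).
    d = ((points[:, np.newaxis, :] - centers) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)                        # closest center per point
    # New centroid = mean of each cluster (assumes no cluster is empty).
    return np.array([points[labels == j].mean(axis=0)
                     for j in range(len(centers))])

pts = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
centers = kmeans_step(pts, np.array([[0.0, 0.0], [5.0, 5.0]]))
print(centers)           # → approximately [[0.1 0.] [5.1 5.]]
```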
So I believe we're just about out of time, and I'm going to leave you with this. If you are interested in the slides, you can go to my Twitter account — I'll post a link to them there. I want to thank you for listening; I hope this was helpful. Please enjoy the lunch and the rest of the conference.

[Host] Well, thank you, Katarina. Now it's time for questions, if you have some. No questions, really?

[Audience] Have you ever compared NumPy performance with PyPy? For example, if any of your students refuses to use NumPy but you still need to check the assignment, you can just run it on PyPy to speed it up.

[Speaker] Well, PyPy and just-in-time compilers in general are great ideas, but they don't fit this talk. Sometimes PyPy is faster, sometimes NumPy beats PyPy, so there is a lot of work still to be done there.

[Host] More questions?

[Audience] Hello. Is it easy to write custom universal functions?

[Speaker] It's perfectly easy. That's one of the things I like about NumPy: you can write universal functions yourself, and they then work like the built-in ones.

[Host] Okay, thank you, Katarina, for coming.