First, an introduction to the subject matter. Does everybody know what NumPy is? No? OK, almost everybody, but I'll say it anyway, because it's the first slide, so I don't really have much choice, and I wouldn't skip it for my life. So what's NumPy? The shortest explanation possible, really, is that it's a library that extends Python with a new data type: the multidimensional array. The most important aspect of NumPy is this one extra data type, arrays and matrices of numbers, on which you can perform a variety of operations: basic operations on arrays, linear algebra operations, so a little bit in the MATLAB direction, Fourier transforms, and a variety of random number generators. It's a fairly mature library. It started with something that was called Numeric, then numarray, and now it's NumPy. Documentation is out there; its quality tends to be spotty, but there is a book, the Guide to NumPy, which is actually rather good. Since recently it's even free, or shall we say on the honor system: you pay if you feel like it. It's a downloadable PDF by one of the authors, and pretty much everything is there. As far as I'm concerned, it's been around almost forever, so it's a pretty mature library that went through a fair number of iterations. It's really, really nice, and I can warmly recommend it to everybody. It's also a labor of love: if you read that book, the Guide to NumPy, it starts with the story of how it came to be written, grad students writing up stuff for their own research use and sticking with that very idea. It's kind of neat. But let's get to the subject matter: NumPy and SciPy. Well, SciPy is scientific Python.
It's a library for scientific computation which uses NumPy as its basic engine, shall we say, the data model underneath. What's in SciPy? There are lots of goodies that I don't really use, but since they're listed on the website, I list them here. I do use the statistics. There are also methods for optimization, for integration, for image processing, which is a bit up my alley, and more; just go there and see. By the way, the website where you can find both NumPy and SciPy is scipy.org. Then there's a rather unfair comparison: no, Python equipped with NumPy is not MATLAB, not quite, and it's not R either. Does everyone know what R is? Those who don't, lucky you. Honestly, I've always liked Python with NumPy much better than either of the other two, especially R. The way I started with NumPy is that I was doing prototypes, something you might call algorithm research, and I needed a prototyping tool for finding little things on images; that's what I do most of the time when I work. It just turned out to be perfect as a prototyping tool. I've honestly never used it for really heavy mission-critical stuff, but I've written lots of things in it that ultimately led to commercial products here and there. So what can we do? This will be mostly about NumPy itself and what you can find there. I'll try to give you a little overview, and then I'd like to show you how I applied it to a real-life project, about which I'll talk at length as well. First, the basic operations. We can create an array; that's how it looks. You can very well use it from, say, IPython: you just type import numpy and you can use it. That, for instance, is an instruction for creating a zero matrix, five by five.
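The zero-matrix creation just mentioned might look like this in an interactive session (a minimal sketch, nothing beyond what the slide shows):

```python
import numpy as np

# Create a 5-by-5 matrix of zeros, as on the slide.
a = np.zeros((5, 5))
print(a.shape)   # (5, 5)
```

The shape is given as a tuple, which is why there are two sets of parentheses.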
For access, NumPy fully subscribes to the Python way of doing things: accessing elements by indices, by slices, and so forth. For instance, we can access elements at individual indices; here I'm comparing an element to zero, which is not a very good way to compare anything to zero, but it shows the point. We can access entire rows. This matrix happens to be n by n plus one, because it comes from the code I'll show in a moment, which solves a system of linear equations by the typical means. The point of this little example is that I swap rows: I have a matrix of numbers and I just swap two rows as naturally as if they were variables. That tells you something about how the library works. You can assign something to a row and it becomes that row, as long as the dimensions agree. So you can use the whole system in a very natural, somewhat MATLAB-like way, which also makes it a convenient command-line tool. We can slice and dice things. And by the way, these are not only 2D matrices; all my examples happen to be 2D, but you can have arrays with an arbitrary number of dimensions, for the record. I'm not using that in any of these examples, but it's really possible. The other trick I'm doing here: I create a bigger array, which happens to be all zeros, and then I assign ones to just a block of it, 10 by 10 in the top left corner. So I can assign one array to another. Then there's basic arithmetic on matrices directly: I can take an element and compute something from it, which is pretty obvious, but I can also multiply a whole row by a scalar and assign it back. That works. So that's more or less how it works.
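The slicing tricks from this part of the talk, shrunk to a 4-by-4 toy matrix (the sizes in the talk differ), might look like this:

```python
import numpy as np

a = np.arange(16.0).reshape(4, 4)

# Element access by index; comparing a float to zero exactly is
# shown only to make the point, as in the talk.
print(a[0, 0] == 0.0)        # True

# Swap two rows as if they were variables. The fancy-indexed
# right-hand side is a copy, so the swap is safe.
a[[1, 2]] = a[[2, 1]]

# Assign ones into a block of a bigger all-zero array
# (the talk uses a 10-by-10 block in the top-left corner).
b = np.zeros((4, 4))
b[:2, :2] = 1.0

# Multiply a row by a scalar and assign it back.
a[0] = a[0] * 3.0
```

Note that plain slices like `a[1]` are views into the same data, which is why the swap above uses index lists rather than a tuple assignment of slices.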
Now I'll go on to my first example with NumPy, a little piece of code that I typed up yesterday. Yes? [audience question] I'm not sure I followed the question. I don't think that would work, actually, because you're accessing the same object twice. I haven't tried it, honestly, but from the smell of it, it probably wouldn't work: the assignment really boils down to two invocations of a method on the same object, so that's not something you want to do. But I don't know. So here's the example I'd like to use to show you how it all works. It's pretty trivial: a little function that, as I said, I wrote up yesterday as an example, and it solves a system of linear equations using Gaussian elimination. That's high-school stuff, I believe, and everybody's familiar with it, but I'll talk through the Python parts anyway. The equation system is given as a square matrix, which I don't verify here, and a vector. The first thing we do is attach the vector to the matrix, which we can conveniently do using the function hstack, which means horizontal stacking: I concatenate the two, attaching that very vector at the end. Now I'm ready to convert my matrix into a triangular matrix by the process of Gaussian elimination, which is what this function does, and this is where those little snippets of code I showed, for flipping rows for instance, come from. Everybody knows how elimination works, so I won't go into the details, but we conveniently use all those little bits and pieces from my slides. Then we go back the other way, because I thought it would look nicer and be shorter, and that's how it looks in the end. Oh yes, there are two return statements that do pretty much the same thing.
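A rough sketch of the kind of function described here (the name and the partial-pivoting detail are mine, not necessarily the speaker's exact code):

```python
import numpy as np

def gauss_solve(a, b):
    """Solve a*x = b by Gaussian elimination.

    A sketch in the spirit of the talk's example, not the
    original code.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = a.shape[0]
    # Attach the right-hand side as an extra column: n by n+1.
    m = np.hstack((a, b.reshape(n, 1)))
    # Forward elimination to triangular form.
    for k in range(n):
        # Partial pivoting: swap in the row with the largest pivot.
        p = k + np.argmax(np.abs(m[k:, k]))
        m[[k, p]] = m[[p, k]]
        for i in range(k + 1, n):
            m[i] -= m[k] * (m[i, k] / m[k, k])
    # Back substitution, going the other way around.
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (m[i, n] - m[i, i + 1:n] @ x[i + 1:]) / m[i, i]
    return x
```

The row swap and the whole-row arithmetic are exactly the slicing facilities shown on the earlier slides.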
They demonstrate basically the same instruction, just using different facilities of Python. Let me flip to the slides. A rather fundamental, rather important feature of NumPy is the so-called ufuncs, or universal functions, which bring all sorts of element-wise arithmetic operations to matrices. Just to not confuse the issue: there's the linear algebra sub-library, which is quite important and which I'm not really talking about here, and then there are all the things that relate directly to matrices as essentially a data structure, for accessing rows and columns of numbers. Ufuncs, most of the time, really apply to the elements within the structure. You can do element-wise multiplication: take two arrays and multiply each element by the corresponding element. Or take two arrays and use the maximum function, which returns the element-wise maximum of the two arrays: from each pair of elements it takes the bigger of the two, comparing the two matrices. This makes NumPy a rather convenient tool for things that are not quite algebraic but more like, say, image processing or experimenting with images, because these are things you would do there, at least occasionally. Going back to my example: what the first line is intended to show is that I can quite easily use a list comprehension. I process elements using indices from the list, and then in one call I turn my list into an array. It's clear what it does: I produce a list of numbers and turn it into an array with one call. My intention was to show that we can interface to Python lists and create arrays out of them. And then here's the NumPy way of doing the same.
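The ufunc behavior described above, in a few lines (the element-wise maximum is spelled `np.maximum`; `np.max` is the reducing version):

```python
import numpy as np

x = np.array([1.0, 4.0, 2.0])
y = np.array([3.0, 0.0, 5.0])

# Element-wise multiplication: each element times its counterpart.
print(x * y)             # [ 3.  0. 10.]

# Element-wise maximum: from each pair, keep the bigger of the two.
print(np.maximum(x, y))  # [3. 4. 5.]

# A list comprehension turned into an array in one call,
# as in the talk's first line.
z = np.array([i * i for i in range(5)])
print(z)                 # [ 0  1  4  9 16]
```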
What I really do is take the last column of ay and the diagonal of ay, which is n by n plus one, and divide one by the other. Since I've done elimination on this data set already, my matrix has just a diagonal and one column at the end; that's how it looks, so you can imagine what this is for. It's pretty simple. Now we can run it, and if it doesn't work I'll be totally embarrassed, but it does. There you go: with my randomly generated input I'm lucky, because I got the same result twice, which is very nice. I'm comparing against NumPy's linear algebra module, which has all the goodies for doing this sort of thing. So yes, you can type up a reasonable way of solving linear equations in about 15 minutes, or make it 30. The point is that it's really easy, it's really quick, and all the bookkeeping you'd otherwise have to deal with goes away. Of course, typing the library call takes about 10 seconds, so I strongly recommend using the library; then you don't have to test it either. So much for the first example. What else is in the library that could be of use or interest? Well, I went through ufuncs; there's lots of stuff. There are several forms of the fast Fourier transform. And you can interface to other libraries very conveniently, for instance to PIL. Does everybody know what PIL is? PIL is a really, really dated library called the Python Imaging Library. I don't think it has changed much in the last four or five years, but someone keeps it, shall we say, compatible with subsequent versions of Python; there's no real development on it. It's a useful library that I use to open images, really; not much else. It loads images for me: I open an image, convert it into an array, and mangle the array.
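The image-to-array round trip just described might look like this; I'm assuming Pillow, the maintained fork that keeps the old PIL namespace, and an in-memory buffer stands in for a file on disk:

```python
import io

import numpy as np
from PIL import Image  # assumption: Pillow provides the PIL namespace

# A stand-in for a loaded photo: random 8-bit grayscale data.
arr = (255 * np.random.random((64, 64))).astype(np.uint8)

# Array -> image -> PNG bytes.
buf = io.BytesIO()
Image.fromarray(arr).save(buf, format='PNG')
buf.seek(0)

# Image -> array: the "open an image, convert it into an array,
# mangle the array" workflow from the talk.
back = np.asarray(Image.open(buf))
print((back == arr).all())   # True: PNG round-trips losslessly
```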
What I'm doing here is pretty obvious, and I'll run this example in a moment. I take a fast Fourier transform of my input image, then I multiply it by itself, taking the square. I shift all the values by one, so as not to have pesky zeros, and then I take the logarithm, which makes for nice visual effects, and scale it; you'll see the visual effects. Then I rescale to what I expect in the output: this will be an 8-bit image, so I rescale to between 0 and 255. Then I make an image back from that rescaled array and save it. So what I've done is compute the so-called power spectrum and take its logarithm; the logarithm is, as I said, really just for visual effect. Now I'll try to run this thing for you, and you'll see it actually runs. There's a really nicely written C library underneath, so NumPy, used correctly, is blazingly fast. What's going on? No, not done yet. Oh, OK, done. What's here corresponds to what you saw on the slide, so there isn't much to add, but in that split second NumPy and PIL loaded an image that happens to be 800 by 800, did the power spectrum computation, and wrote out the output. I'll show you the image so you can appreciate the hidden beauty. Hang on, I'll be right with you. Yes, that's my summer house; everybody's invited. Hang on, where did I put it? Since I put the output in the same directory, I'm having a hard time finding it now, but that's OK, it's right there. The normal Microsoft viewer doesn't show it correctly, though, so I also need to find a decent viewer somewhere. Sorry. I have a love-hate relationship with Microsoft Windows, I have to say.
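A sketch of the power-spectrum computation as described (the function name is mine, and the PIL I/O is indicated in comments to keep the core self-contained):

```python
import numpy as np

def log_power_spectrum(img):
    """Log power spectrum of a 2D array, rescaled to 8 bits.

    A sketch of the steps from the talk, not the original code.
    """
    f = np.fft.fft2(img)
    # The squared magnitude is the power spectrum; shift values
    # by one so the logarithm never sees a pesky zero.
    p = np.log(np.abs(f) ** 2 + 1.0)
    # Rescale to the 0..255 range of an 8-bit image.
    return (255.0 * (p - p.min()) / (p.max() - p.min())).astype(np.uint8)

# With PIL, the surrounding I/O would look roughly like:
#   img = np.asarray(Image.open('input.png').convert('L'), dtype=float)
#   Image.fromarray(log_power_spectrum(img)).save('spectrum.png')
```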
The love part is when I get paid by my customers for using it. OK, I'll load that image. This is my source, shall we say, an image that I made myself; it speaks volumes about my artistic abilities. If it were in color, it would be a real Pollock. The point, though, is that it has really regular features to it; it's a fairly regular thing. You can accomplish that by painstakingly painting little circles or by using copy and paste; I've done the latter. Now, if your image is fairly regular, you can expect your power spectrum to look interesting, which of course it does. That's the logarithm of the power spectrum of that image. There's some saturation, but that's how it looks. If you analyze this sort of image, it tells you something about the regularity of features in the image itself. That's about it, nothing much more to see, but the point is, it's kind of pretty, right? I like it; it came out better than I expected, actually. [audience question] Do I use Photoshop? Of course not, why would I? [audience question, inaudible] No, I don't need that here, you're correct; it came from something else, sorry. I do take the square, yes, right later. OK, so did I mention R before? Yes, I did. R, for those who don't know, is just awesome; everybody hates it, but it truly is awesome, an incomparable set of statistical tools. In Python we do have a bit of that, but not nearly as much. However, if the things you do, just like mine, tend to be pretty simple, Python and NumPy, with SciPy, will serve you rather well for basic stuff. For instance, if you want to generate random numbers, lots of random numbers, there's quite a selection of generators that match your textbook pretty well, and you can use it for that reason.
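The selection of textbook generators might be sampled like this; I'm using the modern Generator API, which postdates the talk, but the distributions on offer are the same idea:

```python
import numpy as np

# Seeded for reproducibility; default_rng is the newer API.
rng = np.random.default_rng(42)

u = rng.uniform(0.0, 1.0, size=10000)   # uniform on [0, 1)
g = rng.normal(0.0, 1.0, size=10000)    # standard Gaussian
p = rng.poisson(5.0, size=10000)        # Poisson with mean 5

# Sample statistics land near the textbook values.
print(u.mean(), g.std(), p.mean())
```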
Now I'll talk a little about a project in which I used NumPy to generate test images. That's the end of the part where I introduce NumPy; are there any questions on that part? No? OK. Am I totally boring? You are very kind, thank you. We can only get through this together, really: I'll be brief, you'll be generous. OK, so why would I ever do this? Why would I generate test images? Because the software I worked on measures those little things on images, and those little things really correspond to images of cells. If you take a microscope, connect a digital camera to it, and take pictures: this microscope has rotating wheels on which color filters sit, and you get monochrome images from your sample, which first was marked, of course, with fluorescent media of sorts, fed to the cells so that they incorporate it. You end up with a bunch of images that look like this. That's your nucleus, marked with one fluorescent stain that normally corresponds to light you don't see, but, well, it would be there. Then there's something in nice green color that corresponds to, to give an example, the cytoplasm. And then you'd have a third image, again acquired at a different wavelength, where you'd see, say, a bunch of dots around this particular target, which correspond to, I don't know, the expression pattern of something, or some organelles that just happen to be there. They incorporate the stuff you feed them, usually antibodies that mark something, with a fluorescent moiety on the other end; I'm not really strong on the details. So we get these peculiar-looking images which, if you compose them together, make a color image of a cell that kind of looks like a cell.
But we analyze them in that monochrome form, because that's really the way to go about it. The images aren't there to be looked at; they're there to measure processes that occur in those cells. The problem with this sort of work is that you want to test your software, or rather you're required to; it really depends on how you look at it. These are living things, really, living cells. If you go through the entire process, from sample preparation all the way through analysis to some sort of output interpretable by a biologist, you can contrive a testing pattern where someone prepares a sample with known cells and known compounds applied to them, and expects a given type of response. You can validate the entire process that way, if it works. The problem is that it confounds so many factors that you can't really figure out, on that basis, what isn't working: is your software really measuring the things you think it should be measuring, really extracting the numbers you want to extract, really segmenting correctly the shapes you want segmented? That objective truth is really, really difficult to attain. So instead of trying, we just give up and fake it all the way. What do we do? Well, that's pretty much the summary of what I said. There's biology, we apply some chemical compounds, we take pictures. In the middle there's kind of an empty space, something happens, and then bang, you get data, and you compare it to what you expected; that's how people tend to look at it. But of course the issue is writing software this way: you make some assumptions about the images that will be coming in, you write your software, then you get the actual images, feed them to the software, and get some results, really. So how do you get started?
Once you actually have working software, you can sort of bootstrap yourself, in the good old way of regression testing: you just assume version one is correct and test against that, unless a bug is found, in which case you fix it, assume version one plus delta is the correct version, and test against that. That's really not that good, is it? I mean, it's okay, but it's not perfect. No, actually it sucks, but that's what people do. Anyway, the point is that when you start, you canvass people who have goodwill towards you, you get some data from them in the form of those images, you feed those images along with your prior knowledge into creating the software, and then you release it. You listen to the sounds of dissatisfaction, which tend to be pretty loud at the beginning, especially from the people who actually paid for the thing, and then you iterate and iterate and iterate. So where's the problem with that? Part of it is that when you get your real-life images, you make a lot of implicit assumptions about how representative they are of what your software is going to encounter in the future. Implicit assumptions are bad; it's better to make explicit assumptions. So how do we make explicit assumptions? Well, we assume something about the characteristics of the images explicitly, and we write a simulator, a generator of fake images, that has certain parameters, and we create this fake data according to what we think is correct. At least it's explicit. There are the usual problems with models of this sort: you must not allow yourself to believe in them, but they help you focus. That's why this software is useful. Compare that with making test images by hand; how can we test that way? Well, you can still go at it.
You can make test images with known values by hand and test against them. OK, there should be a value of seven here; is it seven? Yes, good. Eight? No, that's bad. That's how you can go about it. The problem is acquiring the slave labor necessary to create this sort of image. People don't want to do it; nobody wants to do it, even if you pay them, and we are poor. So what do we do instead? What I try to do is basically take example from real-life images, assume certain things about them, and make artificial images out of that knowledge, shall we say. So how does that really work? What does this software do? I'll show it in a moment; I can't really run it because it runs forever, it really takes a long time, but just believe me, it works, and it uses Python in quite a nice way, which I'll step through a little bit. What the software models is a container. In real life, these containers are wells in a plate: there's a whole plate of little containers in which the cells normally reside, and on the image it corresponds to this sort of picture: a round or square shape within which some sort of objects live. My real-life images normally contain, in their scope, tens or hundreds of cells, depending on magnification. But the whole point is that this container gets divvied up by the microscope: we look at a given position in the well, move a little, take a picture, move a little, take a picture, and this whole bunch of slightly overlapping images then needs to be stitched together; that's another step in the process.
So what I really do: within the chosen container shape, the code randomly places shapes drawn from predefined object populations, which have their intensities, or rather hypothetical concentrations of the fluorescent agent, and their shapes. For the sake of simplicity I imagine them as elliptical, so you really have two axes to decide, and you can model them with this sort of distribution: bigger, smaller, and so forth. They're drawn from this distribution and placed in the container. Then the part that's visible to the camera is rendered step by step, generating an array of values in NumPy. To do that, I use NumPy and another nice library that probably nobody has heard about, called Shapely. It's actually a GIS library that does lots of shape things; it has its own problems, but it's the best I found, and it lets you do operations on shapes. Why did I need it? Because I needed to figure out quickly whether I need to cut a shape or not. Writing my own code for figuring out the clipping at the edges of these images is kind of unpleasant, really, and why would I do it when there's a perfectly good library, once again called Shapely, available from the general Python package repository? You can find it there. There's a perfectly nice library to do it, so I use the library. Having done that, for each little square I get, for instance, my three images, and I mangle them further. On the side, my software keeps a list that tells me what I should really find in this image: this shape and that shape, with such-and-such intensities, and so on. These are all known values that I can now automatically compare against.
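A toy sketch of the clipping step with Shapely; all the shapes and numbers here are made up for illustration, not taken from the talk's generator:

```python
from shapely.affinity import rotate, scale
from shapely.geometry import Point, box

# The camera's field of view: a 100-by-100 square.
field = box(0.0, 0.0, 100.0, 100.0)

# A hypothetical elliptical "cell": a unit circle scaled along its
# two axes and rotated, placed so it sticks out past the right edge.
cell = rotate(scale(Point(95.0, 50.0).buffer(1.0), 8.0, 3.0), 30.0)

# Shapely figures out the clipping at the image border for us.
visible = field.intersection(cell)
print(visible.area < cell.area)   # True: part of the cell is cut off
```

Only the `visible` geometry would then be rasterized into the NumPy array for that camera position.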
That's not a very insightful test, mind you, but it's a rather rigorous one for exercising this sort of software, which computes or measures something from a real-life detector. Now I also have some other problems to account for that tend to bother such software: problems associated with the actual physical detector, the camera that measures this stuff. A camera is subject to all sorts of noise and known problems, most prominently integration noise or shot noise, dark current, and possible detector defects. This, by the way, is the main point of parametrization, because our software should be robust with regard to these effects within chosen limits. When you write such software, that's usually given in the specs: what level of these effects you will have in your instrument, the one you work on. Dark current is an interesting thing related to how a camera is constructed. You have a camera like that one; it's an electrical device, and it basically captures electrons: incoming photons knock off electrons, and the electrons are captured. But the problem with this sort of detector is that it's kind of warm, so there's a level of thermal activity in it that manifests itself as electrons jumping on and off. That's your so-called dark current, a level of background noise that's always there. Shot noise, in turn, relates to the very nature of light. And then there are other problems. So we add the noise, and NumPy is excellent for taking a massive matrix corresponding to an image and adding random numbers drawn from a certain distribution, for instance Poisson, to that matrix, and that's what I do here.
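A sketch of that noise step; the frame size and count levels are arbitrary, and the Generator API is newer than the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# A clean synthetic frame: expected photon count per pixel.
clean = np.full((256, 256), 100.0)

# Shot noise: photon arrival is a Poisson process, so each noisy
# pixel is drawn from a Poisson distribution whose mean is the
# clean value at that pixel.
shot = rng.poisson(clean)

# Dark current: a small thermal background, also Poisson-distributed.
dark = rng.poisson(5.0, size=clean.shape)

frame = shot + dark
print(frame.mean())   # close to 105
```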
And then we output it using PIL to images that get sucked into the software under test. Ta-da, finished. Now, if someone has the patience to look through the code, it's there and I can show it, though I generally discourage that because it's really boring. Hang on, let me reload; I should have done it earlier, but I'm at the end, really. I'll quickly show the code, and if there's something of particular interest I'll point it out. It will take no more than three to five minutes, and I still have twenty, so if anybody has any questions, I'm really more than willing to answer them. If nobody, then we're almost done. Think up some questions, people; I feel underappreciated now. [audience question] Pardon? You know what, I don't think it is. Is NumPy itself using LAPACK? I don't think so, no, although SciPy does use LAPACK. I mean, when you think about what's in NumPy, it's really simple stuff; there's no ambitious math there. It's a matrix representation that is really, really nice and useful in Python, and you can use it for all sorts of things that don't have any mathematical flavor to them; you just have a bunch of numbers. By the way, it interfaces very well to a database-like thing called HDF5; there's a Python library called PyTables to do exactly that, and it's a data storage system for this sort of massive numerical data. It comes highly recommended; I tried to use it once, it's okay. So, do I do anything in particular here that merits mention? Well, I take real-life sample shapes and I create my pipeline that draws these little shapes. No, that's it; it's really simple. It's a real application, and Python and NumPy make it really simple. There's a little bit of labor involved, but you don't have to go and contrive clever stuff. It just works. Thank you. Oh, thank you.