Welcome to another edition of RCE. Again, this is Brock Palen. We're rolling right into our second podcast for 2011. Again, I have Jeff Squyres from Cisco Systems, one of the developers of Open MPI. Jeff, thanks again for being on the show. Hello. Yes, we're back here in 2011 with the second podcast, getting right back up to speed after we took a little bit of a break in January. So here we are with another interesting topic today. Before that: you can find us online at www.rce-cast.com. You can find old shows there, subscribe on iTunes, or get an RSS feed in your RSS reader of choice. But yes, today we have a guest about what I think will be a pretty popular topic; I've been seeing more and more users at our site use it and request it. We have Travis Oliphant from Enthought. He was one of the developers of NumPy, or "num-pee"; he can correct us on how that's actually said. It's said "num-pie". NumPy? NumPy? Okay. Yes. Okay, so Travis, why don't you go ahead and give us a little bit of background, your personal background, and then you can roll right into what NumPy is. Okay. Well, thanks Brock and Jeff. It's a pleasure to be on your show. NumPy started basically while I was a graduate student. I was a graduate student studying inverse problems and medical imaging at the Mayo Clinic, and I really enjoyed working in a high-level language while I was trying to do medical image processing. I was a big user of MATLAB, and it wasn't quite enough, and I had seen this nascent Numeric Python work going on. I ended up getting heavily involved in the Numeric Python community and carried that with me as I took my first academic post at Brigham Young University, where I taught electrical and computer engineering. I ended up doing so much work in that NumPy community, rewriting Numeric Python to become NumPy during some of that time, that much of that work kind of distracted me from the academic work. I ended up deciding to go into industry after that and have been at Enthought for about three years. At Enthought we do scientific computing consulting as well as training and some products in the space of high-level scientific computing with a language like Python. Travis, I wonder if you could tell us what exactly NumPy is. What's its goal? What does it do? Give us the elevator pitch. What is it? That's great. NumPy basically provides a large-data capability for Python, as well as fast calculations on the elements in those large data arrays. Python has lists and dictionaries. It also has an array object, but that array object has no calculation functions associated with it, and it's single-dimensional. So NumPy is an n-dimensional array with fast calculations attached to it. And why exactly did you pick those specific abstractions? Does this reflect something that Python is natively not good at? Basically, in some sense, NumPy evolved from the work of many folks. Numeric Python started in 1995, and it sort of created those abstractions. The fast calculations, called universal functions, or ufuncs for short, and the array object were already created by Numeric Python. NumPy enhanced those and kind of formalized them a little bit better than they were in Numeric Python, but fundamentally they didn't change. And Numeric Python really grew out of a lot of other array-like languages: MATLAB, J, APL, IDL. All of these were inspiration for Numeric Python.
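To make those two abstractions concrete, the n-dimensional array and the universal function, here is a minimal sketch (assuming a standard NumPy install, imported as np):

```python
import numpy as np

# The ndarray: an n-dimensional, homogeneous array object
a = np.arange(12).reshape(3, 4)   # a 3 x 4 array of integers

# A ufunc (universal function): applied elementwise, with the loop running in C
b = np.sin(a)                      # sine of every element, in one expression
```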
So I guess I would say that those two abstractions have kind of been created multiple times by multiple people through the years, and Python did not have anything like that. So compared to using straight Python, using the NumPy array objects and operators, how much faster is NumPy than regular Python? That's a good question. If you're doing a lot of element access, if you're just taking individual elements and doing operations on them one at a time from the Python level, NumPy actually isn't going to be faster than using Python lists. Where NumPy really shines is if you have, say, 100 or more elements in a large data structure you want to do math on. Let's say you have a sound wave, which could be considered a one-dimensional array, and you want to do a clipping operation on that sound, or a filtering operation on that sound. Writing that code in Python versus writing equivalent code that does the same thing in NumPy, you're going to be anywhere from about 50 to 100 times faster using NumPy. But again, it's for large data sets. So then the big question is: how much better is this compared to writing C or Fortran or C++? Yeah, exactly. It integrates with your Python code much, much more easily. You could write C or Fortran; a lot of people, if they know C or Fortran, are comfortable writing that and then trying to link it into Python. And in fact, when Python first came out, that's how a lot of people did their work. Python was a glue language between other people's C code and Fortran code. But a lot of people really prefer to write in that high-level language. It's closer to the way we think, and you don't have to remember as many arcane details about memory allocation and pointer arithmetic. So NumPy allows you to do that all in Python. NumPy also creates objects that let you think about the problem as a whole. So you can think about the sound wave as a single structure, a single object, rather than thinking about each individual element one at a time, as you would in C or Fortran. So do I infer from that, then, that you're implementing these algorithms and whatnot in C or Fortran or some nicely optimized language behind Python, and then providing a nice abstract, high-level interface to it in Python? That's correct. One of the great things about Python is that it allows you to create your own built-in objects in the same language Python was originally created in. So with CPython, for example, you can write a C extension that creates a new object, just like a dictionary or a list or a tuple is an object implemented in C when you're working from the Python command line, the CPython command line. The NumPy array is an object in C. Therefore, all the methods and all the operations that work seamlessly with NumPy arrays are in C, and that's how you get the really fast speeds for what's called vectorized operations over a large amount of data. So what kind of operations do you provide? You said vectorized. Give us some examples. Yeah, the idea of vectorization is really the key to this high-level use of NumPy. Vectorization is when you take two objects, say an array of numbers from 1 to 100 and another array of numbers from 100 to 200 with the same number of elements, and with one expression you can say A plus B and get another array of the same number of elements, where the operation takes place element by element. So that's a vectorized operation.
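Here is a short sketch of the vectorized operation and the sound-clipping example Travis just described (again assuming NumPy is available as np):

```python
import numpy as np

a = np.arange(1, 101)        # 100 elements: 1 .. 100
b = np.arange(100, 200)      # 100 elements: 100 .. 199
c = a + b                    # one expression; the elementwise loop runs in C

# The sound-wave example: clip a 1-D waveform to the range [-0.5, 0.5]
wave = np.random.uniform(-1.0, 1.0, 44100)   # one second of made-up audio
clipped = np.clip(wave, -0.5, 0.5)
```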
It's a lot of computation that's done with a single statement at the Python level. So conceptually that lets you think of the problem at a higher level: you're not worried about for loops and pointer arithmetic when you just want to say, I want to add these two vectors together, or these two large data sets together. And it also allows the work to happen much faster, because the actual calculations are done at the C level. And I think you also asked what kind of functions are available. There are a lot of functions available. NumPy itself provides the basic functions, add, subtract, basically everything in, say, a C99 math library (exponentiation, logarithms, log1p, multiply), and then all the standard binary operations. Other packages like SciPy provide even more functionality, such as special functions. NumPy itself also provides things like convolution, a basic one-dimensional convolve function. SciPy then adds to that, taking the same data structure and providing two-dimensional convolution, and other two-dimensional image processing calculations as well. So you're talking about vectorization in there, and that made me think of something else. I'm sure a lot of these things could be optimized for what we would call the vector unit of a CPU, or any of these new accelerators, like the Intel Larrabee project and such. Are you focusing on any of this in your C at the bottom? Do you rely on a third-party library? Yeah, another great question. We rely on third-party libraries when we can. For example, the distribution of NumPy that we ship inside the Enthought Python Distribution links against the Intel Math Kernel Library, the MKL, to provide fast operations for matrix multiply or for fast Fourier transforms. But at the low level, some of the other loops really need to be optimized. They're a great place for somebody to come in and provide optimizations that aren't there at the moment. We just rely on a C compiler at the moment. I've seen projects out there that try to experiment with OpenMP or with linking against additional libraries; I think some people have even created assembly code. It really is a pluggable architecture, and you can replace any of the low-level loops that NumPy uses to do the calculations with something else. You could do it on a GPU, you could do it on a vector machine. But a lot of that work takes the form of extensions, and NumPy itself doesn't currently use any of that in most distributions, unless you're linking against something like an optimized BLAS or LAPACK. I should also mention some of the additional packages that come with NumPy. NumPy itself comes with fast Fourier transforms, random number generation, and some basic linear algebra functionality. These are the functions that are occasionally optimized depending on the distribution you get, depending on who compiled it and what they linked against. Fair enough. Let me rephrase that and make sure I understand what you were saying. Basic NumPy comes with basic C kinds of loops. Is it accurate so far that you're basically just taking advantage of the C compiler, rather than the interpreted Python language? Yes, that's correct. And then there's a bunch of add-ons where people have done various other types of optimizations, or linking against other third-party libraries and things like that. And I would assume this is kind of a researchy area where people like to play around and whatnot. Is that also an accurate statement?
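To illustrate the kinds of functions Travis listed, the C99-style math ufuncs and the one-dimensional convolve, here is a small sketch; the commented SciPy call at the end comes from the separate SciPy package rather than NumPy itself:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 1000)
y = np.exp(x) + np.log1p(x)          # C99-style math functions, elementwise

kernel = np.ones(5) / 5.0            # a simple moving-average filter
smoothed = np.convolve(y, kernel, mode='same')   # basic 1-D convolution in NumPy

# SciPy reuses the same array object for 2-D convolution, e.g.:
# from scipy.signal import convolve2d
# out = convolve2d(image, kernel2d, mode='same')
```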
So, to expand a little bit more on what Brock was saying before: have people really started experimenting with GPUs and whatnot? Is there much of a call for GPU-level acceleration in Python applications? There is, and people definitely experiment with that. There are several projects out there that take NumPy arrays and expose them, or do calculations or compilation, on the GPU. Some of my favorite examples of that kind of work are something called Copperhead, which came out of Berkeley, and PyCUDA, a project that uses the CUDA libraries and fundamentally uses NumPy as its data structure. So you know that the data is all stored in a certain way and you can access it as you'd like, and then people create Python-level translators, essentially from Python code to GPU code. So how much of that is available? Can I actually get some of this GPU stuff in the regular NumPy library, or is there a plan to add it? So right now there isn't any of that in the standard NumPy download, and definitely there are plans to add it, although it really relies on the volunteer labor of folks. So it depends on whether somebody has an interest in spending the time to make that happen. I don't know of any specific plans where people are trying to get that work done, except for some of the work that Mark Wiebe is doing. He's in Vancouver, British Columbia. He's a master's degree student, and typically that's the kind of work people do while they're academics, during their master's program or PhD work. And I know that many people are thinking about this, but so far nobody has stepped up and really said, here's some code that actually does these loops at a low level on a GPU. But there are tools out there that are allowing people to write essentially low-level GPU loops in Python, and those could be exposed or linked in, essentially monkey patched or plugged into the NumPy framework. So as that evolves, I could see, in the not-too-distant future, NumPy using those very fast loops at the low level. It's really designed for that; it's just a matter of work to make it happen. And for NumPy 2.0, I know one of the goals is to have some of that capability available. It's just a matter of volunteer time and effort, and people interested enough to implement those tools. There's a lot of need for developers on the NumPy project. It's got a lot of room for people to plug in and do really significant things. So you mentioned that NumPy also provides FFTs and a few other functions, but you only mentioned that the NumPy you ship from Enthought links against the MKL, and when you download the NumPy source, you can build against your BLAS library of choice. Can you use any external FFT libraries for your hardware, or threaded FFT libraries, or any other third-party libraries with NumPy? You can, with varying degrees of effort on your part. You know, the FFT interface is essentially an API, and to link against a faster FFT library, you'd have to write an extension module or some kind of Cython code that replaces the FFT call in NumPy with a call into your fast code. Definitely doable, but with varying degrees of difficulty. FFT libraries have a lot of different APIs, and so it's hard to create a universal API to all the FFT libraries. Okay, so you're not shipping any default interfaces for FFTW or something like that right now? Not in NumPy.
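To illustrate the idea of replacing NumPy's FFT call with your own fast code, here is a minimal sketch. The function my_fast_fft is a hypothetical wrapper around some optimized FFT library; it is not something NumPy ships, and in this sketch it simply delegates to NumPy's own FFTPACK-based routine:

```python
import numpy as np

_original_fft = np.fft.fft   # keep a reference to NumPy's default routine

def my_fast_fft(a, n=None, axis=-1):
    """Hypothetical stand-in for a wrapper around an optimized FFT library
    (e.g. an FFTW binding). Here it just delegates to the original routine."""
    return _original_fft(a, n=n, axis=axis)

# "Monkey patching": rebind the symbol at import time so that downstream code
# calling np.fft.fft transparently goes through the replacement implementation.
np.fft.fft = my_fast_fft

spectrum = np.fft.fft(np.random.rand(1024))   # now routed through my_fast_fft
```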
My very first project related to SciPy was in fact an FFTW wrapper, to let you compute FFTs using FFTW. I don't know where that work stands now; I think other interfaces to FFTW exist. There's djbfft, and the MKL has an FFT implementation, and we have a version of NumPy that links against that. But there are many, many people who distribute binaries of NumPy. We have a binary of NumPy in the Enthought Python Distribution, but there's also a binary freely available on the download site. I think the one currently linked there doesn't link against any fast FFT library. It simply uses the FFTPACK code, translated into C, that comes by default with NumPy. So NumPy doesn't require you to link against an optimized BLAS; it has all the code necessary, but it has the ability to link against an optimized BLAS. And then a few people have created optimized FFTs. There's certainly much more work that could be done there. Now, you're talking about all these different implementations of different algorithms, and potentially linking against other third-party libraries and things like that. Is NumPy a pluggable framework in the conventional sense of plugins, like for my web browser, where I can load one or not load one, or at runtime potentially choose between different algorithms or different implementations of something? Or is it more of a developer-side framework, where I can publish my own distribution that has my stuff in it? More of the latter. NumPy itself isn't really pluggable, but Python is very pluggable, very generically pluggable. I mean, you can do very interesting things with Python to create a distribution that looks as if it were pluggable, but there isn't anything specific in NumPy that makes it more pluggable. On the BLAS side it is, at compile time. It's basically a command-line option to the build instruction: you tell it where your packages are. There's also what's called a setup file, and in that setup file, much like a makefile, you can put the libraries you want to link against. So BLAS it supports pretty well; FFT not so well. The FFT is basically, by default, FFTPACK. If you want a faster FFT, you really do have to, as a developer, write that extension and then monkey patch, which refers to replacing the symbol with your FFT function at import time, to make it available to downstream programs, or else create your own distribution or binary of NumPy built against your particular FFT library. It's not difficult to do, but it does take effort from the developer. Fair enough. Let me go back and ask you a little something that's near and dear to my own heart. Are there any algorithms that are parallelized, that go across either multiple cores within the same server, or perhaps even use MPI and spread across multiple servers? Only if the underlying library supports that; there's nothing inside of NumPy that currently takes advantage of that. That's definitely an area of interest. Certainly many of the low-level loops could be parallelized; there's some work that needs to be done to make that happen automatically. But the MKL-linked NumPy that we provide, and that you can also get from other sources if you have the MKL libraries, does use multiple threads or multiple cores if they're available. So right now it's basically the third-party vendor libraries that NumPy links against that supply that, but there are certainly many places in NumPy where that could be optimized.
It just, again, requires somebody with an interest to make that happen. So right now you're not relying on too many external libraries. I'm guessing you guys aren't using anything like SWIG to quickly kick out a Python interface to some C library; you're actually writing all these Python interfaces by hand to make sure they're the way you want them? Well, that is an interesting question. Certainly when I started working on Python, that was really the way I preferred to do it, because you have full control over what you link against, what C code you call, and how you call it. SWIG has been around for a long time as well, and I have used SWIG; for example, the FFTW wrapper I wrote ten years ago to link against FFTW was a SWIG interface. NumPy itself doesn't really do a lot of that, because a lot of that is in SciPy; it's more of a library-level concern. For a long time I've thought of NumPy as really just the array object and the universal functions, which are the addition, subtraction, exponentiation, sines, cosines, those calculations. But historically NumPy has also included some basic linear algebra, basic FFTs, and basic random number generators, and those have remained inside of NumPy even though there are additional, better versions of those in SciPy as well. So it's the linear algebra, FFT, and random number generators; beyond that, for any other library, any mathematical library, machine vision, machine learning, image processing, all of those libraries can be wrapped: you would either write your own extension module, or, what's happening now, people are writing a lot of Cython modules. I'm not sure how many of you are familiar with Cython, but Cython is, simply, decorated Python that gets compiled to C code. So you've mentioned SciPy twice now, and I know, having built SciPy in the past, that it relies on NumPy. Is there a real quick elevator pitch on what SciPy is? Because we'll do a separate show on SciPy. Sure, yeah, and SciPy deserves a separate show. It's really a large collection of additional libraries to do things like special functions, optimization, more extensive linear algebra, more extensive statistical analysis besides just random number generation, also probability density functions. There are curve-fitting techniques and approaches. There was the start of a genetic algorithm module; I guess that was actually removed. So it's basically a large collection of any kind of generic operation you'd like to do with data. So how many other third-party libraries are there? What are some popular ones besides SciPy that rely on NumPy? Yeah, there are quite a few. There are many scikits, which are kind of another way for people to contribute to the SciPy community without having their code go into the larger SciPy ball of code. People have written things like scikits.image for image processing, scikits.statsmodels, there's pandas, there's larry, there's CVXOPT, there are some partial differential equation packages. It's actually grown quite large, and I haven't kept track of all of them. There are a lot of third-party packages that use NumPy. NumPy provides a C API as well, so a lot of the functionality you can use from Python you can also use at the C level. So you can create another extension to Python that uses some of the functionality of NumPy without going through the Python layer. Okay, so hold on, let me make sure I understand that properly.
Did you just say that a C developer can write cool new functionality, and you provide the hooks to allow that to be exported into Python? Is that what you said? Basically, a C developer can write an extension that is then used by Python, and that extension can make use of low-level NumPy functionality, things like the convolve function or the standard deviation function. And that uses the same code that NumPy itself uses, without going through the Python layer. Of course, a C developer can always call into Python and create NumPy objects and do the equivalent at that level, but this lets you avoid having to go up into Python and back down. In fact, on that point, the refactor of NumPy that was necessitated by trying to create an IronPython port of NumPy has really created a library version of NumPy that doesn't rely on Python at all, with the idea of it being used as an array object for any dynamic language. And that's really NumPy 2.0. There's a lot that can be done from what's there now, and I'm kind of excited to see what work will be done over the next six to eight months, before NumPy 2 comes out, to really support that concept of a library-based NumPy. Interesting. So you're saying that you have a lot of good algorithms and you want to make them available both natively in C and, quote-unquote, natively in Python. Is that one of the major goals of 2.0? Yeah. And it's the algorithms, and probably more than that, it's the basic data structure that's used, and then exposing that data structure in a C-level framework with some default algorithms that can then be overridden with your own. It's a place to grow a large community of basic algorithms that work on a standard object that everybody uses. That raises a very interesting point. To kind of standardize on your object, you would need to be interoperable with other things. So do your data structures play well with others, say, like HDF for stable storage, and other numerical kinds of libraries? Yes, I think so. I mean, basically NumPy is in some sense a best-of-breed of multiple ideas around data storage. One illustration, for example, is HDF data storage: NumPy data structures are very, very similar to an HDF file. In fact, we could have made the default storage of NumPy arrays an HDF file. We didn't do that because nobody stepped up wanting to write the interface to connect it so that it really just called the APIs from HDF. But it was remarkable to me to see the similarity between how HDF thinks of data and how NumPy thinks of data. So you already mentioned that you have a C API so that you can export certain functionality, so that C developers can add functionality to NumPy while using some of the parts of NumPy that are already done. What about Fortran developers, or C++ developers, or CUDA developers, or any other low-level optimized language? Yeah, so there aren't really any specific features for those in NumPy itself. Well, in SciPy... actually, I shouldn't say that: NumPy ships with something called F2PY, which is basically an automatic parser of Fortran code that generates an extension module for Python, using NumPy as the data structure to pass data back and forth between the Fortran code and the Python code. It is an automatic wrapper generator for Fortran code. It's quite powerful. It's getting older; it's really optimized for Fortran 77, with some Fortran 90 support.
There's a newer package called fwrap, which is optimized for Fortran 90, and a lot of modules are moving to using fwrap. That's really where the Fortran integration comes from. C++ integration is really just a matter of recognizing that the Python API is really C, and therefore the NumPy API is really C; you can of course call C++ code as well if you have the right extern "C" sections in the right places to put the correct hooks into your C++ code. And it's the same with the other languages: you just have to follow their standards, and then you say, oh, I need to write this method or this function call, and you can do it in whatever language you like. Some of that is made easy, and some of it is more manual; you have to do it yourself. Fortran is very easy to use with NumPy and Python. Tell us what else is coming in 2.0. What's on the to-do list? Yeah, that's great. So NumPy 2.0 really came out of the realization that adding date-time support to NumPy arrays, so that you can have a NumPy array of dates and times, required a change to the low-level application binary interface, or ABI. And all that means is that in order to use it, you'd have to recompile your extension against the new NumPy. There's a desire not to have a standard release require recompilation, only a major version. So that necessitated that the date-time work, which was started over a year ago, had to be put into NumPy 2.0. Well, in the meantime, a lot of other ideas have surfaced, things like restructuring the ufuncs to be centered around an iterator, a unified iterator. This is great work that Mark Wiebe has been doing, and I'm really excited to see where it will go. There's definitely some refactoring that could be done: the ufunc structure really hasn't changed in the six years that NumPy has been out there, and it was definitely kind of pushed together rather than really well designed. There was a lot of design work that went into it, but when you're operating under the gun, you do a lot of things just to get it done. And he's taken the opportunity to create new structures at a lower level and then build the universal functions on top of that. So I'm excited about what that will bring. The date-time support is one thing. It's a little unclear how many of the new ideas that are out there will actually get into NumPy 2.0, but some of the things people are thinking about are deferred evaluation and basically a pointer data type. I should say more about data types in order for that to make sense. So: new data types, and new ways to calculate, particularly lazy evaluation or faster evaluation. So tell me a little bit more about the data types and the abstractions that are exported by NumPy. Why is that useful, and how are they good things? Thanks for that. Yeah, NumPy is a very generic data structure for low-level data; you have a data type, essentially. One way to think of a NumPy array is as a homogeneous collection of bytes that is then described by another object called a data type object. These data type objects can range anywhere from a Boolean, to a complex number, to a structure of complex numbers plus strings plus a Unicode character plus a lot of other basic data types. Really, any kind of data type you can create in C, you can also create a data type description for, which allows a NumPy array of that type to exist. One of the ways this is really powerful is that you can imagine any kind of binary data format from years gone by that you've stored on disk.
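As a small illustration of the data type objects Travis is describing, here is a minimal sketch of a structured dtype describing one record of a hypothetical binary format; the field names and file name are invented for the example:

```python
import numpy as np

# A data type object describing one record of a binary format:
# an 8-byte float, a double-precision complex number, and a 16-byte string.
record = np.dtype([('time', np.float64),
                   ('signal', np.complex128),
                   ('label', 'S16')])

# An array of such records behaves much like an array of C structs.
data = np.zeros(10, dtype=record)
data['time'] = np.arange(10)

# The same description can be handed to np.memmap to view an on-disk binary
# file as an array without reading the whole thing into memory, e.g.:
# old_data = np.memmap('legacy_format.dat', dtype=record, mode='r')
```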
NumPy provides the ability to describe that binary data format using this data type object. And then you can memory map that whole disk file into memory as a NumPy array, and use NumPy's slicing capabilities or field extraction capabilities to access just the parts of that array that you want, instead of reading all of the data into memory, in a very, very simple way. We've seen this used many, many times to really speed things up, both conceptually, in how you get data from disk into memory, and in the processing time, because you end up only looking at the parts you care about instead of reading all that other data into memory first. So why don't you tell us a little bit about the NumPy community? Roughly, how much of the code is from external contributors and how much of the code is provided by Enthought? That's a good question. Enthought really doesn't have much to do with the actual code of NumPy. I wrote a lot of NumPy before I joined Enthought and have tried to keep up while I've been at Enthought, although I've found that working full-time does make it harder to contribute to an open source project, and we've actually seen that in multiple projects: people do a lot of work on a project while they're students or in an academic position, and then once they're working full-time, they tend to have less time and contribute less. But as I moved to Enthought, a lot of other folks stepped in and have added significant contributions: Charles Harris, Pauli Virtanen, David Cournapeau, and most recently, I'm hoping that some of the work that Mark Wiebe has done will get into NumPy. Ralf Gommers has stepped up to actually be a release manager, to make sure things get out on time. So it's really the work of those folks that makes the NumPy project move forward. I did a lot of work early on, I mean probably a year and a half of my life, probably 40 hours a week, to go from Numeric Python to NumPy. But since then I haven't had as much time, and lots of other folks have stepped in to keep the project moving along. A lot of discussion takes place on the NumPy discussion list, and it's a lively discussion. It's a great place to ask questions about NumPy, and you also get a lot of good ideas that are passed back and forth and debated. So it's a lively community and definitely a community project. Enthought itself, we just try to support it by sponsoring the SciPy conference and occasional sprints, and we certainly have hired a few people from the community. So just out of curiosity, a question I like to ask a lot of open source software projects: what repository system do you use for version control, and why? It's a great question. We recently moved from Subversion, hosted basically on Enthought servers; Enthought provided the Subversion repository. We recently moved to GitHub to use Git, and that was after quite a long debate and some discussion. I guess the reason we ultimately moved to Git is that the primary contributors were really excited about Git and the distributed version control that it provides. We had two options, basically: Git and hg, Mercurial. Most people had more experience with and liked the speed of Git, and then of course there's GitHub as a social coding community. So we just recently moved there, about four months ago, and we're in the middle of trying to get the SciPy project over to GitHub as well. I'm excited to see how the community responds to that move. So you mentioned mailing lists and things like that. Where can NumPy be found?
Where can the mailing lists be found? Yeah, that's a great question. SciPy.org: if you go to www.SciPy.org, that's really the community site for SciPy, NumPy, any of these things. On that site there's a page, linked on the left-hand side, that will show you the developer zone. There's also, maybe on the front page, a little box that shows you how to subscribe to either the NumPy discussion mailing list, the SciPy user mailing list, or the SciPy development mailing list. Enthought does sponsor all of these projects to the degree we can. We've hosted the data on our servers for a long time, and we're continuing to sponsor the SciPy project and the SciPy conference. We try to organize it around the SciPy.org site, which is really a community-driven site. I have to emphasize that even though Enthought does provide some financial support for this, we don't direct it; it's very much community-directed. Okay, Travis, thank you very much for your time. This show will be up soon, and we will talk to you, or possibly one of your co-workers, later about SciPy. That sounds great, I appreciate it. Thanks, Travis. Yep. All right. Thanks, Brock and Jeff.