We are going to be talking today about scientific computing with Python, and it's really fun to see how the community has grown and how much is going on these days in scientific computing, specifically with Python. There's a conference this year that is, I guess, probably in its sixth year now. So if any of you happen to be at Caltech next month, we'd love to have you at the SciPy conference. This is a group that's been meeting, I guess, since 2001 or 2002. I can't remember when we initially started, but it's specifically for the scientific community. SciPy, as we'll talk about a little later, is a package of libraries or toolkits for doing scientific computing. But the conference is really much larger than that. People come and talk about all kinds of different things, from astronomy to genomics to neuroscience. So it's an interesting place to go and listen to talks. Please consider it if you're in the U.S.

So what is Python? One of the questions that comes up is, how many people here know FORTRAN? Y'all use FORTRAN. How many know C++? How many MATLAB? Yeah, so there are all these other languages out there that have been used for 20 or 30 years, some of them, in scientific computing. Why has there been this large shift to looking at languages like Python? I guess there's not been a large shift yet. Those other languages still dominate the landscape, but Python is really growing in popularity. Why is that? Well, I can't remember exactly who came up with this one-liner. It's from Programming Python. Mark Lutz is the guy's name, I believe. He had a quote that said something along the lines that Python is strong because it allows you to do everything that you can do in a compiled language like C, C++, or FORTRAN, but it's a much simpler language. And that's largely true.
In the scientific world, there are places we run into where Python isn't the right choice, but when that's true, it turns out Python interfaces very nicely with high-performance languages such as C and C++. Now, you're probably not going to sit down with Python and write an operating system. That wouldn't be your choice. But most other things that you look at, if you look at building a tool for network analysis, or writing a network-based program, or writing applications for database queries, things like that, well, Python works really well for all of those. There's even a guy named David Beazley who was contemplating this, and he may have done it. He taught at the University of Chicago, and he had his students implement the TCP/IP stack in Python. So there's not a whole lot that you can't do. Sometimes it's not the fastest choice, but it really works well for these things. And because of that, Python has this huge breadth of capability. If we can add the scientific side to it and get the libraries in for that, then we have this wealth of capability for building very rich applications in scientific computing. Hopefully you'll see that over the course of the class.

So what are the things in Python that make it really strong? Well, automatic garbage collection is a major feature of a lot of languages these days: Java and Ruby and Perl, all of these languages. I think a lot of people raised their hands for C and C++. If you've ever chased down a dangling pointer, you know the pain that comes from that. I think it was Eric Raymond who made the comment that probably 80% of the programming errors that come up in applications in these languages are memory errors, along with the effort that goes into fixing them. So if you can avoid that, it's nice. Now Python's not perfect.
You're going to end up with memory leaks in programs that you have to chase down. That occurs. You have to pay attention to your resources. That's true in all of these languages. We haven't come up with a utopian garbage collector yet that works well for all problem sets. But it works very well.

The other thing is dynamic typing, and this is a critical thing for scientists. Scientists are really working on problems where, when you start the day, you don't know where you want to be at the end of the day. You have some data set. It's in some file format. You need to read it in. And now you're going to do some analysis. Maybe you're doing statistical analysis. Maybe it's a signal. You read in the signal. You plot it. What does it look like? You take an FFT of it. What does its spectrum look like? Ah, there's some noise up here. I need to filter that out. Now I have some colored noise in some area that I can notice. I want to filter that out and look at the signal. So there's this process of not knowing where you want to go. Well, the dynamic typing and the interpreted and interactive notions in Python are really critical to this. You can import your data. You can sit there and play with your data. You can say "a = 1", and then a few minutes later "a" can equal an array. You don't have to think about, oh, I've defined all these variables as a specific type. It allows you to play with it very quickly. And then the interpreter is critical because you can interact with your data very comfortably. So both of those are really important for scientists.

And really, MATLAB came after C and C++. MATLAB had a lot of strengths as far as having vector calculations and that sort of thing built in. But I think one of its major strengths was that a scientist could sit down and not have to declare very specific types on their variables.
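
As a rough illustration of the workflow just described (this is not code from the talk), here is a minimal sketch using only the Python standard library. The signal, its 4 Hz and 12 Hz tones, the 64 Hz sample rate, the cutoff, and the helper names dft and idft are all assumptions made up for the example, and the naive transform stands in for a real FFT routine.

```python
import cmath
import math
from array import array

# Dynamic typing: the same name can be rebound to a different type later on.
a = 1
a = array("d", [0.0, 1.0, 2.0])  # now a typed array of doubles

# Hypothetical signal: a 4 Hz tone plus a weaker 12 Hz tone,
# sampled at 64 Hz for one second, so each DFT bin is 1 Hz wide.
n = 64
signal = [
    math.sin(2 * math.pi * 4 * k / n) + 0.3 * math.sin(2 * math.pi * 12 * k / n)
    for k in range(n)
]


def dft(x):
    """Naive O(n^2) discrete Fourier transform (stand-in for a real FFT)."""
    size = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * f * k / size) for k in range(size))
        for f in range(size)
    ]


def idft(coeffs):
    """Naive inverse transform; returns the real part of each sample."""
    size = len(coeffs)
    return [
        sum(coeffs[f] * cmath.exp(2j * cmath.pi * f * k / size)
            for f in range(size)).real / size
        for k in range(size)
    ]


# Look at the spectrum: which bin below the Nyquist frequency is strongest?
coeffs = dft(signal)
half = [abs(c) for c in coeffs[: n // 2]]
peak = max(range(len(half)), key=half.__getitem__)

# Crude low-pass filter: zero every bin above the (assumed) 8 Hz cutoff,
# including the mirrored negative-frequency bins, then transform back.
cutoff = 8
kept = [c if min(f, n - f) <= cutoff else 0 for f, c in enumerate(coeffs)]
filtered = idft(kept)

print("strongest bin (Hz):", peak)
```

Typed line by line at the interpreter, each step can be inspected before deciding on the next one, which is the point being made about not knowing at the start of the day where you will end up.
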
They didn't have to write a compiled program to run and analyze their data. They could sit there at the command line, type things in, and see the results immediately. Python has that same strength.

Another major strength of Python is its object-oriented design. It's really object-oriented from the ground up. It wasn't an afterthought that was added to the language. Features like dynamic typing and interactivity are really critical to the scientist, and in exactly the same way the object-oriented notions are very critical to the computer scientist. This is of major importance to the scientific community, because we do write a whole lot of things where we don't know where we want to be at the end of the day, and we mess around with our data and play with it. We come up with some interesting algorithm. But at the end of the day, once we have that interesting algorithm, we don't want it in a script that gets passed around the world. We want to create an application that it lives in, one that has visualization, domain-specific objects that people can work with, a nice interface, all of these sorts of things, to make it more useful to end users. To make that happen, typically we need the help of some software architects to come in and build out this larger framework for an application around our algorithm.

If you look at applications, I worked in electromagnetics, and there's an algorithm called the finite difference time domain (FDTD) algorithm. It will run for days if you want it to or need it to, to calculate what the radar cross-section of some object is, or what the properties of a circuit are, or what the radiation pattern of an antenna is. And it will run a long time. I mean, these take days to run on large problem sets. The code base for a typical FDTD application, I would guess, is somewhere between 10,000 and 50,000 lines of code.
And if you look at the algorithm, the algorithm is about 50 lines of code. It's very, very small. All of this other stuff around it is really the pieces that you use to read in files, to allow a user to easily set up the problem, to post-process the problem, that sort of thing. Well, all of this stuff around it, if you're going to have a maintainable long-term application, needs to be built in a robust way. And one of the tools that's currently used heavily in computer science for building these sorts of architectures is object-oriented programming. So that's a critical piece.

The notion of "batteries included" has been talked about quite a bit in Python. I think the standard library is probably going to shrink a little bit as they clean up some of the pieces. For a while, the notion of batteries included was that Python comes with every library you might need for your application. So if you look at the standard library, there are all kinds of tools: urllib, email tools, telnet tools, and things to deal with accessing pieces of the file system. I think the standard library is probably 80 or 100 libraries now. But some libraries also crept in that deal with the audio format for SGI computers and things like that. So they were probably a little loose in the beginning with the number of libraries that got in. Now they're going to start trimming those out as time goes on. But that huge library of capability is really an important notion, because you get something out of the box that really solves a lot of problems.

Free: I think in some places that's not very important, and in a lot of places it is very important. MATLAB licenses are expensive and Mathematica licenses are expensive. It's very nice to have a tool where, if you build your application, you can send it to anybody and they're able to grab a piece of software off of the net and run it.
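
To make the "batteries included" point above a little more concrete, here is a small sketch (not from the talk) that touches three of the areas mentioned: URL handling, email tools, and file-system paths. It uses only standard library modules as they exist in current Python 3; the URL, the email addresses, and the file name are made-up placeholders.

```python
import os.path
from email.message import EmailMessage
from urllib.parse import urlparse

# URL tools: split a (hypothetical) data URL into its components.
url = urlparse("https://example.org/data/run42.csv?format=raw")

# Path tools: pull the file name and extension out of the URL path.
filename = os.path.basename(url.path)
stem, ext = os.path.splitext(filename)

# Email tools: build a message describing the result, without sending it.
msg = EmailMessage()
msg["From"] = "analyst@example.org"
msg["To"] = "advisor@example.org"
msg["Subject"] = f"Results for {stem}"
msg.set_content(f"Processed {filename} from {url.netloc}.")

print(msg)
```

Nothing here needs to be installed separately, which is what the phrase is getting at.
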
So I think that's one of the reasons Python has grown as well. It's hard to beat free on price.

Portability is another reason it has grown very quickly. Python runs on this. You can run it here. You can run it on the big iron machines that come from SGI and from IBM as well. A lot of national labs in the U.S. are running Python on their supercomputers. So portability is important.

I had this notion in grad school that points out how easy it is to learn and use. I wanted my advisor to be able to read my algorithm. He may not be able to read the application, but he could read the algorithm and understand what I was doing. So one of the questions was, when I wrote a computer program, could I show it to my advisor and would we be able to talk about it? Python passes this test. The syntax is easy to learn and understand. There's not a whole lot that gets in the way where you have to learn what a lot of different symbols mean. The other point is that it has a very nice and elegant numeric syntax for doing vector operations on arrays. We'll talk about that tomorrow. But that really helps in the scientific community.

And then the modularity of things. Python encourages you to build code that is modular, so you can build these libraries that get handed around and used in this modular way. And modular programs are not only easier to reuse, they're easier to maintain over the long term.

So that's kind of an overview of Python itself, and I guess why we're here today to talk about it. For the course from here, today we're going to focus on the language itself and get everybody up to speed on that. This afternoon we'll have hands-on laboratories, which will give you an opportunity to sit down and work through problems. Prabhu and I will be there to help out as we work through the problem set.
Just to give you an idea of who's using Python, one of the first major users, I think, was the Space Telescope Science Institute. Those are the guys that process images off of Hubble. So it's used in the pipeline for calibrating their equipment and also for processing the images off of the telescope.

In maybe a slightly different area, Hollywood is also a major user of Python. All of these companies are using it. It's funny to walk around the Python conference and see Spider-Man t-shirts or Hulk t-shirts. The place where it comes into play is that these companies have very large rendering farms, right? I mean, they have hundreds or thousands of Linux machines, usually in some back room. Once somebody has set up a computer-generated graphics scene, each of those machines will be in charge of rendering one frame. Well, the process of splitting all of the information out to those machines and managing that is often controlled by Python. There are also tools for doing the masking and compositing. If they created a scene that's computer-generated and they need to put a person in, then that process is often done with Python. So it's kind of interesting that it's there.

It's also heavily used in the geophysics industry. We work a whole lot with ConocoPhillips. There's also Shell. Saudi Aramco uses it a little bit. So it's starting to work its way into that industry.

Google has hired Guido van Rossum, the creator of Python. Alex Martelli is there. There are a number of people out of the core Python community who are on the Google staff. And Greg Stein, in one of his talks, I believe it was at OSCON, talked about the usage there. He named C++ and Java and Python as the three most heavily used languages there. So that's another area where it's used quite a bit.

I was interested to see that Paint Shop Pro, in their version 9, chose to use Python as their scripting language.
So we're all probably used to Excel and Word and these applications. Well, you can always script those applications with Visual Basic on the back end. Python was embedded by Paint Shop Pro, maybe not in the most transparent way, but still it is the scripting language that they use to drive their application. And that's a really good use of Python. I hope to see more applications over time that really have Python as their scripting engine. And we really want to encourage you to use it as the scripting layer for your scientific applications.

If you've ever loaded a Red Hat installation, at least this was true in the past and I think it's still true, one of the very first things that is installed is Python, because Anaconda, which is their installation application, is a Python application.

And then Procter and Gamble: we also work with them quite a bit on building fluid dynamics applications. These are applications where there's CAD that has to go through meshing, that has to go and run on a supercomputer and then come back and be visually analyzed, and the data needs to be processed. And then somebody needs to change the initial design of whatever they're building. So that's very common. I think there may be quite a few people in here in the fluid dynamics world, and it's commonly used there.

I became interested in looking at these graphs that Tim O'Reilly produces, or I guess at least analyzes. Tim O'Reilly is the publisher of the O'Reilly computer books. You may have some of the books that have the animals on the front, the computer programming books. He has a blog, and I think about every quarter he does a reanalysis of languages. This is all data from Nielsen. Some people in the U.S. have a box on top of their TV that records what they're watching all the time.
Nielsen takes that information and analyzes it, and then that's used by the advertising industry to figure out which kind of toothpaste people should buy or whatever. Who's winning in the market segments? Well, they do the same thing with programming books, or the book market in general. So one of the things you can do is go through what Nielsen records, which is what books have been sold every quarter. I think O'Reilly has post-processed this data in a way where they can say how many Java books were sold, how many C# books, and so on. The way they do that is they just look for the name in the title, or maybe in a brief blurb on the book. So if you were looking for the popularity of a language, this isn't a perfect measure, right? I mean, this is a proxy for what we're actually looking for. But it's a reasonable proxy. It's a nice little tool for looking at whether a language is growing or shrinking, and what its market size is.

So if you look at this plot, the whole rectangle is the whole market, and the size of each rectangle inside it shows you the size of that language's piece of the market. You can see that the usual suspects are kind of the big ones. Maybe Visual Basic isn't one that we might think of from the scientific community, but if you go into corporate America or corporate India, guaranteed you're going to see that Visual Basic runs the back office of half of these guys. It's heavily used for setting up reports and that sort of thing. So Visual Basic is a major tool. Java has the largest chunk, not surprisingly. C/C++ has a huge chunk. And PHP, that's a language that's heavily used on the web; it has really grown in popularity and has a large chunk. And then JavaScript is a huge piece. And not only is it a huge piece, but look at the colors that you see here and the percentage number there: Java, negative 14%.
If it's red and has a negative percentage, that means that the market share has dropped by that much from last year to this year. So we can see that Java and C++, even though they're the big languages used, are shrinking in popularity, at least if looked at from book sales. On the other hand, it's fairly believable that JavaScript not only is large, but is growing very quickly. This is because of the Web 2.0 push with all of the Ajax and interactive web browsing capabilities. So that's critical.

So we have kind of the old guard languages, mainly up there, that are large, and a few new ones that are growing that are also large. And then you see this green kind of band across here. Those squares are getting bigger, and those are languages that are kind of the new guard coming in. C# and the .NET languages are a huge part of this. That's partially a testament to how strong Microsoft is and the size that they have in the marketplace. But C# is also a very strong language, and that's another critical aspect of that. It's nice to program in.

There's a very bright green square over there, and that's Ruby. How many people have used Ruby or seen it? So, not a few. Ruby's heavily used on the web. It's a general purpose language, but there's a reason it has really caught fire. JavaScript is kind of driven by this Ajax wave. Does anybody know what's driving Ruby? Ruby on Rails. Ruby on Rails, exactly. So Ruby on Rails is just pushing its growth through the roof. It not only has a large market share there, but it's also growing very quickly. So that's interesting to see.

And Python is sitting over there, and it's similar: smaller market share and smaller growth than Ruby, but still in a very nice place. I mean, it's in the vanguard of the languages that are up and coming.
And with Python, one of the nice things to think about there is that it's not being powered by something that's in vogue right now. There isn't an application that's fueling the growth of Python. Python's strength is really very general. I mean, we're going to sit here and talk about scientific computing, but there is a huge portion of the Python community that has no clue that Python's even useful in the scientific computing arena. There's a company called Rackspace that has thousands and thousands of servers. You just call them, they'll set up a rack for you, a server, and they'll co-locate it at their place. Well, their whole system is managed with Python. They don't care that you can do numeric processing. They just care that they can have a distributed system that works well. So there's this very wide range of users, and that growth is very broad, and that's very important, I think, to the long-term health of the language and where it's going. And I want that to happen, because I want the people outside of the scientific community using it very heavily so that I get to use their tools in my scientific applications.