Welcome to another edition of RCE. Again, this is Brock Palen up in Michigan, where we're finally getting rid of snow. No more snow. And I have Jeff Squyres down in Kentucky. Jeff, thanks again for helping me out.

Yeah, Kentucky, where we're getting tornadoes and thunderstorms and all kinds of other bad weather, but it's nice today, so I can't complain. Not too much, at least.

So you can find the RCE shows online at rce-cast.com. There's a link to subscribe through iTunes and an RSS feed there. You can also find all the old shows, because I noticed that iTunes only shows the last 10 shows or so. Also, Jeff has a blog and works on Open MPI and a few other things. So Jeff, you have some information about that?

So yes, I have a blog out at blogs.cisco.com; my corporate overlords like me to mention that on here. And it's also linked to from the RCE-cast site. I very, very occasionally tweet something, every once in a while. But I think more interesting than all of that stuff that we say at the beginning of every show is that we now have another member of the second-time guest club on today's show.

Yes, our guest today who is a repeat is Travis Oliphant. But we also have with him Anthony Scopatz and Warren, oh, Weckesser. I think I just screwed up his name.

This show is very well known for getting everybody's names absolutely perfect. I just want to go on record saying that. So guys, I wonder if you could introduce yourselves using the proper pronunciation of your names.

Sure, it's great to be on the show again, Brock and Jeff. Thank you for inviting us back. My name is Travis Oliphant, and we have Anthony Scopatz and Warren Weckesser. We're actually here in Austin, where it's beautiful today. Glad you're getting rid of the snow in Michigan, but I don't think we ever got any down here. It's been a great winter. We've had a few fires, but we're happy to be here, and happy to answer questions or talk a bit about SciPy.

Yes, today.
We're gonna be talking about SciPy, that's S-C-I-P-Y, scientific Python. Travis was nice enough to get some other SciPy folks who worked on it with him, as a nice follow-up to RCE number 48, where Travis talked to us about NumPy, which I believe has some sort of relationship with SciPy, and we're gonna get into the guts of that in just a moment. So guys, can you give us a little bit of a breakdown, each one of you, of how you're involved with SciPy?

Sure, I'll start and then hand over to the other folks. This is Travis. SciPy really started back in 1999, and my involvement in SciPy was to create some of the early modules that got rolled into SciPy in 2001. And then I worked with Eric and Pearu to actually create the name SciPy and put all these modules together.

Well, this is Warren. I'm basically both a user and a developer of SciPy. I started using it several years ago, before I even got to Enthought, when I was in academia, using it just for research and making nice plots of solutions of differential equations. About a year and a half or so ago, I got involved in actually contributing to and adding features to SciPy. So I'm both a developer and an active user of SciPy.

Hi, and this is Anthony. Much like Warren, I was using SciPy to do research in my graduate program, and I'm also sort of a fair-weather developer on SciPy. But Warren is the program committee head for the corresponding SciPy conference this year, and I am also chairing a track in that conference, the Python and Core Technologies track. So, sort of involved in the community otherwise.

Hey, let's get a little plug for that right now. Give us the 30-second spiel on the SciPy conference. When is it? How do people attend? What kind of program can they expect? Stuff like that.
Sure, I'll try to keep it to 30 minutes, or 30 seconds, I mean. I get the chance to be the SciPy co-chair this year. SciPy happens every year, and this year it's in Austin, Texas, the week of July 11th to July 16th. It starts with some tutorials. It's a great time to actually learn SciPy and Python for scientific computing in general; great people come and give excellent tutorials, very inexpensive, by the way. After that there are two days of conference and then two days of sprints. So it's a full week, starting on July 11th and ending on July 16th. There will be two special tracks this year, one for Python in data science and another for Python and core technologies, where we advertise the fact that scientists use Python for their computing needs because it gives them exposure to a wide community of other tools that they need to use, besides just the nice algorithms that may be linked into Python. Information about those will now be available at the SciPy conference, in more depth than perhaps in the past. Registration is online: go to conferences.scipy.org. In fact, if you just go to the www.scipy.org site, there'll be a sidebar that says conferences, and that'll take you right there. There are actually three SciPy conferences throughout the world, but there's one happening in the US this summer, and you can go to that link to register. Registration is open until really the middle of June, so there's plenty of time to register, and even after that you can come and register on site as well.

So, SciPy, scientific Python: what is it? We talked about NumPy on show 48, and that was pretty useful to scientific users too. So what sets SciPy apart from NumPy?
Yeah, I think I'll start just a little bit. SciPy at this point is many things. The name SciPy sometimes refers to the website; sometimes it refers to the community in general, all the people developing scientific applications for Python. But originally SciPy was, and it still is, a library of fundamental tools that people might need to do technical computing. It's got things like optimization, image processing, signal processing, interpolation, linear algebra, integration, ordinary differential equation integration: different kinds of fundamental tools that somebody might need, really building on top of NumPy.

Yeah, I would jump in here and say that it's really about being a collection of stuff that scientific users or scientific programmers don't want to code up themselves, right? Every time you want to go and solve your differential equation, you don't want to have to go and write your own, you know, stochastic integration function. That's not really a good use of your time if you're trying to solve some heat transport equation. You want to focus on the science; you don't necessarily want to focus on the algorithm development. And then the reason it sits on top of NumPy is because NumPy gives you this great data structure for handling your arrays of data, right?

This is Warren. I'd also point out that much of SciPy is actually code that has been written over the past several decades.
That's very solid, reliable Fortran or C code: the standard ODE solvers or optimization libraries and many others, and SciPy provides the Python wrapper for this solid, robust code that's been around a long time. So you get the advantage of these powerful, solid, reliable libraries and also the benefit of the Python language, which is a very nice, powerful language for doing all the other work besides the core number crunching.

So is that a good way to differentiate between SciPy and NumPy: NumPy is kind of the glue underneath, the data structures and things like that, and SciPy is the algorithms and other things that are put together and use those data structures?

Yeah, that's reasonable. I think the core part of NumPy is an array object with a lot of powerful ways to manipulate arrays, again much more efficiently than you can in pure Python. And then SciPy, and in fact many other tools besides SciPy, use NumPy as the core array data structure. Matplotlib is a plotting library that uses NumPy as a basis. There are many others, too many to think of off the top of my head.

Yeah, there are a lot of tools. When you come to Python, you recognize it's so easy to write code. Scientists recognize that as well, but they need access to some of these fundamental tools, like a Runge-Kutta integrator or an Adams ordinary differential equation integration tool, or they need some image processing capability. SciPy, the library, is really a collection of those tools. But the SciPy community has, as Warren was hinting at, also created a lot of other tools that build on top of the NumPy data structures and give you access to wavelet transforms, image reading and writing capabilities, additional optimization algorithms. That need is larger than any one library, and that's sort of what we found out over the ten years or so that the SciPy organization and community
has been evolving. It started with just a bunch of people who came to Python together and really liked coding with Python, typically coming from MATLAB or IDL, or even some people from APL and older languages like that, and then finding so much, really almost pleasure, in writing code in Python that they wanted to extend that pleasure to all of their technical computing needs, but still needed certain tools. That was really the motivation behind SciPy: how do we get access to these libraries that are out there, that may be sitting on Netlib, and get them into Python?

So actually, I want to ask you more about that motivation. You mentioned earlier that you wrote a lot of the original modules that became part of SciPy. Who was the driving force, and what exactly happened that got all these independent Python modules, which call these lower-level things that maybe lived on Netlib, put together and really organized into the SciPy package?

So that's a good question. There are a lot of voices, and one of the things about the SciPy community is that it's a loosely knit community. Basically, people who are interested step up and do the work they're interested in doing, so it's led by a lot of different people. Early on, you know, I was a young graduate student. I wanted to use Python more in my work, but I kept having to go back to MATLAB, so I ended up finding a lot of code on Netlib and wrapping it into Python. 1999 was actually a particular year in which about ten different modules were created, and then I organized them into something called Multipack. Eric Jones wanted to start Enthought, and he noticed Multipack out there. Pearu Peterson was very actively involved in Multipack, helping write a lot of scripts. Noticing I was doing a lot of hand wrapping of Fortran code, he wrote a tool called f2py that made that hand wrapping much easier. So basically it was a collaboration between Pearu Peterson
and me, and then Eric Jones was probably a driving voice in the organization to create something called SciPy. I think the name SciPy came from Eric. In fact, I think my choice was PyLab; that's what I wanted to call it, and he decided SciPy was a better name, and we all agreed. He also, founding the company Enthought, funded early development of the SciPy package and spent a bunch of money getting the builds created; it turned out most of the work was making Windows builds.

So it sounds like two of the focuses of SciPy are performance and usability, you know, given that the target audience is scientists and engineers who just need to get work done and want to do it in a nice, easy language like Python, easy yet powerful. So let me ask you kind of a strange question. With those two goals in mind, what happens if they come into conflict? Which one wins?

Um, well, I don't think they have to. I think it's a false conflict, actually. There should be a way to get the performance that you want and also get the high-level, sort of beautiful representation of the code that you need as well. So it's a matter of dropping down into C or Fortran and writing some C or Fortran code that plays nicely with NumPy and is going to be performant and work well, and then just writing a wrapper around that so that you can expose that functionality back to Python. That's what SciPy does in a lot of instances; that's kind of what SciPy is. It's a nice wrapper around other terrible but powerful interfaces, right? Things like LAPACK and BLAS and ATLAS and all these other things that are really, really powerful. They're linear algebra libraries that have been worked on for 30 years or something, and you know that if there's a bug, it's going to be really, really obscure, and the code is going to be really, really fast. But you don't have to worry.
You don't necessarily, as a scientist, want to worry about getting your double-pointer array of floats shoved into that code. You just want to say, here's my data, go do what I want with it. I think that's a fair description of that sort of balance.

And I'll also add, this is Warren again: a good chunk of SciPy is wrappers around these classic, well-known, robust solvers, but over the years new code has been written in pure Python, or Python using NumPy, written from scratch rather than wrapping an old library. And even there, oftentimes the beauty of using Python and NumPy is that you can get something working quickly. If I use the right NumPy abstractions, I can get this code working. Say, for signal processing, I can make a filter that does something nicely in pure Python with NumPy, and then I've got something working quickly. If you find out that it's not fast enough, then you can go to the next level: okay, how can I start moving that into, say, a C-language version? It's actually a fairly painless process to convert from Python, in various ways, to a faster implementation. That's a pretty common pattern that arises in development with Python: going from pure Python to faster wrappers if you need it.

And that sort of strategy, I think, has been driven by the scientific computing community more than the pure Python community, right?
Not that web developers necessarily don't care about speed and these sorts of things, but it's mission critical for scientists and engineers to have fast code in a lot of situations, whereas it's just not in other industries, I guess.

So, interpreted versus compiled languages, Python being one, the traditional high-performance languages Fortran and C being the other: why use one over the other when we're already spending millions on computers? What's the value of giving up a little bit of performance for a little ease of use, or investing programmer time for a little more performance?

That's a good question. It really gets at the heart of why people like Python and why it's being used so much. A lot of it is about reusing code. If you're talking about a library that just has to run as fast as it possibly can, and the API of that library is well understood, you know exactly how you're going to call it and exactly what data inputs it needs, then yes, that should be written in machine code if you can get away with it and you have the time to spend on making it as fast as possible. But the amount of code you have to write that's like that is very small compared to the total amount of code you have to write. For most of the code you have to write, you're not sure what the inputs are, you're not sure how it's going to change in the future, and you're not sure what circumstances you'll want to run it under. So what you want is code that can be refactored easily, that can be understood by somebody down the hall, or somebody years down the road who's going to see it later. That's much more important than the fastest code it's possible to get out of the machine. So Python sits in this space where it can reach down and call machine code, and there are tools being developed that allow you to create machine code as fast as C can produce, as well as high-level code that is easy
to refactor and that expresses abstractions that you can then share with somebody else. So it's a good question. It's an evolving question; I think everybody has a different answer to it. But the empirical reality is that a lot of people are using Python and really enjoying their use of Python for this purpose.

So that's a good lead-in. You guys have implied a couple of times earlier in the conversation that the internals and the guts of SciPy are not necessarily written in Python. Can you elaborate on that a little bit?

Parts of it are Python, but a lot of the core libraries are not. There's actually a mix of Fortran, C, C++, and also a language called Cython, which is sort of a hybrid of Python that can be converted to C in a nice way, so it generates fast C code but in a Python style. So much of the underlying core algorithms are actually written in C or Fortran, which is fast. What's nice about that is it gives you fast code, and SciPy provides a wrapper around those so you can call these functions from Python and use Python as your development language. But when you actually want to go and solve a linear system or solve differential equations, you're calling a Fortran library. So you get the benefit of both the fast Fortran and the facility and power of Python as a development language.

So if you have all these languages living inside this, and you mentioned calling stuff from Netlib earlier, how much is involved in building SciPy?

Building SciPy can be a challenge. That's probably been its largest challenge in the past. Most people today get their SciPy from a site that has a pre-built binary. SciPy comes in some of the Linux distributions, like Debian and Ubuntu.
Yeah, there are SciPy builds. Of course, we have the Enthought Python Distribution, which contains SciPy. It comes with Sage. SciPy is available in Python(x,y). So the majority of people today get their SciPy from somebody else who builds it. It's still a pain to build: you've got to have a good working Fortran compiler that's up to date, and you've got to have a good C++ compiler. But fortunately, development in the GNU compiler tools has really helped us out there. gfortran is pretty big with us, and GCC works pretty well.

Do I also have to build my own BLAS, LAPACK, and FFT libraries, or do you guys do all that for me?

Good question. Either one. We actually recommend, and in fact require, in SciPy that you use an optimized BLAS, something like ATLAS or the MKL, or AMD has a similar fast kernel library. You do have to build ATLAS, if you don't have it pre-built, in order to get SciPy working, or have access to MKL. On the Mac, for example, there's the Accelerate library that Mac OS X provides, and it links to that directly.

So yeah, on most flavors of Linux the package manager will take care of this for you, but if you wanted to get this on Mac OS X or Windows or something, then it's actually probably easier to get a binary of SciPy from someone else, because it's much harder to develop in those environments.

Now, by the same token, let's say I'm a developer and I want to add something to SciPy. You guys have talked about the extensibility of it before. As a developer, what kinds of things do I have to do? You know, I found some new, cool numerical library that's written in, I don't know, COBOL, whatever, and I want to make that available in Python. What kinds of things do I have to do?

One of the recent developments in the SciPy community is actually the migration of the SciPy extension modules that were previously hand-coded. Many of them were the same hand-coded extension modules
I wrote 10 years ago, still working, but, you know, embarrassing for me personally to see all that code there, the way I wrote code 10 years ago. But we've been migrating a lot of that to Cython, and Microsoft has helped us in this endeavor because they really want NumPy and SciPy on the .NET stack. To do that, we essentially migrated handwritten extension modules in C to this Cython language, and then Cython could generate both C code and .NET code. That was the strategy we undertook, and it's been very effective for the SciPy community because it's given a lot of patterns and a lot of examples to many people out there as to how to create an extension module that can link into SciPy very easily by using this great technology called Cython.

So what is Cython? Is this part of the SciPy package, like a meta-language you guys have created, or is this something completely different?

So the way I think about Cython is that it's really a pidgin language between C and Python. Historically, it comes from a package called Pyrex, and someone jump in here if I'm wrong, which was related to Sage; I think it was written by Greg Ewing in Australia. The Sage project from William Stein was a desire to get a symbolic mathematical toolkit; really, he calls it the car instead of the parts of the car. He wanted to pull all these things together and replace Maple and Mathematica with an open-source tool. Sage needed to be able to write a lot of extension modules for Python. They took Pyrex, which was this pidgin-like language, and essentially created Cython: they forked Pyrex in a friendly way and made Cython. Cython is really headed by William Bradshaw, Robert Bradshaw, excuse me, Robert Bradshaw and William Stein from the Sage project.

There are constructs available to you in C that aren't available in Python that you can also include in this sort of Python-like language,
right? So type definitions on function call signatures, or things like that, basically adding type information. Getting around the Python interpreter is another big one, or dealing with pointers and memory and things. These are all abilities that you have in Cython if you want them, but it doesn't force you to have them. So some of the Pythonic ideas are still there, but you really do have access to the full power of a compiled language as well.

So over the past two to three years, the introduction of Cython as a method for creating extensions to Python has really dramatically grown the number of modules that people are writing, linking this old C code, or else inventing new C code, and linking that into Python. It really does bring this promise of fast code when you need it, yet easy-to-modify and easy-to-understand code when it's appropriate.

Traditionally, wrapping C code and exposing that to Python is something that is sort of difficult to do. It's a non-trivial task; you end up writing a lot of boilerplate code. The advantage of Cython is that it minimizes the need to do that boilerplate stuff, so it makes it easier to write fast code. I think we may have missed that in our conversation here, but that's why you have more people writing in Cython than you have people writing strict C headers by hand. It also makes it easier to maintain. Cython, as Travis mentioned before, has a number of different back ends that you can compile to. So yeah, there's a .NET back end.
There's also the CPython back end, but within the CPython back end you can actually compile code down to different Python versions. So it takes care of migration needs sort of automatically: the same Cython file can be converted into a Python 3.2 extension module just as easily as it can be converted to a Python 2.4 extension module. That's really, really powerful; it saves a lot of development time and that kind of thing.

But all of this is about the developer community surrounding SciPy. It really just enables and empowers that community. The users of SciPy don't necessarily even care how it's written; it just gives them access to really nice fundamental libraries that are easy to use from within Python.

So let's talk more about the user of SciPy. How many public-facing methods are available in SciPy right now? And can you list some of your most popular ones?

That's a good question. I should know this, but there are around 20 different sub-packages. I can tell you the most popular ones: I just did a Google Code Search to see who was using SciPy in public-facing code, and the most popular modules are optimize, the stats module, and the linear algebra module. A few more people will use interpolation and integration, but it goes down pretty quickly from there, and that dovetails with my experience as well. It's also why NumPy itself contains some linear algebra capability as well as a bit of ability to generate random numbers. Those are the 75% use cases for most users of numerical libraries. SciPy just extends that and creates a long tail of libraries.

So, going off in a slightly different direction: we were talking about users there. What about the community itself? How does the SciPy community work?
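[Editor's illustration] The three sub-packages named above as the most widely used (optimize, stats, and linear algebra) can be sketched in a few lines. This is a minimal sketch, not anything shown on the podcast: the matrix, the quadratic objective, and the variable names are made up for illustration, and it assumes a reasonably recent SciPy in which `scipy.optimize.minimize_scalar` is available.

```python
import numpy as np
from scipy import linalg, optimize, stats

# Linear algebra: solve the small system A x = b (illustrative values).
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = linalg.solve(A, b)

# Optimization: find the minimum of a simple one-dimensional quadratic.
result = optimize.minimize_scalar(lambda p: (p - 2.0) ** 2)

# Statistics: evaluate the standard normal CDF at zero.
cdf_at_zero = stats.norm.cdf(0.0)

print("solution of A x = b:", x)
print("minimizer of (p - 2)^2:", result.x)
print("standard normal CDF at 0:", cdf_at_zero)
```

In each case the input and output are plain NumPy arrays or Python floats, which is the division of labor described in the conversation: NumPy supplies the data structure and SciPy supplies the algorithm.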
You know, I assume that community includes users and developers and maintainers and vendors and a wide swath of people.

Well, this is Warren, as both a user and a developer in the community. From the point of view of users, it's often just, well, how do I get started with SciPy? One common way is that you've got a problem, say optimization. You maybe ask on a mailing list somewhere, how do I optimize this function? And that will often lead to people saying, well, SciPy has optimization libraries. So they go into the documentation available for SciPy, or go onto the SciPy user mailing list and ask, I have to optimize my function, how do I do that? There's a very active community on the mailing list for getting help in using SciPy. As people use it more and more, they start thinking, you know, I wish this optimization code had one more feature, and that might lead to somebody saying, well, I can make it better if I do this, and they might contribute a patch, a change, via the mailing list. Now, as people have ideas for improving SciPy, they can, well, we recently moved over to using Git as a source code control system, which provides a nice way for people to contribute code: they can put their change on a web page and tell people it's there, and it can be reviewed and possibly merged into the SciPy code itself.

I'll echo a little bit of what Warren said. The Python community has always heavily used mailing lists to coordinate and communicate, and SciPy is no different. There are in fact three mailing lists associated with the SciPy community. The first is numpy-discussion, and we talked about that,
I think, in the previous podcast. The two for SciPy are scipy-dev and scipy-user. You can find both of these mailing lists at the scipy.org website. There's a section for the developer zone, and it'll take you to a link where you can sign up for the mailing lists. That's where most of the conversations take place. Recently, the move to GitHub does provide another environment where issues can be raised and comments on those issues can be made and discussed, and that is also part of the community now. But how it works is that people who are interested, who have itches to scratch, get involved and talk about what they care about. They'll post an issue on the mailing list, and if anybody is interested, they might push back. There are people who have been around the community for a while, and sometimes those voices are heard a little stronger, but really anybody can jump in. It's an activity-level-driven community: people who are interested drive this story forward.

That said, I would say that the common first experience with SciPy is in fact Google, and the common second experience is the documentation. The SciPy documentation, along with the NumPy docs, is really, really good. You can go years and years as a SciPy user without ever having to jump onto a mailing list. The stuff is pretty well documented, what it is and what it does. And that's really about the quality control that the developers impose on themselves, in terms of not checking in code until it has at least a pretty decent docstring that explains what that thing is supposed to do. That's one of the main advantages of the project, I would say.

So what are some new things that we can expect in future versions of SciPy? What things are being worked on, what's being developed, by you or by others?
Well, on my current to-do list there is a package called scipy.signal, which is the signal processing library. Over the last six months or so we added some better filter design tools, and I've got more in the works for improving some of the filter design. There's also been recent work, which I wasn't involved in, I just followed it on the mailing list, on adding non-uniform fast Fourier transforms. What else are you guys familiar with that's coming up, that people are working on?

Well, one of the ways to find out is actually to go to the SciKits web page and look at what kinds of SciKits are being produced. A lot of SciPy bug fixes and feature enhancements to particular packages in SciPy are ongoing, but new additions to the SciPy library really go through the avenue of becoming a SciKit first. So you'll see things in the SciKits world like statsmodels. We've known for a long time that SciPy could be used in exactly the same way people use R, but there are a couple of missing things that folks in the R world really like, and some of those things are actually being added in the SciKits community: things like fast NaN processing or missing-data processing, things like the ability to set up a model and then do quick regression analysis on that model, including analysis of variance. Lots of that sort of work is happening in the SciKits world, and I expect that to be rolled into SciPy over the coming months and years.

There's also very active work going on; there are several packages that have common themes. There's larry, the labeled arrays. There's pandas, which Wes McKinney's been working on. There's a whole area, often based on time series data, about allowing for missing data in your time series, an imperfect time series. How do you represent data like that? How do you label it? How do you keep track of dates and so on?
That's definitely a very active area of research, really, in terms of what a good library that dealt with this sort of imperfect time series data would look like. One very common area where it's being driven from is the finance world, which has lots of time series they want to analyze and wants tools to do that.

On that note, actually, I think SciPy community and development would really benefit a lot from having more financial engineers involved. They're sort of off on their own island a lot of the time. They're users of a lot of this code that comes out of other scientific and engineering disciplines, but it would be great to have them more involved in the community as a whole.

Okay, guys, well, we're going to wrap up here. So thanks a lot for your time. Again, the SciPy website, the SciPy conference, and I hear one of you guys hosts a podcast also?

Yep, so I host a scientific computing podcast called inSCIght, that's I-N-S-C-I-G-H-T. You can find us at inscight.org, and we talk about a whole range of scientific computing issues.

Yeah, it's a lot of fun. They have a rant section at the end that's particularly fun to listen to.

One of our biggest fans here.

So again, SciPy, scipy.org, the SciPy conference July 11th through 16th in Austin, and thanks a lot again, guys, for your time.

Thanks, guys.
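[Editor's illustration] The point made earlier in the conversation, that SciPy puts a Python interface over long-established Fortran solvers, can be sketched with `scipy.integrate.odeint`, which as far as I know calls the LSODA routine from the Fortran ODEPACK library. This is a minimal sketch under stated assumptions: the test equation dy/dt = -y, the function name `decay`, and the time grid are chosen purely for illustration and do not come from the podcast.

```python
import numpy as np
from scipy.integrate import odeint

def decay(y, t):
    """Right-hand side of the illustrative test equation dy/dt = -y."""
    return -y

# Integrate from t = 0 to t = 1 starting at y(0) = 1.
t = np.linspace(0.0, 1.0, 11)
y = odeint(decay, [1.0], t)      # one row per time point, one column per state

final_value = y[-1, 0]           # numerical approximation of y(1)
exact_value = np.exp(-1.0)       # analytic solution y(t) = exp(-t) at t = 1

print("numerical y(1):", final_value)
print("exact y(1):    ", exact_value)
```

The caller only writes the right-hand side as an ordinary Python function operating on NumPy arrays; step-size control and the integration method are handled inside the wrapped solver.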