Brock: Welcome to another edition of RCE. This is Brock Palen. You can find us online, find all the back episodes, and nominate new shows at rce-cast.com. You can also follow me personally on Twitter at brockpalen, all one word, and you can find my blog floating around off of the RCE website. I also have here again Jeff Squyres from Cisco Systems, one of the authors of Open MPI, who sits on the MPI-3 Forum. Jeff, you've been pretty busy recently.

Jeff: Ah, yeah, there's always lots to do, and MPI-3 is actually finishing up. Well, it's starting to finish up, meaning that all the new things that will be considered for MPI-3 are now in, so the doors are actually closing. But it's a busy time. It's good stuff.

Brock: Yeah, and you've been posting information about some of the changes to Fortran, and other tips and tricks for utilizing Open MPI at a more detailed level, over at your blog, which is also linked off the website. Also, I will be speaking at GlobusWorld at Argonne National Lab, which is April 10th, 11th, and 12th, talking about some of the stuff we've been doing with Globus here at the University of Michigan. And Jeff, you've got EuroMPI coming up, don't you?

Jeff: Yeah, let me give a shout-out to the EuroMPI call for papers. The papers are due in about a month; May 5th is the deadline for posters and papers, and it's in Vienna this year. Beautiful Vienna, Austria. Please be sure to get your papers in.

Brock: Okay, so let's go ahead and get to our guests today. We have two guests, Matt Turk and Aron Ahmadia. They are both authors on mpi4py, which is a set of MPI bindings for Python. So guys, why don't you go ahead and take a moment to introduce yourselves?

Aron: Sure. My name is Aron Ahmadia. I'm a research scientist at King Abdullah University of Science and Technology, and I came into mpi4py primarily as a user, since we're interested, at a high level, in using Python for parallel computing applications on supercomputers. We really have an invisible third person sitting in the room with us: Lisandro Dalcin, the main creator and developer of mpi4py. So there will be a lot of "Lisandro did this" and "Lisandro chose that," and Lisandro's genius sits beneath most of mpi4py. But yeah, we're happy to be part of the development team and to help with mpi4py. I also wanted to do a quick shout-out for the SciComp Stack Exchange, which is a new beta site I'm moderating. If you have a question or answer about high-performance computing, go check that site out; I hope there will be a link from the podcast page.

Matt: Yeah, so my name is Matt Turk, and I'm a postdoc at Columbia University with the NSF CI TraCS program. Like Aron, I come to mpi4py mostly as a user. I've deployed it on a number of different systems, inside the XSEDE system and outside it, as a library used by a program that I wrote called yt, which is used in astrophysical data analysis. I've used mpi4py to interoperate with C++ code, as well as strictly within Python. So that's pretty much where I come at it from.

Brock: Unfortunately, we did try to get Lisandro on the show, but he couldn't be here due to scheduling conflicts and time zones and all that kind of fun stuff. He said that you guys could fill in for him as worthy representatives. So why don't we go right into this: can you give us a little bit of an overview of what mpi4py is?
Aron: Sure. At the highest level, mpi4py is a set of bindings to the MPI interface from Python. It attempts to provide not only all of the MPI functions at a C level, where you're basically calling the C function directly, just from Python, but also a friendlier set of bindings that interact with generic Python objects using a Python technology called pickle, which is effectively just serialization.

Jeff: Now, what was the rationale for creating mpi4py? I mean, why add yet another set of language bindings here? What's the motivation?

Aron: I think it's clear that if you want to use MPI from Python, you can't just call the C API directly; that's ugly and dirty, and ctypes wasn't even around when mpi4py was first written. So it's nice to have something in the language that's friendly and flexible and easy to use, and feels like Python. If you're an MPI programmer, of course, and you've been writing C and Fortran, you don't care. But if you're a Python programmer, you want something that feels comfortable for you.

Matt: For instance, say you have a Python program that you're using and you want to add on to it, say, a parallel analysis layer, which is where I came at it from. We had this program called yt, and it operated very nicely in serial, and we wanted to be able to extend it to run on hundreds, and now thousands and tens of thousands, of cores. In order to do that, we needed some mechanism for parallel interoperation, and MPI is the obvious mechanism for doing this; it was something we were comfortable with, coming at it from the simulation side. Our simulation code uses MPI for message passing, so it was natural that we would want to use it from within Python. We started out using mpi4py in its most trivial mechanism, trivial from the user standpoint, where it uses pickle to serialize objects and pass them between different processors. Then we moved on to using it from the more advanced standpoint of actually broadcasting arrays directly, where it takes the underlying memory buffer and supplies that directly to MPI.

Jeff: Now, how many Python applications for MPI do you see? Because in some respects, in high-performance computing, people eschew everything that detracts from performance, and there certainly is the argument that a higher-level language like Python won't give you that bare-metal performance.

Matt: Well, I guess it's interesting, because what you're getting at here is this idea of bare-metal performance. In a sense, yes: if you're broadcasting around Python objects, and you have to rely on the interpreter to go through and evaluate what the type of a given object is and how it's expressed to the underlying C code, then that can present a number of barriers to performance. But that's not necessarily how it's exposed in mpi4py. For instance, mpi4py relies on NumPy, and NumPy is essentially a set of shims that provide Python access to fast underlying C arrays. If you can supply those arrays directly to MPI, rather than having to do any interpreted steps, it already knows the data type, and it can broadcast them to different nodes, or send them, using the underlying MPI machinery directly. I think you guys are interested in some hard numbers, right?
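A minimal sketch of the two usage styles described above, assuming a stock mpi4py installation; the payload, tag, and array contents are illustrative:

```python
# Sketch: the two faces of mpi4py. Run with at least two ranks,
# e.g.: mpiexec -n 2 python demo.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Lowercase API: any picklable Python object, no type declarations.
if rank == 0:
    comm.send({"step": 10, "fields": ["density", "temperature"]}, dest=1, tag=0)
elif rank == 1:
    obj = comm.recv(source=0, tag=0)
    print("rank 1 received:", obj)

# Uppercase API: buffer providers such as NumPy arrays are handed
# straight to the underlying MPI call, with no pickling in between.
buf = np.arange(4, dtype="d") if rank == 0 else np.empty(4, dtype="d")
comm.Bcast(buf, root=0)   # every rank now holds [0., 1., 2., 3.]
```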
Jeff: Right. So you want to know exactly how bad the performance overhead is, and how much it's affected by going through Python.

Aron: So Lisandro ran quite an extensive series of tests on gigabit Ethernet, and I owe him some tests on a Blue Gene machine to compare against. The way we measured throughput: there are three different ways you could write your MPI code. You could do it in pure C; you could do it with NumPy plus Python plus mpi4py; or you could just use mpi4py's native pickle technique. What you'll see is that if you use C, of course it's the fastest, but you're within a fraction of one or two percent of it as long as you're using NumPy arrays beneath. And of course the pickle interface is the slowest, but it turns out that it's also the most flexible, and it requires the least amount of code and thought. A final thing I'll say is that if you're doing this on shared memory, then you start seeing a performance hit due to just the Python overhead, and that can be as much as a factor of two, but never worse than that, I would say.

Brock: Yeah, what I was really going for there was the canonical argument that I have usually heard for doing MPI in higher-level languages: the abstraction win. You can actually get more done in less time because you're not constrained so much by what C forces you to do, with primitive string handling and type handling and all these things. Even though it performs a little bit slower, you can get your code written faster, get prototypes written faster, and things like that. Is this kind of your experience as well?

Matt: Certainly, Python is a more pleasant language to program in than C and Fortran, so that's certainly one aspect of it. The code tends to be a little bit more maintainable, since there are fewer lines of code, and it's easier to debug and easier to see what's going on, sometimes.

Brock: So let's back up a moment. You mentioned you're using NumPy, and NumPy is what's giving us performance here. We've had NumPy on the show before, and Enthought does a lot of work with it, kind of being a commercial face behind it, with commercialized support and things for Python. Do you guys have any relationship with Enthought?

Aron: Neither I nor Matt have any official relationship with Enthought. For a while, Enthought has been hosting the mpi4py and petsc4py documentation, but outside of that, not really.

Brock: Okay. So MPI itself has multiple versions; version 2-point-something is the current standard, and version 3 is being worked on. What MPI spec does mpi4py currently aim to support?

Aron: Oh, this is the best part about mpi4py: it freely supports both the MPI-1 and MPI-2 specs, and Open MPI and MPICH, so it doesn't matter what your underlying MPI implementation is. Lisandro has hidden that from the user; all you have to do is write MPI code. Now, if you write an MPI-2 call and you have an MPI implementation that doesn't support that MPI-2 call, you get a Python error at runtime that says, oh, you couldn't do this. But you're not limited; you don't have to worry about that when you're writing the code.
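A rough sketch of two of the three variants Aron compares above, the NumPy-buffer path and the pickle path (the pure-C version is omitted). The message size and repetition count are arbitrary, and the numbers it prints are machine-dependent:

```python
# Two-rank ping-pong in the spirit of the throughput comparison
# described above; illustrative, not Lisandro's actual benchmark.
# Run with: mpiexec -n 2 python pingpong.py
import time
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
other = 1 - rank
buf = np.zeros(1 << 20, dtype="d")   # ~8 MB payload
reps = 50

def timeit(send, recv):
    comm.Barrier()
    t0 = time.time()
    for _ in range(reps):
        if rank == 0:
            send(); recv()
        else:
            recv(); send()
    comm.Barrier()
    return time.time() - t0

# Uppercase buffer path: near-C throughput for NumPy arrays.
t_fast = timeit(lambda: comm.Send(buf, dest=other),
                lambda: comm.Recv(buf, source=other))
# Lowercase pickle path: works on any object, at a serialization cost.
t_pkl = timeit(lambda: comm.send(buf, dest=other),
               lambda: comm.recv(source=other))
if rank == 0:
    print(f"buffer: {t_fast:.2f}s   pickle: {t_pkl:.2f}s")
```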
Brock: That's cool. Have you had a lot of issues with library support, with the underlying MPI library providing your transport mechanism?

Aron: Yeah. So, Lisandro again...

Jeff: Go ahead, be honest.

Aron: I would say that one of the most interesting problems you deal with, certainly with the older MPI implementations, is that they required that MPI_Init be passed the command-line arguments. And of course, if you're running the Python interpreter, those are long gone by the time you import mpi4py. Now, in MPI-2 this requirement is relaxed; MPI_Init is not allowed to insist on access to argc and argv anymore. But just to simplify things and support MPI-1 implementations that still required argc and argv, Lisandro built an MPI Python interpreter, basically: a simple main() that grabs argc and argv, calls MPI_Init, and then starts the Python interpreter. So if you're in a situation with a really old MPI, that's supported as well.

Matt: This was actually a problem for me on an older system, where we were using the SGI implementation, and we ended up having to use the Python MPI interpreter quite often.

Jeff: Yeah, and I will say there was actually a problem with Open MPI for a while too, because we open plugins, and Python opens plugins, and the whole shared-library interaction was actually quite a mess for a while, until we figured out a way of saying, no, you need to do it this way. Some of the darker side of linkers, unfortunately. But let me ask the next question after that little commentary. Your functions in mpi4py, are they bindings, meaning are they kind of one-to-one with the C functions, or are they a higher-level, class-library kind of approach?

Aron: Okay, so that's a great question, because there are actually two sets of bindings in mpi4py. Your traditional MPI bindings, the ones you would call directly from C, are available from Python as well. Those are the uppercase bindings, and I'm referring to the first letter of the mpi4py routines. Those look a lot like the C++ bindings in MPI-2, though they call directly into the C bindings themselves; they don't touch the C++ bindings. So a comm is an object, a communicator, and you can call functions on it like send, receive, and broadcast, and if you call the uppercase versions, those look exactly like the C signatures for MPI. Now, in addition to those, there's a second set of calls, implemented in lowercase. Those calls are what I would call higher-level, or abstract, or Pythonic: you can send any Python object that can be serialized over them. This is very friendly and very powerful, and it's this nice abstraction where you don't worry about what you're doing. You just say, send this to process one, and mpi4py does all the work of packing it up and turning it into something that can be sent over the network.

Jeff: Do you see users, when they're utilizing those kinds of make-it-easy-to-send-this-thing calls, engaging in communication patterns that are generally lower performance than using collectives and such?

Aron: I would say that most of the questions we get from users tend to be really basic "how do I make this thing go" and "I can't import the library" problems. I think by the time people figure out how to use petsc4py, sorry, mpi4py, they know what they're doing with the two sets of calls.
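To make the uppercase/lowercase distinction Aron describes concrete, a short sketch showing the C-like signatures, with an explicit MPI datatype on the uppercase calls; tag values and array contents are arbitrary:

```python
# Uppercase calls mirror the C signatures, down to an optional
# explicit [buffer, datatype] specification; lowercase calls take
# any serializable Python object.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

data = np.empty(100, dtype="d")
if rank == 0:
    data[:] = np.linspace(0.0, 1.0, 100)
    # Compare C: MPI_Send(buf, count, MPI_DOUBLE, dest, tag, comm)
    comm.Send([data, MPI.DOUBLE], dest=1, tag=7)
elif rank == 1:
    comm.Recv([data, MPI.DOUBLE], source=0, tag=7)

# The Pythonic lowercase equivalent: no counts, no datatypes.
meta = comm.bcast({"resolution": 100} if rank == 0 else None, root=0)
```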
Matt: One of the very nice things about the way that Lisandro has designed mpi4py is just how one-to-one the mapping is between the MPI standard and the function calls, except with this added twist, as Aron noted, of the Pythonic interface on top of that.

Jeff: So how do you guys handle types, then? And I say that because MPI has this notion of datatypes, right? In the C world I have int and double and float and char and all these things, and I tell MPI what the type is, because in C the buffer is passed as a void* and MPI would have no other way of knowing. But Python is much more strongly typed. How do you do the mapping, particularly for non-intrinsic datatypes?

Aron: Okay, so if you're calling the lowercase methods, this is what happens. There's a Python module called pickle, and what pickle does is it knows how to handle the serialization of all of the major Python object types. So if you have a dictionary, which looks like a hash map, or you have a list, which is like a C++ STL vector, if you're familiar with the STL containers, all of these things are instantly turned into serialized objects. Normally you might serialize something and put it in a file, but what we do is serialize it into a buffer, and that buffer, of course, is just a bunch of bytes; it's a bucket of bytes with a size. We send that over, and on the receiving side the unpickle operation reads the first few bytes of that, understands how it's going to decode the rest, and turns it into an object on its side. All of this is seamlessly handled by these lowercase methods, so I really just say comm.send with the object and the process I want to send it to, and mpi4py takes care of the rest.

Jeff: Gotcha. So there's really no notion of native or intrinsic types; everything is serialized and deserialized.

Matt: If it's something that you can turn into a NumPy array, it'll obviously go a lot faster if you turn it into a NumPy array first, and mpi4py knows what a NumPy array is, so it doesn't call serialization on NumPy arrays. Also, if you use a PEP 3118 buffer, PEP being a Python Enhancement Proposal, that's also an object mpi4py knows how to send efficiently.

Aron: As Matt said, a NumPy array is really just a C buffer of memory. It can be a one-, two-, or three-dimensional array, and have strides and all of these things, and mpi4py knows how to take that and very efficiently send it, using native MPI strides and MPI datatypes. So, as an example, if you use the uppercase functions and you send a NumPy array using one of those, it'll be sent over just as Jeff noted with the void*, provided the strides match up correctly and can be expressed to MPI.
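Roughly what the lowercase path does under the covers, per Aron's description; mpi4py's real implementation is more careful than this, but the idea fits in a few lines:

```python
# Pickle turns an object into a "bucket of bytes with a size";
# the receiver's unpickle reads the leading bytes to learn how to
# rebuild the object. This is the serialization the lowercase
# mpi4py methods perform before handing bytes to MPI.
import pickle

obj = {"cells": [1, 2, 3], "level": 4}

wire = pickle.dumps(obj)          # sender side: object -> bytes
print(len(wire), "bytes on the wire")

rebuilt = pickle.loads(wire)      # receiver side: bytes -> object
assert rebuilt == obj
```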
Matt: And if you use the lowercase methods on even a complex derived type, for instance a Python class that's defined in your module, it will serialize that in the way that Aron described, pass it over, and it'll be deserialized on the receiving end.

Brock: Cool.

Jeff: This is a really nice hybrid approach to high-level and low-level programming using MPI, in the sense that you can do really complex things using the pickle methods, but at the same time you don't necessarily have to sacrifice performance for the things that MPI natively understands.

Brock: So this would require that, say, as an application programmer, I would actually have to be programming in the NumPy kind of style in my application itself. This isn't something under the covers where I say, I want to use the pickle driver, or I want to use the NumPy driver. I actually have to explicitly write everything and work with my arrays all in the NumPy style?

Matt: If you were trying to get the maximum performance, yes, you would be working with NumPy arrays. And generally, for numerical computing or scientific computing in Python, you're using NumPy anyway, because that's your matrix or vector storage for data.

Brock: But if I've got an existing pile of Python, like generic Python that doesn't import anything, and I wanted to start hashing it around to make it work with this, I can just use pickle and not even think about it, and just go?

Aron: You don't even need to use pickle yourself, because mpi4py does that for you, and pickle is in the Python standard library, so it's always available. That's right: you do "from mpi4py import MPI", you say "comm = MPI.COMM_WORLD", you grab your size and your rank from the comm, and you send the data: "data = comm.bcast(data, root=0)".

Jeff: So, one of the things that we added in MPI-3, and it's actually already available in the Open MPI trunk, is something called MPI_Mprobe, matched probe. It was added to address a couple of things, and one of the rationales given was explicitly that it would be useful for higher-level languages, where you might be sending serialized data and you don't know the actual length of the message that's going to be sent, or, more specifically, received. So it can be useful to just say, hey, give me whatever the next message is; I want to receive it, but I don't know how long it's going to be. With matched probe, the idea is that you can say, match me this signature, and not necessarily put in a length with it. If you match it, that incoming message is removed from the matching queue so that nothing else can probe or receive it, but you get a handle back to that message, which you can then give to a matched receive, MPI_Mrecv, and say, okay, now actually receive it. Now, is this useful? Do you guys foresee mpi4py being able to use this kind of functionality?

Aron: Yeah, that actually resolves one of the issues we have with irecv.

Jeff: Oh, thank goodness.

Aron: The way that we handle that now is, if we have to, we actually send two messages. We send the size of the object ahead of time, and then the next message is the actual data. If we know we can avoid it, we just send one message, but you're right: for safety, we were often in that trap where we had to send two messages. And that's not just for point-to-point; in collectives we sometimes have these weird, funny things as well.
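A sketch of the two-message workaround Aron describes, with hypothetical tag values; MPI-3's matched probe is exactly what would remove the need for the first message:

```python
# Two-message pattern: when the receiver cannot know the pickled
# size in advance, send the size first, then the payload.
import pickle
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
SIZE_TAG, DATA_TAG = 100, 101     # arbitrary, for illustration

if rank == 0:
    payload = pickle.dumps({"anything": list(range(1000))})
    comm.send(len(payload), dest=1, tag=SIZE_TAG)    # message 1: size
    comm.Send([np.frombuffer(payload, dtype="B"), MPI.BYTE],
              dest=1, tag=DATA_TAG)                  # message 2: data
elif rank == 1:
    nbytes = comm.recv(source=0, tag=SIZE_TAG)
    buf = np.empty(nbytes, dtype="B")
    comm.Recv([buf, MPI.BYTE], source=0, tag=DATA_TAG)
    obj = pickle.loads(buf.tobytes())
```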
Aron: And let me say this again: you always have access to the C MPI functionality. Every single function in MPI is available to you through the normal C bindings. What we occasionally cannot implement, or cannot do, is give you a nice, high-level, Pythonic interface to some of these functions. A great example, while we're on this, is MPI_Reduce. Why is MPI_Reduce hard? MPI_Reduce is hard because, in the general case, you have to provide a reduction operation, and in Python, though we could maybe somehow sneak a Python function in as a callback in the reduce call, what the current implementation does is take all of the data, gather it onto the root node, and then apply the reduce operation on the root node. So that's sort of the dirty, under-the-covers mpi4py trick to make things look more Pythonic, but it's certainly not the cleanest way to do it, and of course it doesn't scale.
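A sketch of the gather-then-apply trick Aron just described, with a hypothetical helper name; note that all the data lands on the root, which is why it doesn't scale:

```python
# "Pythonic reduce": gather every rank's contribution to the root,
# then apply an arbitrary Python reduction there. Flexible, but the
# root holds O(P) data, so it does not scale.
import functools
from mpi4py import MPI

comm = MPI.COMM_WORLD

def pythonic_reduce(comm, value, op, root=0):
    """Hypothetical helper, not mpi4py's actual implementation."""
    everything = comm.gather(value, root=root)    # all data to root
    if comm.Get_rank() == root:
        return functools.reduce(op, everything)   # reduce locally
    return None

total = pythonic_reduce(comm, comm.Get_rank(), lambda a, b: a + b)
```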
Jeff: I'll be the first one to say I think "Pythonic" is a wonderful phrase. I'm going to work that into my daily vocabulary.

Matt: Don't forget "Pythonaut," right? For somebody exploring Python for the first time.

Brock: I would totally fit that category.

Aron: I'm sure you could tweet that somehow, Brock. When we write idiomatic Python, we are Pythonistas.

Jeff: Don't make me start busting out MPI puns, because then it would get really ugly.

Matt: We have all of Monty Python to fall back on.

Brock: This is true. Okay, so let's keep moving. We mentioned some issues you had with an old version of the SGI MPI library. What MPI implementations do you guys officially support, and which ones do you know work?

Aron: Open MPI and MPICH.

Jeff: Are there any other MPI implementations worth mentioning?

Matt: I have also personally used it with MVAPICH, and with an older LAM. And I have recently been using it with the Cray implementation as well, and that has gone very nicely.

Aron: I'll say that we use it with the IBM implementation of MPICH, but I feel that most MPIs really are derivatives of one of those two.

Jeff: So do you find the interoperability between them such that you actually do sit very nicely on top of MPI and are mostly isolated from the differences?

Aron: Yes, and some of that is due to a bit of leakiness. Are you guys familiar with this Cython tool? One of our other guests mentioned it. So, Cython is a Python package; it grew out of another package called Pyrex. The idea is that Python is this slow interpreted language, and every once in a while we want to be able to compile a Python file basically into C. So we add a little bit of static typing to a function, put that into the file, and then we compile it with a C compiler, with a little bit of interpreting. In addition to that, we can also use Cython to generate interfaces, and that's how mpi4py is built: it's built around a set of Cython interfaces to the MPI library, and it basically takes the mpi.h header file and builds up the set of interfaces from there. So why is this interesting or relevant? Well, if you're familiar with the internals of Open MPI and MPICH, which you are, you'll know that communicators are implemented differently; they're fundamentally different base types. You would think that we needed to do something special to handle the fact that, well, it's an integer here and it's a pointer there. But Cython only needs an approximate type. It turns out that even though in some cases it's an integer and in some cases it's a pointer, just saying, well, we think it's an integer, is close enough, and things do work smoothly between Open MPI and MPICH. I can see you guys are horrified.

Jeff: No comment. That's right, as we talked about before we started recording the show, I'm the Fortran MPI guy. I don't know how I got sucked into that, but I am, and, shall we say, in older versions of Fortran, types are a very fluid concept as well. But even on platforms where integers and pointers are different sizes, where the native int type is not the same size as a pointer, it still works?

Aron: It still works, because Cython only needs to sort of know what type you're going to be interfacing with in C, and when it actually does the compile, it does figure out what it's supposed to be converting the type to. So it works.

Brock: So, for upcoming things: I assume you guys are going to implement MPI-3 when that comes out?

Aron: Yeah, I think that as soon as a stable MPI-3 implementation is out, there's certainly enough interest in mpi4py that we'll be able to support that.

Brock: What about any of these more Pythonic things? Do you have any pie-in-the-sky ideas for future ways to make things more abstract, more high-level, easier for a grad student to get something running in parallel?

Aron: I'm always full of ideas, and I've recently seen some people start working with and building libraries on top of mpi4py to start enabling simple parallelism. I'm mostly interested in really high-level utilities that, for example, just monitor what's going on with a scientific application; all of the things that are really dirty to incorporate from C or Fortran, like caching or memoization, might be more interesting from Python. But I don't think I have any hard ideas.

Brock: So a question I like to ask people, just because everybody has different answers and different rationales: what version control system do you guys use for maintaining your code base, and why?

Aron: Lisandro uses Mercurial, and Mercurial is written in Python; I don't actually have anything to say beyond that. I'm a Git guy myself.

Matt: Well, I'm a Mercurial guy. The two main other projects that I work on are also versioned in Mercurial, so I'll Skype-high-five Jeff on that one. That's just my personal opinion, but they're all good systems.

Brock: Another question we like to ask: what's the strangest use that you have heard of for mpi4py? Something you didn't necessarily intend or design it for, but you found out some user is using it and you go, huh, that's kind of cool.

Matt: Well, I think Aron has kind of a hero run to describe. My understanding is that Aron runs on a tremendous number of cores with mpi4py.

Aron: 65,000.

Jeff: Tremendous.

Aron: Yes. I think that's not even in the ballpark for amazing these days, right? But I would say that we've exposed the dirty, dirty underbelly of parallel file systems with dynamic loading.

Jeff: Oh, yes.

Aron: I don't know how interesting or exciting it is to report negative or scary results, but bringing up Python, importing NumPy, and importing mpi4py on most every supercomputing-class machine that we've tried so far takes several hours.
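The exchange that follows arrives at a single-reader mitigation; as a sketch of that idea, one rank touches the filesystem and broadcasts the bytes to everyone else (hypothetical helper and filename; the real import and dlopen fixes discussed below are far more involved):

```python
# One read from the parallel filesystem instead of 65,000: rank 0
# opens the file, everyone else receives the contents over MPI.
from mpi4py import MPI

comm = MPI.COMM_WORLD

def read_once(path, comm, root=0):
    """Hypothetical helper: one rank reads, all ranks return the data."""
    data = None
    if comm.Get_rank() == root:
        with open(path, "rb") as f:
            data = f.read()
    return comm.bcast(data, root=root)

source = read_once("some_module.py", comm)   # illustrative filename
```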
Oh Because that's a file operation at runtime Yeah, one of the problems with the import is is really how many directories and files it needs to touch during the import process on all Of the different processes are simultaneously Right, so it puts tremendous load on the metadata server usually That's right. And so what you're seeing is 65,000 processors doing a search over the standard DL open routine For all the paths over all the libraries and on top of that Python itself is also searching for every module and every package it needs to load and We have some good news and that's that Asher Langton who's a Key is a graduate student doing a research appointment at Lawrence Livermore National Labs has has a very nice Implementation that's trying to fix this by by simply letting only process zero Do the search and then reporting the results to the remaining processes of course, this doesn't help deal open and That's where Jed Brown and I have been working on trying to figure out How to intercept and we're all the way in G-Lib C at its its very base bootstrapping set to Intercept these file system calls and do something smarter with them Cool Yeah, we I think we have a smaller version of that problem in open MPI because you know We're all about the plug-ins and we have several dozen plug-ins and at larger scale We were actually requested way back in the early beginnings of open MPI say hey give us a way to Not make these plug-ins so that they're just part of you know Live MPI dot a instead so that I don't have a bajillion deal opens going on at the same time But probably fewer than Python and searching and things like that, but yeah similar issue. It's always cool to bring out these Things that people just never thought of until you start running them at tens of thousands of cores Well, thanks a lot for your time guys. Um, what is the MPI for pi a website where people can download it and how can they get involved? It's at code.google.com Ford says P Ford slash MPI for pi and of course we're always looking for people to contribute Especially with these pythonic routines Take a look at it. If you see something that you like or something that's interesting Send us an email. There's an MPI for pi developers list on Google groups. And of course, we're very chatty and friendly Okay, well, thanks again. Oh, thank you guys. Thank you very much, Jeff and Brock. Okay, and we're done