As Kenneth introduced us, my name is Jeff Squyres, I work at Cisco, and my co-presenter today is Ralph Castain, who works at Intel. We both do quite a bit of development on Open MPI. I didn't quite understand how much until he cited those statistics, but there you go. It's both a labor of love and, to varying degrees, at least somewhat paid for by our employers. But that also represents quite a bit of the entire Open MPI community. This is not the Ralph and Jeff show. We are the ones giving this presentation today, but there are countless other really smart, dedicated people who are integral to making Open MPI a great project and great software for the world to use. The two of us absolutely could not make this product on our own. So I do want to emphasize: GitHub may say we have the most commits, but that's quantity, not necessarily quality. There are a lot of really smart people who may only have 20 commits, but they've contributed really, really important stuff. So I just want to make sure we're clear on that part. As Kenneth also mentioned, this is done in conjunction with the EasyBuild community. Kenneth is the one who invited us and gave us this idea, and over the past couple of months we've been fleshing out how to do this talk. So we want to thank him for all of his support, and hopefully this will be very useful information for the community. So let's talk a little WebEx logistics here. As I mentioned, this session is being recorded. It will be uploaded to YouTube after the fact, so stay tuned; we'll send all those details later. Please feel free to ask questions in the Q&A panel. If you've been on a WebEx presentation before, this might be a slightly different format. This is what's called a WebEx event, where there are three of us panelists who can do the presenting. It's not like your normal WebEx meeting where anyone can talk and so on. So there is a dedicated panel.
If it didn't pop up for you automatically, go ahead and find it; that's where you can ask questions in text form. Ralph and Kenneth are watching it while I'm speaking, and we'll be monitoring it and answering questions. So let's get on to the agenda. When Ralph and I were making up all these slides, we discovered we have a lot of material. It's a deceptively small overview, but there is a truckload of slides here, and we're not going to get through all of it in this first presentation. This whole presentation — the overview — is not an in-depth tuning analysis of Open MPI or an in-depth look at how the system works. This is about what all the parts and pieces are, because frankly, the title does kind of say it all. HPC has become an alphabet soup of acronyms and different components, and you need them all to work together to get a functioning system, and then you need to tune it a bit to get a nicely working system. So the scope of this talk is: what are all of these pieces? What do these acronyms mean? How do they work together? We might use this as a springboard for future talks on things like tuning. But given that the ecosystem has evolved over time, we should give a snapshot of where we are today. As developers, we're intimately familiar with how all these component pieces interact, but for the rest of the user community, it moves so fast that it can be pretty hard to keep up with all this stuff that happens behind the scenes. So that's what we're doing today. Our next session, already scheduled, is going to be Wednesday, July 8th. I know that hits a bit of the summer vacation timeframe, especially for Europe; I'm sorry about that, that's just how the scheduling worked out, but it's going to be in exactly the same time slot.
So for those of you following along at home, that's 11am US Eastern; I cannot do the time translations in my head for the rest, so please translate to your local time. The sessions will be uploaded and available on YouTube afterwards. You won't be able to ask questions, obviously, but the content will be there. I've done so much talking here; I'm not going to read through the overview. I assume all of you have done that. Let's just launch straight into the background. All right: Open MPI is fundamentally based on something we call the Modular Component Architecture, or MCA. You'll hear the phrase MCA, and see it on the mpirun command line, in a lot of places. It's basically a plug-in architecture — fancy-schmancy words for "plug-ins". Note that it's an architecture, not necessarily a layering, and I'll explain that on the next slide. Three key words that are important to know are project, framework, and component. Let's look at that real quick. This is how the architecture is laid out. Again, this is not layering: it doesn't mean that if you start in the top box with an MPI_Send, you have to go through every one of those bubbles before you finally hit the underlying network. It's more about how the code is organized, not necessarily how we optimize and get short code paths for fast message transmission, low latency, and things like that. At the very, very top, we have the Open MPI... I'm sorry, I used a poor word a moment ago; I should say the Open MPI software package. Below that, we have projects. Within a project there are frameworks, and within a framework there are components. What does that mean? Well, let's put some concrete examples here. Now, this is not a comprehensive list of all the projects, frameworks, and components that are in Open MPI, but it's a good representative sample, and some of these words may be familiar to you.
So the projects that we have are MPI; SHMEM, for OpenSHMEM; and OPAL, our Open Portable Access Layer. OPAL is a lot of the portability glue between BSD, Linux, and other operating systems that used to be important, like Solaris; it's a lot of our portability junk down below. The frameworks may also be somewhat familiar to you, and we'll discuss them later in the presentation: PML, BTL, MTL, things like that. Below those are the individual plugins that target a specific system. The BTL — again, we'll talk about this in detail later — is our Byte Transfer Layer, and we have a TCP component, a shared memory component, a usNIC component (which is mine), and so on. These are the bottom layer; a synonym for the word "component" is "plugin". So if "plugin" makes more sense to you, just put that in your brain: oh, that's the TCP plugin, or the shared memory plugin, or the usNIC plugin, and so on. Now, this is the overall semantic view of how Open MPI looks. And I will say that the Open MPI community has proven to be absolutely terrible at naming things. We are engineers, not necessarily literature scholars. Several times we have just thrown up our hands and said: you know what, we're just going to pick a Star Wars-themed name for this one. You'll see names like ob1 and vader and a couple of others, and unfortunately "vader" is a really terrible one, because it comes up very commonly — it's actually the shared memory transport. So we say, oh yes, you need to use shared memory, so use vader, and people are like: what? That is unfortunately a very un-user-friendly name; I apologize for that. A couple of other names have leaked in over time, too: some Star Trek names, some Highlander names. It just reflects that we are geeks and terrible at naming things. And I apologize for that.
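The project/framework/component structure described above is visible from the command line. As a rough sketch — exact component availability varies by Open MPI version and how it was built, and `my_mpi_app` is a placeholder — `ompi_info` lists the compiled-in components, and MCA parameters select among them at run time:

```shell
# List the components (plugins) compiled into this build, per framework
ompi_info | grep "MCA btl"

# Show the tunable parameters for one component of one framework
ompi_info --param btl tcp --level 9

# Select components explicitly at run time: use only the TCP,
# shared-memory (vader), and self BTL plugins
mpirun --mca btl tcp,vader,self -np 4 ./my_mpi_app
```

The same `--mca <framework> <component-list>` pattern works for any framework, which is why the framework/component vocabulary keeps coming up on the mpirun command line.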
So with that, I am going to turn this over to Ralph, and Ralph is going to talk about PMIx. So Ralph, I'm going to stop presenting and let you present. Okay, let's see. Now I need to figure out how to do that, Jeff. In the main WebEx window, if you hover over the bottom, there's a bubble with a box and an up arrow. I don't have one, Jeff. Okay, let's do this. Sorry, everybody; we tested this like two weeks ago. I think we forgot. Sorry — I have to give you presenter privileges. Go ahead, Ralph. There we go. Okay. Unfortunately, I can't see how well this is being displayed. Is this covering the present button in the top right? There we go. That'll help. My apologies, guys. Like Jeff and Kenneth said, I am the one who basically started PMIx, oh gosh, back in 1990... I'm sorry. 2014. Basically, what happened was that we were in the process of looking at the emerging exascale machines. And one of the things we found in looking at it... I'm having some issues here with the presentation mode, Jeff. Okay. My apologies again, folks. So one of the things we were looking at was: how are we going to deal with some of the things we were seeing going on in the HPC community? In particular, we were seeing an emphasis not just on the scale of the machines — going up to exascale size, both in terms of the number of nodes and the number of processes, growing up to the million-process level — but also a lot of things going on with programming models and other such things. And we were trying to figure out: how does one machine support all this stuff, and how are we going to make it all work efficiently together? So we took a look and said: okay, let's look at some of the key issues here.
If we're looking at just MPI parallel jobs in general, we started seeing issues about how long it was going to take to start them at these bigger scales. We were looking at times that might be in the tens of minutes to actually start up a full-scale job, and that obviously was not acceptable. So we figured: all right, we need to look at this launch scaling issue. But at the same time, applications were changing, with multiple models involved in a single application. I show here the OpenMP and MPI models, but there are others as well; you might have Spark, for example, that wants to use MPI inside it for a reduction mechanism. And there are moves being made by the artificial intelligence and machine learning folks, starting to look at using MPI and other models like OpenSHMEM for some of their operations. So we had to figure out how we were going to deal with these hybrid applications. We also saw a proliferation occurring of what I'll call model-specific tools: people writing their own tools — say, a debugger, for example — that would only deal with MPI, and another debugger that would only deal with data analytics. And that seemed really inefficient: everybody had to rewrite all those tools just to be able to deal with a different programming model. At the same time, we saw the emergence of container technologies. Kenneth mentioned I was involved in the early days of Singularity; Docker has also shown up, as well as several others. And those presented their own problems. If an application is sitting inside a container, and you want to wire up those containers, it has to interact with the outside environment. Well, how is it going to do that across a container boundary?
And what do you do about changes at the host level? If somebody does a version upgrade on the host, how do you deal with the container not doing that? And then finally, we just saw this explosion of programming models; I list some of them up there. Every one of those has its own runtime, and if you look at the effort required to write a runtime, it's really rather staggering. Everybody tends to underestimate it. All these runtimes were doing basically the same thing, with a minor twist perhaps between the different programming models, but a great deal of effort was being spent on all of them. And so the question was: is there a way we can make that easier, or perhaps reduce the need for generating all these new runtimes? And where we do need runtimes, how are they going to work with each other in an HPC cluster? So, let's start someplace. The first place we decided to work was addressing this launch scaling problem. For those who aren't familiar with how these launches work: you start all your processes, and then, in a parallel job in particular, you do a big global exchange, if you will, where you share a lot of information — what connection endpoints am I listening on, where am I operating — just a ton of information. It's amazing how much information it is. On a large-scale system, you may be exchanging tens of megabytes of information during this initial startup exchange. So if you type "mpirun my_job" and you sit there for 10 or 20 seconds waiting for any indication of life, that's what's going on: this massive exchange of information. And if you drill down and analyze that information, what you discover is that almost all of it was already known to the launcher.
Before it even started the processes, it knew where they all were; it knew what hosts they were assigned to; it knew where they were going to be bound. It just didn't have a way of communicating that information down to the application at the time the application started. So that was an inefficiency that could easily be addressed. The other problem was that we were dynamically discovering the endpoints for every one of these processes. Even though I might know where a process was, I didn't know how to communicate with it; in a TCP-like world, I didn't know what socket you were listening on, because you picked it dynamically. And so I still had to do the exchange of at least the endpoint information. But it turns out that if you work with the fabric vendors, there are ways they can assign those endpoints in advance, so that you actually know what endpoint — what socket, if you will, to use that as just a simple example — you can talk to. I can pretty much assign a socket in advance, know what socket you're actually going to be talking on, and communicate that to everybody, so that at the start of execution everybody knows exactly where every process is and what every process's endpoint is. If I want to communicate, I can just communicate; I don't have to do any exchange of information first. So that's what we decided we would tackle, and our first goal was just to completely eliminate the data exchange at startup — and then to expand upon that eventually, to allow the resource manager, in collaboration with the programming libraries, to orchestrate the entire launch procedure. Our tool for doing this was going to be PMIx. This is kind of the legacy, or the history, of the PMI effort. Back in the early days, there were PMI-1 and PMI-2, which came out of the MPICH project, and those basically just concentrated on wire-up support.
So it was the mechanism by which you did that big global exchange of information at the beginning of time. It then expanded out a little bit to offer things like: well, if I want to do a comm spawn (MPI_Comm_spawn) — if I want to spawn a few more processes dynamically from inside my application — we had a mechanism for doing that. But it was basically limited; its primary focus was strictly on that initial wire-up. And so it was picked up by PGAS and a few other programming models. Interestingly enough, it was not actually picked up by Open MPI; when we started, we went a different route. And it was picked up by two resource managers — the Slurm folks, and the Cray ALPS system — which wrote libraries to support it. But it was kind of restricted. It wasn't a standard, in the sense that there was no standards body over it or anything like that. But for a long time, that's what people used. And then, like I said, in late 2014 we started looking at what we were going to do to try to deal with exascale, because we didn't think the wire-up times were going to be acceptable. Somewhere in 2015, we released our first version of PMIx, and by 2016 we were able to launch exascale-size systems in under 30 seconds. So we went from the tens of minutes to under 30 seconds. At that point, PMIx started getting picked up by more of the libraries, if you will. Among the resource managers, Slurm picked it up. IBM wrote a new resource manager called Job Step Manager for their CORAL machines here in the United States, and they based it on PMIx. Obviously Open MPI, and Spectrum MPI out of IBM, were using it, as were a variety of flavors of OpenSHMEM at that time. By the time we get to this year, we now have exascale launch times under 10 seconds; they're running somewhere in the five-to-seven-second range.
We have a much broader scope for PMIx that I'll get into over the two sessions we've got scheduled, and we're up to version four of PMIx. Among resource managers, everybody except Univa Grid Engine is now using it. I don't know the state of Univa Grid Engine — I have not officially heard from them that they have added support for it — but everybody else has. It has now been adopted by other MPIs as well as the Open MPI community. All of the OpenSHMEMs have support for it, and PGAS models have support. On the tool side, TotalView and DDT both now have built-in PMIx support, etc. So it has really grown in its adoption. The only reason I show you all these things here is to give you an idea that even though we are talking primarily Open MPI in these two talks, you're going to see PMIx from a lot of different sectors, not just Open MPI, and so you'll see it coming at you from a variety of directions. PMIx has evolved into three distinct entities out there now. First, there is actually a PMIx standard and a standards governing body. It's an international body; it consists of about 18 different organizations, or members, on the Administrative Steering Committee. It meets quarterly for official votes, holds monthly working meetings, and has a variety of working groups that are looking at things like extending into power — supporting power APIs — and storage APIs, and a variety of other areas people are interested in extending to. There's nothing about implementation in there; it just defines a whole set of very generic APIs, plus a set of attribute strings by which you control the behavior of the APIs. Second, there's one major implementation out there: the OpenPMIx library. We haven't really seen an explosion of implementations, so for now, that's pretty much it. Livermore has been potentially working on one for their Flux system.
But other than that, it's pretty much just OpenPMIx that you'll see out there, and it is basically a reference implementation of the standard: everything in the standard is in the OpenPMIx library. Third, we ran into cases where people were saying: gee, our resource management environment is not necessarily up to speed on PMIx — it's not as current as we might like — but we still want to be able to start working with some of the new features as we move through the various releases. So we created a reference runtime environment for PMIx, called PRRTE. Basically, PRRTE is a full-featured runtime that supports all of the PMIx scope, and you can run it under your own resource manager. So, for example, you can get a Slurm allocation and then start PRRTE underneath that, and then you can operate your MPI job, or OpenSHMEM job, or whatever it is, underneath PRRTE as if the resource manager had a complete PMIx implementation inside it. You won't know the difference. The other thing this did was allow us to provide — for tool vendors, for example — a PRRTE that runs at the user level. Each user can have their own PRRTE environment to work in, and that way, if you have a bunch of people who want to develop on the same machine, they can each have their own PRRTE running, and each of those PRRTEs can have a different version of PMIx running under it, and they can work independently of each other without any interference. So it's being used quite widely in those areas. So this, real quick, is what the community is: there are quite a few organizations; this was a snapshot in time from several years ago, really. And I do give you the links at the bottom: the top one is the standard body's main webpage, and the bottom one is the GitHub repo for the standard itself.
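The PRRTE workflow just described can be sketched as a few commands. This is an illustration only — the tool names (`prte`, `prun`, `pterm`) come from the PRRTE distribution, but exact flags vary by release, and the node/process counts are placeholders:

```shell
# Get an allocation from the native resource manager (Slurm in this example)
salloc -N 4

# Start the PRRTE distributed virtual machine (DVM) across the allocation
prte --daemonize

# Launch jobs against PRRTE as if the resource manager spoke PMIx natively
prun -n 16 ./my_mpi_app

# Tear the DVM down when finished
pterm
```

Because the DVM belongs to the user who started it, several users (or several PMIx versions) can coexist on the same machine, as described above.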
So one of the key things about PMIx — and I'm just going to give you some background today, so that when we talk about the role of PMIx in Open MPI you'll understand a little more about what the heck this thing does — is that PMIx is basically a messenger, not a doer. In other words, when an application says, "hey, I'd like a new allocation of resources," PMIx doesn't do that allocation. It communicates your request to the resource manager, and the resource manager either will do it or not; then PMIx communicates the answer back to you about what the resource manager did. Likewise, if you are a user working with a PMIx-based tool, and you ask, for example, "hey, what storage systems are on this cluster and what are their capabilities?", PMIx normally will not service that request itself. It will simply pass it to — in this case — the storage system and say, "hey, they asked a question"; the storage system will generate the response, and then we will convey it back to you. So its basic role in the HPC environment is to convey these orchestration requests, as we call them, and the responses, between the application and the system management stack. And there's a wide range of these orchestrations that people have asked for — allocations, job launches, job control, monitoring — and we'll go over some of these. The other thing that happened was that the various programming models came to us and said: hey, we have a problem ourselves in terms of coordination within the application. In this case it was OpenMP and MPI: we're both trying to use the threads, and we step on each other, and we would really like some way of being able to coordinate inside the application and say, "hey, I need you to just sit aside for a little while."
"Turn your progress threads off for a little while, because I'm in a really intense computational section here — an OpenMP computational section — and it would really help our performance if the MPI layer just shut down for a little bit." So we created a mechanism by which multiple programming libraries can in fact interact with each other using PMIx events, in an asynchronous fashion, to say things like: "hey, I need you to shut your progress threads off for a little bit"; "okay, I've shut them off, go ahead and do what you want to do"; and then, when you're done, you send me back an event saying "okay, I'm done, you can kick on again." That's actually been useful for people in helping resolve some of these resource contentions. Now, there are some exceptions to that doer rule, when we're interacting with non-PMIx systems like fabric managers, credential systems, or even, in some cases, storage systems. The issue that people had at the resource manager level was: you're going to pass me this request, as the primary point of contact, and you're going to say, "hey, tell me what the fabric topology looks like." In order for me to answer that question, I'm going to have to write a whole bunch of code to talk to the different fabric managers that are out there, and that's a lot of work on my part. Why don't you guys take that on for me? So we talked amongst ourselves and amongst the various vendors, and we agreed that in some cases, rather than the host system management software having to write these particular implementations, we would write them as plugins to PMIx and allow the resource manager — if they're the host — to call that API; we would then take care of connecting to the actual fabric or storage management subsystem and getting that information.
So, in other words, we limit the application's connectivity to just the host resource manager daemon, or whoever it is that's hosting the PMIx server; that's the only connection they have. But at the PMIx server, we provide plugins that allow it, for example, to say: "hey, they asked for information about what fabric topologies are here; I'm going to ask PMIx to go talk to the fabric manager on my behalf and get that information, and then I'll relay it back to the application." So there are some things — and you'll see them in our MCA layer, in our plugins — where we are in fact carrying out some of those requests ourselves. For example, we will have a Slingshot plugin to talk to the Cray fabric manager, and a Lustre plugin to talk to the Lustre storage system, where we are in fact executing the request to get that information on behalf of the host resource manager. The other thing we do at the server level is aggregate all local collective operations. So if the application is calling a fence operation, we will collect all the local participants before we pass that up to the host for inter-node execution. And then we do some environment support: if they ask us to, we will collect inventory for them — do an hwloc topology collection to find out what NICs and GPUs are present and such — and we do a little bit of process monitoring for some obvious cases, etc. So there are a few things we do, but in general, we just pass requests. So where does PMIx fit in the Open MPI package? It is one of the projects, sitting over there off to the side, with its own set of frameworks. What Jeff didn't mention, but I will, is that you should think of a framework as a plug-in abstraction layer. So we have a PTL framework — our PMIx transport layer, the equivalent of the BTL in Open MPI itself — and what that is, basically, is just a definition of a set of abstracted interfaces that each plug-in is going to implement.
And so PMIx has a growing number of these — we do love our frameworks — and each of those has its own component plug-ins. I'll be going over those in the second session, but this concludes the overview of PMIx I wanted to give you: it provides a whole host of services that Open MPI relies on, and it's a project that sits off to the side inside of Open MPI. And Jeff, I think this means I'm going to turn this over to you. Can you take presenter, Jeff? Yes, I will give the presenter role to Jeff. Meanwhile, we already have a couple of questions related to PMIx. All right, so maybe Ralph, you can try to cover them. Let me bring them back. Okay, so the first question is: what exactly is meant by Univa Grid Engine not supporting PMIx? Does it mean that Open MPI will not work with Univa Grid Engine, or does it mean that the startup is not optimal with UGE integration? Okay. What that means is that if you want to run with PMIx — we'll just use Open MPI for the example — if you want to run Open MPI under Univa Grid Engine, you have to use Open MPI's mpirun to start the job. Univa Grid Engine doesn't support PMIx natively, so you can't use a qsub command to start your application directly; you'd have to use mpirun. And that's in contrast — just to throw in a little extra color here — to Slurm, for example, where you can start a job with srun, and then you can also start your MPI application inside that job with another srun, because PMIx is built into the runtime and talks natively to Open MPI behind the scenes. So that's what native integration means, versus using Open MPI's mpirun, where we'll typically use some other mechanism underneath the scenes — and it might be a quote-unquote native mechanism; even under Univa Grid Engine, we're using Univa Grid Engine mechanisms to do the launch — but it's not as tight an integration as a direct launch. Okay.
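The two launch styles just contrasted look like this on the command line. This is a sketch: the exact `--mpi` values (e.g. `pmix`, `pmix_v3`) depend on how your Slurm was built, and `my_mpi_app` is a placeholder:

```shell
# Indirect launch: Open MPI's mpirun does the wire-up, works under any
# resource manager (including Univa Grid Engine)
salloc -N 2
mpirun -np 8 ./my_mpi_app

# Direct launch: Slurm's own PMIx plugin starts the MPI processes natively
srun --mpi=list                    # show which PMI flavors this Slurm supports
srun --mpi=pmix -n 8 ./my_mpi_app
```

Under a resource manager without native PMIx support, only the first (mpirun) form is available.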
I think we'll also get back to some of the runtime stuff later, maybe in the second part, right? Yeah. Another question that was raised — and this is maybe for Jeff, who may cover it later — is: is Open MPI API-compatible with other MPI libraries, or do we expect it to be in the future? The short answer is no, but let's get back to that later; let's do the PMIx questions here. Okay. Next question: does the version of Slurm have any effect on compatibility with PMIx? Yes, it does, to a limited extent. Different versions of Slurm support different versions of PMIx, starting with, I believe, Slurm 16.05, which was their first release with support; that only supported version one of PMIx. By the time you get to Slurm 18.x, they are capable of supporting all the way up through PMIx v3, and they've now opened it up so that — I believe — they're working on full integration with PMIx v4. That would probably be coming out in the late 2020 release, maybe early 2021. PMIx supports backward compatibility in its library, but there was a bug in the Slurm configuration script that didn't account for that. In other words, you could not use PMIx v3 with Slurm 16.05 — even though the PMIx support would in fact have worked — just because they were checking the wrong thing, and they would have said: well, it's not PMIx v1, so we won't let you build against it. They fixed that in 2018. Since 2018, you have been able to build against a PMIx v3, even though they might not support the PMIx v3 APIs; you'd still be able to use it, just via the PMIx v2 APIs that were in the PMIx v3 release. I apologize, it's a little uneven in that sense, but they've now fixed the problems. With anything 2018 or newer, you should be able to use any of the PMIx releases against it. We've got a related question that you may have answered just now.
Can one build Slurm against the PMIx included with the Open MPI sources, instead of a separate PMIx source download? No, not really — and I say that a little tongue-in-cheek. There is a configure option in Open MPI that will allow you to expose the headers that you need in order to do that. If you build Open MPI with --with-devel-headers, that will install the public headers for the embedded PMIx, and then, if you really wanted to, you'd be able to build Slurm against the PMIx that came with Open MPI. Let me throw a little more color in there too, Ralph; this is a great question, but it's really just about build-system logistics. Whatever version of PMIx we ship in Open MPI, you can build Slurm with that — whether you use the copy that's embedded in Open MPI, or you download that same version from the PMIx website. At the end of the day, it results in the same thing. There's some shell-script build-system craziness that we have to go through to expose the internal copy externally so that Slurm can build against it. If that gets too complicated for you, just download the same version from the PMIx website, and you're effectively getting the same thing. And let me emphasize one thing, because this may be the root of the question: you do not have to have Slurm and Open MPI built against the same PMIx in order for them to work together. They don't need that at all; PMIx is cross-version compatible. Basically, when the client connects to the server, there's a handshake that exchanges: "hey, I'm running version three, you're running version four," and we select the right communication protocols to make all that work. Slurm can be on a version two PMIx, Open MPI on a version four. It doesn't matter. Okay.
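As a sketch of the simpler route just recommended — downloading a standalone PMIx and pointing Slurm at it — the commands look roughly like this. The version numbers and install paths are placeholders; `--with-pmix` is the Slurm configure option for an external PMIx installation:

```shell
# Build a standalone PMIx (optionally the same version Open MPI embeds)
tar xf pmix-3.1.5.tar.bz2 && cd pmix-3.1.5
./configure --prefix=/opt/pmix/3.1.5
make -j 8 install
cd ..

# Point Slurm's configure at that PMIx installation
tar xf slurm-20.02.3.tar.bz2 && cd slurm-20.02.3
./configure --prefix=/opt/slurm --with-pmix=/opt/pmix/3.1.5
make -j 8 && make install
```

Because of the cross-version handshake described above, the PMIx version Slurm is built against does not have to match the one Open MPI uses.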
It was done that way specifically because of this cross-version compatibility matrix nightmare, because we have all these shifting stacks. PMIx was very specifically designed to allow that cross-version compatibility, for exactly the reason Ralph mentioned. Okay, very good. One more question related to PMIx: are there known examples of end-user applications, maybe non-MPI applications, using PMIx directly? Yes, there are, and they're growing. I apologize, my brain is being a bit of a fossil here; I can't give you a name right this instant off the top of my head. But there are quite a few applications that are beginning to do so, and there are a number of programming models other than MPI that are using it, especially in the workflow-manager area. So when you see things like Swift/T or Balsam or ADIOS coming out of the national labs over here, we are working with them very closely right now on integration with PMIx, because they want to use the dynamic APIs in PMIx to drive orchestration behaviors within these different environments. One of the issues has always been: if I write a nice workflow manager, for example, that does a great job of coordinating computation and visualization, I wind up having to write it for a given environment, like a Cray, for example. Then if I want to move it to something else, I wind up having to rewrite a whole bunch of it in order to drive a different type of system. PMIx gives you that abstraction, so you don't have to do that. These projects are moving pretty aggressively in that direction. Okay, thank you very much. One more question: is it sufficient to use PMI-2 rather than PMIx for a medium-sized cluster? The context is about 10,000 cores, with the largest job being about 2,000 cores. Well, you can always use PMI-2 instead of PMIx; there's no requirement that you can't.
The issue you'll run into is how long it takes to start the job, and whether you want the orchestration capabilities or you're content with just a traditional MPI job. At 2,000 processes you'll notice a slight difference in startup times, but not a major one; you're sort of on the edge of where you start to see a big difference. When you get up to 5,000 or 6,000 processes, you start to see some significant differences, and when you get into the tens of thousands of processes, there's a big difference. Kenneth, we lost you there, so I'm going to jump in with the last question we got on PMIx so far: does LSF support PMIx? I can't speak for IBM, and this is a little tongue-in-cheek, but the answer is: it's coming; they're working on releasing that. It is not currently released. All right, that is all the questions we've got on PMIx so far, so I think we're going to jump in and do the next section, which will be a little tight, but I think we can get it in. So I'm going to go ahead and share. All right, let's talk about building Open MPI. Here's the super-short version, and we actually take a lot of effort to make sure that this has remained true throughout the entire life of Open MPI. You download the tarball, you untar it, you cd into the directory, and you run configure, typically giving it a prefix (where you want it to install) and possibly a bunch of options. Those options typically deal with supplemental libraries, like network communication libraries: libfabric, UCX, things like that. Then you run make, and a parallel build is fully supported, so you can make -j as wide as you like; I frequently run make -j 32, for example. Then you run make install, and that's it. It's supposed to be as simple as that. Now, the devil's in the details, so let's get into that a little bit here. There are some exceptions.
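The canonical recipe just described, written out as commands (the version number and install prefix are just examples):

```shell
# After downloading a release tarball from open-mpi.org:
tar xf openmpi-4.0.4.tar.bz2
cd openmpi-4.0.4
./configure --prefix=/opt/openmpi-4.0.4   # plus options for libfabric, UCX, etc.
make -j 32                                # parallel builds are fully supported
make install
```

That is the whole happy path; everything below is about the options and exceptions.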
If you actually do a git clone from the Open MPI GitHub repository, you require a few more tools. We require developers to have things like the GNU Autotools, Git, Flex, and on master we now require Pandoc for generating our man pages. You've got to have these things, and we don't feel bad asking git-clone users to have these additional tools. With distribution tarballs, you don't need any of these tools, because the tarballs themselves are already bootstrapped. There is a file called HACKING: if you want to build from a git clone, go read that file, and it tells you all that you need to know. And if it doesn't, let us know; we'll update the HACKING file. All right, so there is a philosophy to our configure script. Our configure script is very long and involved. It looks around your system and searches for all these optional support dependencies: do you have UCX? Do you have libfabric? Do you have this thing, that thing? If it finds them, it builds support for them. But if it doesn't find them, it just skips them, because they're optional dependencies. That being said, if a human specifies a dependency on the command line, like --with-libfabric, that is telling configure: I need you to build libfabric support, and if you cannot, I want you to abort and let a human figure out what the problem is. That is an indication of intent. You can also do the opposite: no, no, I don't care about libfabric at all; say --without-libfabric, and then configure will effectively skip the libfabric stuff. So "foo" applies to a lot of different things; I just used libfabric as one example, but there are a lot of --with-foo and --without-foo options in our configure system.
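To make the three behaviors concrete (libfabric is just the example dependency here):

```shell
# Default: configure probes for libfabric; builds support if found,
# silently skips it if not
./configure --prefix=/opt/openmpi

# Explicit intent: abort configure if libfabric support cannot be built
./configure --prefix=/opt/openmpi --with-libfabric
# ...or point at a non-default installation tree:
./configure --prefix=/opt/openmpi --with-libfabric=/opt/libfabric

# Explicit opt-out: skip libfabric support even if it is installed
./configure --prefix=/opt/openmpi --without-libfabric
```

The same --with-foo / --without-foo pattern applies to UCX, PSM2, and the other optional dependencies.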
So the short version is: if a human asks for something and configure can't do it, we abort and let you figure it out. We will not silently fail and say, oh, you said --with-libfabric, but I didn't find it, but that's fine, it's optional. No. When you say --with-foo, we no longer treat foo as optional. That's very important for automation, because when a gremlin creeps into the system and, say, something changed so that configure can no longer find libfabric, you want to know, because configure aborted and told you right away: nope, couldn't find libfabric support; human, please figure that out. Specifying compilers. We know in the HPC community that the compiler can be tremendously important, depending on the type of application you've got. The rule of thumb is that you probably want to compile your MPI implementation with the same compiler suite that you're building your application with. That's not always strictly necessary; you can mix and match compilers, but really weird, wonky stuff can happen when you do that if you're not careful. So the easiest thing to do is definitely: oh, I'm going to build my application with the GNU compiler suite? Great, build Open MPI with the GNU compiler suite. I'm going to build my application with the Intel compiler suite? Great, build Open MPI with the Intel compiler suite. Like I said, you can mix and match, but that is not for the meek; it can get hairy. The way you specify which compilers Open MPI should use is three shell variables: CC, CXX, and FC. We used to have F77 and F90; those are no longer used. It is now just FC for the Fortran compiler, so all Fortran code goes through that one compiler, regardless of which flavor or dialect of Fortran it is. Now, the best practice is to actually specify these on the command line, to the right of the configure token, like that example I showed there.
So: configure, and then CC=clang and CXX=clang++, and so on. The reason for that is that your entire configure line then ends up in the first couple of lines of config.log, so you can look back and see, oh, which compiler suite did I build this with? Versus, if you just set your path appropriately, or you exported CC before you ran configure, that's not captured in config.log. It'll do the right thing no matter how you set the CC environment variable; it works either way. But it's good for your future self, if you ever need to go back to that log and recreate how you built Open MPI, to have it nicely in config.log. Now, something we talk about a little: Open MPI can be built as static or shared libraries, actually both at the same time if you want to. So you can build libmpi.a, or you can build libmpi.so, and I'm using Linux library conventions here; the equivalent is true on macOS and other operating systems as well. There are --enable-static/--disable-static and --enable-shared/--disable-shared configure command-line options. The default, and the recommendation, is to disable building the static library and enable building the shared library. You can build both at the same time, --enable-static --enable-shared, but sometimes weird things happen there. So our recommendation, for a variety of reasons, is --disable-static and --enable-shared, but you can override either of those defaults on the command line. Now, a more subtle thing: where are the plugins, the components? Are they dynamic shared objects, meaning that we open them at runtime as individual little library files, DSOs? Are they in libmpi, or outside of it? The picture here shows our default: libmpi contains the core code and the frameworks, but all the components are individual DSO files in your file system.
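Putting the compiler and library-type advice together, a configure invocation might look like this (the compiler names and prefix are just examples):

```shell
# Specify compilers on the configure command line so the full invocation
# is captured in the first lines of config.log
./configure --prefix=/opt/openmpi \
    CC=gcc CXX=g++ FC=gfortran \
    --disable-static --enable-shared   # these are the defaults, shown explicitly

# Later, recover exactly how this build was configured:
head config.log
```

Exporting CC/CXX/FC beforehand works too; putting them on the configure line is simply what gets them recorded in config.log.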
That being said, you can use the command-line option --disable-dlopen, and then we effectively slurp all those plugins into libmpi itself. For all intents and purposes, the functionality is the same. The use cases for why we offer it both ways are complicated and convoluted, and not really worth going into. The default is the DSO way, but you can say --disable-dlopen and then it's all included in libmpi. And this is orthogonal to whether libmpi is a shared library or a static library itself: you can have the DSOs included regardless of whether it's a static libmpi.a or a shared libmpi.so. Now, two dependencies that we hear a lot about are libevent and hwloc. We require these two packages. The super-short version: libevent is an event-based framework. We inject events into a queue and they get processed; it supports timers, do-something-when-a-file-descriptor-fires, things like that. We use it internally inside Open MPI quite a bit. hwloc is the Hardware Locality project, mostly out of Inria in France, but it's also a sister project of Open MPI. We require these two packages; they used to be optional, but we finally gave up and said, no, we're going to need these two. Most modern Linux distros actually come with both of these packages. However, installing the header files is not very common; it usually doesn't happen by default. So you might have to install libevent-devel and hwloc-devel, or whatever your distro happens to call them. Open MPI can build against the operating-system-provided versions of those packages, but we also still embed full copies of these projects inside Open MPI, for this reason: we got so many bug reports and user complaints saying, you require libevent, and it's installed on my distro, but Open MPI is failing to build, because they hadn't installed the devel package, and so the header files were not available.
So Open MPI couldn't compile against it. Therefore we still embed full copies of these packages. Our default is to use the system-installed ones if we find them, and to fall back to the internal ones otherwise. But if you want to force internal or external, I provide the examples down there at the bottom: you can say --with-hwloc and optionally provide the tree where it's installed, or you can say --with-hwloc=internal, a special keyword that says don't even bother looking externally, just use the embedded one. Honestly, if you have modern versions of libevent and hwloc, and by modern I mean from within the last three to four years, you won't notice much difference, so most people won't care whether you use the operating-system version or the internal one. But I mention this because the question comes up periodically. Now, communication libraries. There used to be a lot of them. (There's a typo on my slide; I'll fix that before we send these slides around.) The most common libraries that we see are libfabric and UCX, but PSM is out there as well. There used to be a lot of communication libraries out there; they have consolidated down to libfabric and UCX these days, and I'll talk about that more in a second. But if your libfabric, UCX, PSM2, or Portals is in a non-default location, you can specify the installation directory. That's what the brackets mean: you can say --with-libfabric, and configure will fail if it can't find libfabric, but you can also say --with-libfabric= and a directory, and that's where configure will go search for libfabric (or UCX, or PSM2, or Portals). So let's talk about a frequent question. We've had a lot of consolidation onto libfabric and UCX, and I'm going to talk about this in more detail when we get to the MCA layer in the next session.
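The internal-vs-external dependency choices described above look like this on the configure line (the install paths are hypothetical):

```shell
# Force the embedded copies, never probing the system
./configure --with-hwloc=internal --with-libevent=internal ...

# Build against specific external installations instead
./configure --with-hwloc=/opt/hwloc --with-libevent=/opt/libevent ...

# Same pattern for communication libraries in non-default locations
./configure --with-libfabric=/opt/libfabric ...
./configure --with-ucx=/opt/ucx ...
```

In each case, naming the dependency turns it from optional into required: configure aborts if it cannot build that support.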
But this is from a build perspective for the moment. So, just a little background on OFI and UCX. libfabric was originally created by a bunch of network vendors who wanted to do operating-system bypass and HPC-class networking, but not tied to the specific abstractions of InfiniBand. There were three initial companies, my own company Cisco, plus Cray and Intel, that formed the first versions of libfabric, and since then a whole variety of other network types have been supported by different vendors and third parties as well. UCX really became the next-generation, higher-abstraction InfiniBand support. It supports InfiniBand and RoCE, and actually, I just found out yesterday that it doesn't support iWARP, so I will remove that from the slide before we send it around as well. It also grew to support a couple of others: Cray uGNI, POSIX TCP sockets, and shared memory as well. So that's the genesis of those two libraries. This is what it looks like pictorially: libfabric is the bubble on the left, UCX is the bubble on the right, and you can see the networking types that they individually support. They both also support shared memory and TCP sockets. Now, while Open MPI can use both libfabric and UCX, because together they represent a huge variety of network types, Open MPI does not use libfabric or UCX for pure shared memory or TCP unless you explicitly tell it to. By default, for TCP or pure shared memory, we use internal Open MPI support. Now, there are accelerators too. HPC is all about accelerators these days. Open MPI has CUDA support. NVIDIA (i.e., Mellanox, since they have now merged) recommends building UCX with gdrcopy support. GDR is an acronym that stands for GPU Direct RDMA. There are a couple of different flavors of GPU Direct out there.
The one that HPC cares about is the RDMA flavor: GPU Direct RDMA. So they say, hey, when you build UCX, make sure you include gdrcopy in your UCX, and you'll need to consult the UCX documentation about that. This is a talk about Open MPI; I'm not going to cover how to build UCX. But then you build Open MPI with CUDA and UCX support with the following two command-line options, --with-cuda and --with-ucx, optionally supplying the paths to them if you need to. PSM2 also supports CUDA. Now, when you build CUDA support into Open MPI, the whole point is that Open MPI can send and receive messages directly from GPU device memory without copying through main RAM. So you save on congestion in the box, and you also get lower latency. Although latency is not the big deal, because you're usually sending and receiving very large messages from GPU device memory; congestion and throughput are more of a win with GPU Direct RDMA. Now, once you have built Open MPI, there's a command that, it always surprises me, more people don't know about: a command called ompi_info (ompi, underscore, info; and yes, by the way, we do say "ompi" for Open MPI). It will tell you everything about your Open MPI installation. Here are just the first few things it shows you: the version number, where it was built, and lots and lots more. It will also show you all the plugins that are there, all the components, and the frameworks they belong to. There's also a --parsable option, which is machine-friendly, so if you have automation and want to make sure a certain feature is in your Open MPI build, you can check for it. And then, we're going to get into this more in the next session, because I'm out of time now, there's a --all option that covers MCA parameters, which are how you tune Open MPI at runtime, saying, hey, I want you to use this value for this thing.
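A sketch of the CUDA-enabled build plus the ompi_info inspection commands just mentioned (the paths are examples; the UCX build itself, with gdrcopy, is covered by the UCX documentation):

```shell
# Build Open MPI against CUDA and a gdrcopy-enabled UCX
./configure --prefix=/opt/openmpi \
    --with-cuda=/usr/local/cuda \
    --with-ucx=/opt/ucx
make -j 32 && make install

# Inspect the resulting installation
ompi_info                 # version, build settings, all frameworks and components
ompi_info --parsable      # machine-friendly output, handy for automation
ompi_info --all           # additionally lists every MCA parameter
```

Grepping the --parsable output is a simple way for automation to verify that, say, UCX support actually made it into the build.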
You can pass in lots and lots of things at runtime. How do you know what parameters are available? ompi_info --all shows you what options are available. So that's it for Part 1, and, I'm sorry, we're four minutes over, but Kenneth, are there any questions? Yes, we do have a couple of questions related to building Open MPI. One is: why does it take so long for Open MPI to run the configure script, and is there something that can be done about that? Good question. I have to give a tongue-in-cheek answer first: you should have seen how bad configure used to be. Configure used to take three to four times as long; we spent a lot of time optimizing it. The reason it takes so long is that it actually compiles a bazillion things while it is poking around your system. Particularly when you have a license-locked compiler, where every time you compile a file it has to go get a license, do the compile, and then release the license, that can add to the effective overhead of what configure is doing. It is running a bazillion shell commands, of which a large portion are compiling and/or linking individual files, and we're testing just so many different things; that's why it takes so long. So I'm sorry; it's better than it used to be. And I guess we're trying to support coffee vendors: we want you to start configure, go off and get a cup of coffee, and then come back.
Okay, we have one more question related to --disable-dlopen: the configure option --disable-dlopen and its implications for forked processes. The question is related to issues that were seen with the Rmpi package, so I guess the question is: what effect does configuring Open MPI with --disable-dlopen have on forked processes? Could it cause problems? I don't think it's forked processes that are actually the issue; there is a very subtle issue here that I'll get into in a second. Here is a truism that is probably true for the vast majority of people on the planet: if you think you understand linkers, you don't understand anything about linkers. I have to say this to myself all the time. So, --disable-dlopen. Remember what that does: it slurps the components into the library itself. So libmpi (for the sake of simplicity here, let's say it's libmpi.so, a shared library) actually contains all the plugins. When you use libmpi in what I'll call a non-traditional setting, let's say you're running an application that dynamically opens and loads libmpi into your process, that can be where things get complicated. You're not just using mpirun on a plain vanilla MPI program, or srun on an MPI program; you're running some program that partway through decides to open the libmpi library so that it can use MPI functionality. Strange, complicated things can happen there, depending on how you open that library and then how that library opens its own plugins. Linkers have public and private symbol scopes, so when you open something into a process, you can open it into the public scope or the private scope. If you open libmpi into a private scope, and then Open MPI goes to open its plugins, the plugins can't see the symbols from the main MPI library, and the components will fail, because Open MPI components
actually rely on symbols in the main MPI library. So the short version (too late) is: if you're going to dynamically open the MPI library, you probably want --disable-dlopen, so that all of the components and all of the core glue symbols are together in libmpi.so, and they don't have to do a two-step dance to find each other across separate linker scopes, which may not work if they're private. I know that was kind of convoluted; if that person wants to email me off-list, I'd be happy to point them at more detail. Okay, very good. One more question, about Open MPI support for AMD GPUs, whether that's supported or not. I do not believe that we have OpenCL support or any AMD vendor-specific support for that; I think we just have CUDA support. That being said, if you want that, please go talk to AMD and encourage them to come talk to us; we are a very friendly open-source community. Okay, good. Then another question: is UCX the only way of enabling GDR CUDA support in Open MPI, or can it also be used with IB verbs? At the moment, in Open MPI 4.x, there is limited CUDA support in what is known as the openib BTL, and I'm going to talk a little more about this in the next session, but here's a spoiler: the openib BTL is going away in Open MPI version 5. In Open MPI 5 and onward, UCX is going to be the way to get InfiniBand and RoCE support; that has been Mellanox's preferred way of supporting InfiniBand for years. The openib BTL is unfortunately kind of abandonware, meaning it's pretty unmaintained these days. We're not going to remove it from 4.0.x or 4.1.x, because that would break backward compatibility for people with scripts and things like that, but Mellanox has not maintained it for years, so it's been very loosely community-supported for the past couple of years, because for the most part it just works. There haven't really been very many bugs in it that needed new work, but no new features have been added to the openib BTL in a long, long time. So the answer is: today we do have
limited CUDA support with the openib and smcuda BTLs, but in the future it's going to be more UCX. And libfabric is also gaining its own CUDA support; that's still taking a little time, as they're doing work in the libfabric community itself to support CUDA. Okay, one more question, if we have time, Jeff; actually, a couple of people still want to hang on. Is there any particular magic to building Open MPI with the best features of the Mellanox HPC-X solutions? Mellanox provides its own build of the UCX library and others, such as FCA or HCOLL; how do you integrate them? That is a great question; I wish I had a definitive answer for you. What I can do is look this up for the second session. That is a Mellanox question, and I'm afraid I don't have that answer. They do provide their own build of Open MPI itself; however, a lot of customers, at least I know my customers do, like this pattern: we ship them a version of Open MPI, but because of QA and the corporate life cycle, it's usually a version or three behind what's on the open-source website. So if you want to just go get the latest version of Open MPI and compile it against the Mellanox libraries, it's generally just a matter of --with-ucx=this, --with-fca=that, --with-hcoll=that. I don't have a definitive list of those; I think the best thing to do would be to ask Mellanox. It may actually even be covered in their documentation, because I know they take quite a bit of effort both to ship binaries to people who want binaries and to remain very compatible with upstream community stuff. Okay, we have another question that's somewhat related to a question I have myself. The question is: do you need to compile Open MPI for every compiler that you use for the final applications? And the related question I have myself: is there a way to build one Open MPI to rule them all, in a sense, where you configure it in a way that it's compatible with any system you may use it on, an Open MPI that
you package and then can roll out on any system? Sure. All right, those are related but different questions, so let me answer the first one first: the compiler suites. If it's a C application, you can probably build with any old compiler and it'll be fine. I know that's somewhat heresy; some people believe that GCC sucks and you can only get good performance out of Intel, but recent versions of GCC are pretty good. Let's be clear: the Intel compiler is pretty great; it's also pretty expensive. GCC 8, 9, 10 are pretty great too, and a lot of what Open MPI does doesn't require all those advanced features that you get from very expensive compilers. The performance difference you see if you compile Open MPI itself with the Intel compiler suite versus GCC may not be as noticeable as you think. I'm not going to speak for either the community or the vendor, but you might be surprised. So if you have a C-based application, you probably can just compile against whichever C compiler is your favorite, and you're good to go. Fortran and C++ are a different issue, though. There is no Fortran ABI between compilers; there never was in the beginning, and now it's just way too late. Fortran compilers have gone off their own way, and they're never going to converge on an ABI in any kind of practical sense. So if you have users who are using different Fortran compilers for their Fortran MPI applications, I'm sorry, it's a lost cause; your best option is just to have multiple Open MPI installations built with the different Fortran compilers. C++ is less of an issue these days, because we don't even build the C++ bindings by default anymore; they have not only been deprecated but deleted from the MPI standard. That being said, they were very rarely used in real-world applications anyway, so it's not a big deal. But very similar to Fortran, there is no ABI between C++ compilers either, so if you do have
users who are using the MPI C++ bindings with different C++ compilers, then I'm sorry, you're really just going to need different installations of Open MPI built with the different C++ compilers. So that's the gist of it: with C you can get away with it if you want to; with Fortran and C++ you really can't. But remember that rule of thumb I mentioned earlier: it is easier if you just use the same compiler suite as your application. So yes, it'll work with C, but it's still easier to have an Open MPI built with GCC and an Open MPI built with whatever your other favorite compilers are; I think that's the rule of thumb there. Now, the other question: let's say I have four different clusters, and each one has different interconnects and potentially different libraries. One of them has CUDA, one doesn't; one has InfiniBand, one has Cisco usNIC. Can you build one Open MPI on a shared NFS volume and share it between all of them? You absolutely can, yes; that was actually one of our goals. In most cases that'll work fine. In some cases it won't, just because of the practicalities of the different systems out there. For example, say my four clusters represent four different years of funding: I bought one in 2018, one in 2019, one in 2020 (that's three, but you get the point). They might have different operating systems: one might have Red Hat 7.2, another Red Hat 7.5, another Red Hat 8, another might have SLES; they might have different distros and things like that. That's where the real trickiness comes in. In HPC, the general rule of thumb is that homogeneity is awesome and heterogeneity is hard. It's not necessarily the one Open MPI spanning your clusters that's going to be the tricky part; it's the heterogeneity between your clusters, not necessarily in the hardware but in the software stack, that's going to
get you. If you have multiple clusters with, say, different hardware in them, but an otherwise homogeneous software stack, you're probably good to go. Your users can use command-line parameters, environment variables, or file-based parameters to specify: on this cluster I want to make absolutely sure to use InfiniBand; on this cluster, usNIC; on this cluster, PSM2; whatever. So that's where the complexity comes in, the other heterogeneity between the clusters, such that, while it's possible to build one Open MPI to rule them all, it may be simpler in practice to still have multiple installations. I'm sorry about that, but that's kind of beyond our control. It's like: yes, it's all the same, except for these ten different things over here that turn out to be really important. That's just a practical reality of today, sorry. Okay, that's a very good answer, I think. So maybe two more related questions. One that was raised is about GPU support: GDR cannot work with the system UCX packages from the Red Hat repositories, so do I have to enable CUDA support in UCX during the build? I guess the question is: if you want to leverage GPU support through UCX, do you need a CUDA-enabled build of UCX? Yes. You need both UCX and Open MPI to support CUDA; that's what it comes down to. So if you have a UCX install, perhaps from Red Hat or somewhere else, that does not support GDR, then you need to go get your own UCX, install it with GDR support, and build Open MPI against that one. Don't build against the one that doesn't have GDR support; build against the one that does. That might mean a little complexity with your LD_LIBRARY_PATH, or whatever you need to do, to resolve to the right UCX at runtime; just make sure you do that. Okay, and then a related question I had myself: is there any negative effect
of building Open MPI with CUDA support (maybe with UCX as well) when you use it on non-GPU systems? Can you make an Open MPI build that has CUDA support compiled in, and then use it without issues on a system that doesn't have any GPUs at all? Let me answer that in two ways, and I'm really wishing I had done a little more CUDA homework before this seminar today. In terms of correctness, there is no problem with that at all: you should be able to run and get correct answers, the same answers, with the same Open MPI installation on machines with GPUs and machines without GPUs. So from that perspective, say you have a cluster that is homogeneous in terms of operating system, maybe with one head node, but you only had enough funding to get GPUs for a portion of your cluster. This is actually a fairly common scenario: I've got a 64-node cluster, and 16 of the nodes have GPUs, because GPUs are pretty expensive. So yes, you should be able to have one Open MPI for that entire cluster, and on those 16 nodes it will do CUDA-level things, and on the non-GPU nodes it should do non-CUDA things. There is one corner case that I need to investigate. I think the right thing happens, but I might have to bring this back in the second session. I think we specifically architected it for exactly this case, GPUs on only a portion of my cluster, so that we don't actually link against the CUDA library, because on those 16 GPU-enabled nodes you might have the CUDA libraries there, but you don't have them on the rest of your cluster, because they're not needed there. From Open MPI's perspective, I'm almost certain that we actually dlopen the CUDA libraries at runtime, so that if we dlopen them and find them and it succeeds, cool, we'll give you CUDA/GDR kinds of things; but if we don't find them, it's: okay, there are no CUDA libraries here, so I'm not going to do any CUDA
stuff; I can't even tell if you have GPUs or not, because you don't have the CUDA libraries here." I don't know what UCX does in this case; I don't know if they support that or not. I just know from our base Open MPI usage that that is what works. Whether you have to have two different UCXs, I'm not sure; I'm going to have to check on that. I'm not involved in the UCX community, so I don't know. I'm sorry I don't have an answer, but Ken, if you could shoot me this question afterwards, I will make sure to have the answer for the next session.

Okay, I'll send you a reminder. We have three more questions. One was actually for Ralph, which we missed earlier, where the question is about testing and benchmarking of UCX — whether there's a project available that facilitates that. UCX, or PMIx? PMIx, yeah, sorry: testing and benchmarking of PMIx.

Yeah, so unfortunately Ralph had to drop at the top of the hour; he had a conflict that he could not stay on for, so you're stuck with me. I am unaware of anything like that. For the most part, PMIx is just glue behind the scenes that most people don't even interact with; it's usually the man behind the curtain. You just type mpirun or srun or whatever, and magically your MPI stuff starts behind the scenes. When people talk about performance problems, it's usually glaringly obvious — like, "I did mpirun, and ten minutes later my job started" — and that's usually at very high scale, too. If you're talking across 32 nodes, or even 64 nodes, you're probably not even going to notice, no matter how many cores you've got on those 64 nodes. That's what Ralph was talking about, too: at small scale, you're not going to notice whether it's PMI-2, PMIx, or just a linear SSH behind the scenes, because it's too small to notice. We're only doing 64 of those, so what's the matter? And I'm fudging, too; I'm trying to make a point here. It all very much depends on your specific system and your networking library and file-system support and all these kinds of things. But in general, when
people ask us, "hey, my startup seems slow," we typically tell them to turn on a couple of choice MCA parameters that show, in Open MPI, the timing of what happened during job launch. I can ask Ralph to expand on that in the second session a bit — or, more to the point, can you please take a note and remind me, and we will have Ralph expand on that, going through some of the things that you'd go through. It's not specifically benchmarking, but it's more like the troubleshooting that is available.

Okay, took a note of that. So the second-to-last question: is there a way to ensure that a given transport is used over another? Say one wants to use the TCP BTL instead of RDMA, if it is built with both.

Yes, and I am totally going to defer that question to the next session, because I have a bunch of slides on that. So I'm sorry to defer your question, but it's a lengthy conversation and I want to make sure to answer it correctly.

Okay, yeah, we'll cover this in the second part. And then to wrap up the final question — let me find it again here — last but not least: when are you going to clean up and update the documentation and the FAQs of Open MPI?

Oh my god, yes, this is a very good question, and please let me turn this into an appeal for help. We're engineers; we are not good writers, and we write code for a living. As with most open-source projects, even professionally funded open-source projects, we're terrible at writing documentation. We need help. There's a lot of good information out there, but the FAQ has gotten pretty big and unwieldy, and the man pages could use some love as well. We could really use some help with this, particularly help written with a user's perspective in mind, because part of the problem with what we write is that we write it from the developer's point of view — "of course everybody knows this." We make assumptions, and so sometimes it's not the most clear documentation. So if you have some time, even if there's just one particular thing that you would love to update in our documentation, please contact me,
or if you have a student who is looking for work, or whatever resources you have, we could really use some help with the documentation. I am the first to admit that it's not necessarily the best organized, and it could use some updating. So yes, I don't have a good answer other than to appeal to everybody for help.

Okay, I think that's very clear. Yeah, that wraps it up; I'm out of questions, and I think we're also out of time for this first part. Thank you very much, Jeff, and thanks also to Ralph. We will be back two weeks from now for the second part.

Thanks, everybody. We appreciate your time, and I thank everybody for hanging on for an extra half hour. I appreciate all the questions.
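To make the build and runtime advice from the Q&A above concrete, here is a rough shell sketch pulling the pieces together: building a CUDA-enabled (GPUDirect RDMA capable) UCX, building Open MPI against it, and steering transport selection with MCA parameters. All install prefixes (`/opt/ucx-cuda`, `/opt/openmpi`, `/usr/local/cuda`), the application name `./my_app`, and the core count are hypothetical examples, not recommendations; the MCA component names are from the Open MPI 4.x era, so verify them against your release with `ompi_info`.

```shell
# --- 1. Build UCX with CUDA support (instead of a non-CUDA system package) ---
# Hypothetical prefixes throughout; substitute your own paths and versions.
cd ucx-src
./configure --prefix=/opt/ucx-cuda --with-cuda=/usr/local/cuda
make -j8 && make install

# --- 2. Build Open MPI against that specific UCX, also with CUDA support ---
cd ../openmpi-src
./configure --prefix=/opt/openmpi \
            --with-ucx=/opt/ucx-cuda \
            --with-cuda=/usr/local/cuda
make -j8 && make install

# --- 3. At runtime, make sure the CUDA-enabled UCX is the one that resolves ---
export LD_LIBRARY_PATH=/opt/ucx-cuda/lib:$LD_LIBRARY_PATH

# --- 4. Check whether libmpi links libcuda directly ---
# If this prints nothing, CUDA support is most likely pulled in via dlopen()
# at runtime, so the same build can also run on nodes without CUDA libraries.
ldd /opt/openmpi/lib/libmpi.so | grep -i cuda

# --- 5. Per-cluster transport selection via MCA parameters ---
# Force the TCP BTL (plus shared-memory and self) instead of an RDMA transport:
mpirun --mca pml ob1 --mca btl tcp,self,vader -np 4 ./my_app
# The same selection via an environment variable:
export OMPI_MCA_btl=tcp,self,vader
# ...or per-user in a file, $HOME/.openmpi/mca-params.conf:
#     btl = tcp,self,vader
```

The same three mechanisms (command line, `OMPI_MCA_*` environment variables, parameter files) apply to any MCA parameter, which is how one installation can be pointed at different transports on different clusters.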