Welcome to another edition of RCE. This is Brock Palin. You can find us online at rce-cast.com, where you can find links to the blogs, to Twitter, and all the usual stuff. Once again, I have Jeff Squyres of Cisco Systems, one of the authors of Open MPI. Jeff, thanks a lot for your time.

Hey Brock, nice to talk to you again. This is one of those little awkward times, right, where we're post-Supercomputing, pre-Christmas. You might get the episode out by the end of the year, it might be early January, but regardless of the strange timing, we have an interesting topic today, because this is actually something that I am actively working on, and with our guest, whom I actually work with quite frequently. So I'm going to have to intentionally play dumb on some of these questions and just pretend it's an actual interview.

Okay, so yes, you gave a little bit away there, but our guest today is going to be talking about a library called libfabric. So Sean, why don't you take a moment to introduce yourself?

Hello Brock, Jeff. My name is Sean Hefty. I currently work with Intel; I've been with Intel about 20-some years. I'm involved with OpenFabrics, developing network and high-performance interfaces to hardware, and I also work with the Linux RDMA kernel. To focus on what we're looking at with libfabric: you may also hear me call it OFI, OpenFabrics Interfaces. This is basically a new set of interfaces that we're trying to develop in open source to bridge the gap between higher-level applications like MPI, PGAS, and SHMEM and high-performance hardware like InfiniBand, iWARP, and those sorts of devices.

So can you expand on what exactly a fabric normally means to most people in high-performance computing?

So our use of fabric is basically a small, self-contained network: the immediate cluster, as opposed to the internet.
So we're really looking at systems that are tightly coupled. Often there may not be routers involved; usually it's just going to be switches, and if there are routers involved, there'd be a very small number. So we're talking about a network that really exists within a single building, maybe one floor of a building, for systems that are trying to solve one single problem.

Okay, so what is libfabric then?

So libfabric is a library. It basically started out as a framework of different sets of interfaces that work together to try to export the services of a fabric, or of a network, up to the applications. So it's not just an interface to the network; these are really the services that the network can offer, stuff like RMA or RDMA capabilities, atomic operations, maybe something like collective operations. So we're looking at trying to define a library that exposes these capabilities up to the apps, for them to be able to take advantage of what the hardware can really provide and what the switches can help provide as well.

And so is there really nothing like this in existence already? I mean, what is the use case for creating a new library versus, for example, expanding on an existing library?

So we looked at several libraries. The most common one you see in existence now is libibverbs, the verbs library. It works well for InfiniBand devices. It's designed around the InfiniBand hardware and what the InfiniBand hardware is exposing and capable of. What we're trying to do is expand beyond that: this is not just InfiniBand. We want a set of APIs that are fabric agnostic. For example, we don't want to tie it to a specific model of "you must make progress this way" or "you must implement this sort of feature in your hardware." We want to be able to say, from the point of view of our interfaces: implement it in software or implement it in hardware.
We don't care how it's implemented. We don't want to tie into a specific vendor or a specific topology or architecture such as InfiniBand. So I don't know that there's anything we've seen out there that tries to provide this lower-level interface, an abstraction of the services that the fabric is providing.

Now, in some ways, the way you just described that is kind of like the charter of MPI itself, right? It's meant to be a relatively low-level abstraction that gets you very close to the performance of the network hardware itself, so you're adding very little software overhead, and it abstracts away what the underlying network actually is. How would you differentiate this from the MPI API?

So MPI is one of our primary consumers: obviously MPI, SHMEM, and the PGAS languages, but MPI is one of our major focus areas. This is providing interfaces so that MPI doesn't have to talk directly to hardware. You don't really want MPI to have to code to every piece of hardware, to write to hardware registers, for example. So this is basically an abstraction above writing to the hardware registers: these are the services that the provider is giving up to MPI, so that MPI can really focus on optimizing for different types of hardware based on different features, rather than optimizing on this vendor's version three of the hardware doing something
I want to use, and that vendor's version four doing something different. It's really abstracting that away from MPI, so that the vendors, as they change their hardware, can add new features without requiring MPI to change and update for every feature that the hardware offers. So it gives MPI some protection from having to change for every piece of hardware; the hardware vendor themselves deals with providing these services up to the application.

So some network vendors already provide pretty high-level abstractions instead of verbs, things like, specifically, MXM, MX, Cray Portals. Why not just use one of those, or build on top of one of those?

So in some cases the interfaces you mentioned are proprietary. They're vendor specific. MXM, for example, is Mellanox specific, so you would really be asking other vendors to adopt possibly a competing vendor's API as their own. You may not even have the ability to do that because of copyright or IP restrictions on using those interfaces. In some cases those APIs are abstractions, but they're really abstractions on top of lower-level APIs, and that lower level is what we're targeting. MXM, for example, is really targeting a specific type of hardware, which is InfiniBand. MX, for example, targets Myrinet. So again you have vendor-specific interfaces that target a specific vendor's hardware. Even Intel has PSM as an interface that targets their hardware and the protocols that they use. So we don't want to necessarily pick a single vendor, because for an open-source effort here, picking one vendor is a problem:
You're not going to get pull or push behind it to say, "Hey, yeah, change your hardware or change your implementation to use this other vendor's interface." So I think there is an issue there, at least in a business sense: we can't pick a specific vendor's hardware or interface to move forward. Are there other alternatives that we can use?

Okay, so those interfaces are all proprietary. But obviously you're still going to be running on those networks. So is there still going to be a vendor-provided lower-level thing they give to libfabric?

There will be. In designing libfabric, we're not tossing out the ideas and trying to start over from scratch. We're trying to make sure we leverage all the ideas that these existing interfaces have, the services they have, the capabilities they have, and just improve upon them. Within libibverbs, for example, you have vendor-specific plugins to verbs that do the implementation. We have the same concept in libfabric: each vendor will provide their own implementation of these different interfaces for their hardware.

Okay, so let's take a step back here and ask: why should HPC customers, or even end users, care about libfabric? Mellanox has frankly done a brilliant marketing job of getting their brand, Mellanox and RDMA and verbs, into the lexicon of the HPC community. So is there going to be any kind of difficulty in getting libfabric adoption if people think, "Oh well, no, I need verbs to get good performance"? Put differently: why should HPC customers care about libfabric?

So I think the primary reason is that verbs is really designed around InfiniBand and InfiniBand hardware. If you have verbs-based hardware, verbs can work for you fairly well. The problem comes when you have hardware that's not verbs based. For example, Intel's own InfiniBand hardware is not verbs based. Cisco has hardware that's not verbs based.
I believe there are others, like Cray and Bull. There are other vendors who have high-performance hardware targeting the HPC industry, and their hardware is really not built around verbs and supporting the verbs semantics, ordering, and progress model, so you really need a new interface to take advantage of it. What we're trying to do with libfabric is this: we know we have these different interfaces, and essentially we're ending up in a situation where every vendor is creating their own set of interfaces to their hardware. We don't want to have five or six or seven different interfaces that everybody has to code to. That ends up fragmenting the industry and really hurting adoption of these high-performance fabrics. If we can get these different vendors to come together and say, here's a single interface, it exposes the features and services that your hardware can do, and it allows you to do it efficiently, then hopefully the vendors can adopt libfabric, the MPI and PGAS people will start using it, and then it kind of springs off of that, rather than having to code to and support every different interface that every vendor wants to provide.

So obviously people are going to be running on actual hardware that has all these features. So aren't you just moving the work from the MPI vendor to the libfabric community? I mean, no matter what, it's got to be written. So are you just moving it from one place to the other?

In a sense.
Yeah, you're moving the work, but you're moving it from MPI into the actual vendor. The vendor should have the best idea of how to use their hardware. They can add new features to their hardware, move an implementation from software to hardware, for example, or change how the hardware implements certain features, and all this can be done without MPI having to change. So as the hardware vendor updates their drivers and changes their provider, MPI will just see a performance gain, versus having MPI do that coding. And again, the alternative, changing MPI, means you're not just changing one MPI; you're changing every application that wants to write to these interfaces. By putting it into the provider, you have one place where the provider can make these updates for all applications, so every MPI that's using it gets the update. You don't have to update Open MPI, MPICH, the OSU MPIs, Intel MPI, not to mention stuff like SHMEM and the PGAS languages, Coarray Fortran, parallel C compilers. You don't have all those applications needing to update. If they all just write to libfabric, then as the provider does their updates and adds new features, the applications can just take advantage of it.

So now you mentioned more than just MPI there. How much are you targeting here? What are the use cases that you expect libfabric to be used in?

So we're targeting a fairly broad range. The initial target is really focused on HPC, but obviously we want to be able to expand to as large an audience as possible. We want to look at the enterprise space. We've had discussions with the non-volatile memory group, NVMe. We've talked with storage vendors about what sort of features they are looking for out of these APIs. We've talked with Oracle, for database systems, and IBM to see what they need from these interfaces. And then you have other applications, you know, data streaming and video streaming type applications. What are they really looking for?
Even some of the specific ones like, you know, Google and Facebook: when you have discussions with them, what sort of network interfaces are you looking for? What sort of features are you looking for? Can we meet the needs of those vendors as well? Part of the way libfabric is designed is to try to identify the common features they want and expose services in a reasonable way, to meet the needs of as broad a set of applications as possible, while still leaving room to grow. We're not going to be able to hit everybody's needs, so make it easily extensible. If there's a brand-new set of requirements from some other group that wasn't known at the time we wrote the original API, can we easily add sets of calls to support what that application needs without having to rev the entire API?

So it sounds like you're actually creating a new API, then. So this isn't a drop-in replacement for libibverbs. If I have some software in this transition period, I still need to have both libraries, don't I?

Yeah, so it's a new set of APIs. It'll be a 1.0 API set. It's not trying to be backwards compatible with verbs or PSM or any of the APIs like MXM or MX. It's a brand-new set of APIs that are being defined for the applications to use.

Okay, so going through this whole process, you mentioned that you've worked on OpenFabrics, the kernel stuff, and all those sorts of things. What have we learned from the last, what, ten years of verbs and OFED?

Yeah, so verbs has been around for about ten years. Several things have shown up over that time. One of the biggest issues people raise when they talk about verbs is that verbs by itself is not a usable library.
It doesn't have any way to set up connections, for example. So verbs really has this dependency on some other library to help set up the communication path. That's one of the problems that showed up. Even Red Hat came to the OpenFabrics mailing list and said, "Hey, here's an issue. We've got these two libraries. They're very closely coupled. They really should be the same library; there should be one set of interfaces to do this." So we want to make sure that we don't ignore that: we don't want separate libraries when the pieces need to go together. We also need to make sure that the management pieces are not necessarily exposed directly to the application, so an application doesn't need to code to the InfiniBand subnet manager. The application just doesn't care what sort of fabric it is. We want to design APIs that support that as well.

And then when you get down to smaller details, this has really shown up in the last two to three years, as co-processors like Intel Xeon Phi have been added back to systems. You basically hit a power barrier per node: you can either have a very small number of cores going fast or a very large number of cores going slow. As we've gone to a very large number of cores going slow, you start to see inefficiencies in the software show up even more: data structures that are large, or addressing that's very large. If you have to have all-to-all connections, the memory footprint becomes huge on these systems because you have so many cores involved. Every time you have to take a branch in the code, you start seeing the impact, especially on the slower cores like the Xeon Phi, or even with something like NVIDIA offload. Those impacts require you to rethink how you've designed these APIs. What do those data structures look like? What does the function signature actually look like?
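
As a rough illustration of the all-to-all memory footprint point above, here is a minimal Python sketch. The function name, the node count, the ranks per node, and the 1 KiB of state per connection are all made-up assumptions for the arithmetic, not measurements of any real interface.

```python
def all_to_all_state_bytes(nodes, ranks_per_node, bytes_per_connection):
    """Per-node memory for connection state when every rank connects to every other rank."""
    total_ranks = nodes * ranks_per_node
    peers_per_rank = total_ranks - 1  # each rank holds state for all other ranks
    return ranks_per_node * peers_per_rank * bytes_per_connection


# Hypothetical cluster: 1000 nodes, 1 KiB of state per connection.
few_fast_cores = all_to_all_state_bytes(1000, 16, 1024)    # a few fast cores per node
many_slow_cores = all_to_all_state_bytes(1000, 240, 1024)  # many slow cores per node

print(f"16 ranks/node:  {few_fast_cores / 2**20:.0f} MiB per node")
print(f"240 ranks/node: {many_slow_cores / 2**30:.1f} GiB per node")
```

With the node count fixed, both the number of local ranks and the number of peers grow with the core count, so per-node state in this toy model grows roughly with the square of the ranks per node.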
And this is where it becomes difficult to try to fix an existing API. You really need a whole new set of function calls to be able to fix these problems.

So what is the status? Where is the libfabric project in terms of design, implementation, availability, things like that?

So there's a working group. They currently meet every Tuesday. Again, it's open participation. It's set up within OpenFabrics as an organization, but you don't have to be a member to participate; we have several non-members there. We've gathered a lot of requirements, probably a couple hundred at this point. We have a library that's been developed. We have a set of interfaces that's been developed. We're looking at having a release process started in Q1 of next year, with the first release of libfabric probably sometime at the end of Q1, probably in March of 2015.

Now, is the design of the API stable? Is this something where developers should start looking at the API and thinking about how they would adapt their applications to it? Obviously we're talking about lower-layer things, like you mentioned: MPI implementations, PGAS implementations, SHMEM implementations, things like that.

So the API is still changing. I would say it's mostly stable. There are no huge changes; there have been minor tweaks. Right now would actually be a really good time to take a look at libfabric and the APIs, because you have the ability to help steer the current API before we freeze for a 1.0 release. So if you want a chance to actually get in and help steer certain calls certain ways, or define certain functionality, now is the time to join. If you just want to code to a stable API, you'd be looking more towards the middle of Q1, so into January or the start of February, if you wanted to start developing to these APIs to see how well they match.

So what networks do you have working right now, prototype or production?
What have you got?

So there are currently four different providers. There's a provider over Cisco's usNIC. There's a provider that sits on top of Intel's InfiniBand PSM hardware. Those are both fairly complete and targeting fairly high performance. There's also a provider that sits on top of verbs, so this is actually a layering of libfabric over libibverbs and librdmacm. That's there to give you functionality over all existing verbs hardware, so RoCE, InfiniBand, and iWARP devices should all work through the verbs provider. Even though it's layered, it's still trying to be as performant as possible. The performance difference would probably be unmeasurable by most applications; you're talking about just a translation from one function call to another in most cases. And then there's a sockets provider, which just runs over normal TCP or UDP. The primary focus of the sockets provider is that anybody writing to the libfabric interface doesn't need special hardware. They can just write an app on their laptop, for example, or on just some system, and see: are these interfaces actually working for them? How closely does it match? What does it take to get it to run? So they don't have to run on special hardware at the moment.

Libfabric is designed, again, very similarly to verbs. You have this framework where libfabric defines just these APIs, the semantics of those APIs, and the behavior that you really want. Then under that you have different vendors or different software libraries that feed into those APIs and implement some set of them. We just call those plugin modules providers. So much like MPI has plugins and libibverbs has plugins, libfabric also has plugins, and we have multiple providers there.

All right, so you have these providers, plugin types, and so on. You listed several different types that are there already or are currently being worked on. How hard is it to add a new one?
Let's say I'm a hardware vendor and I've got my brilliant new network type, and I want to add support for it to libfabric. What do I need to do?

So you just download the source code and add your code as a provider. The real difficulty is: can you support those interfaces? A provider can basically pick and choose which set of interfaces it wants to support. I think of it almost like sockets: I can implement just a UDP implementation of sockets, or I can implement TCP-type sockets. Libfabric has the same thing. It has a very wide range of APIs, but a provider doesn't need to support everything. It can just support what it supports well, depending on what applications it wants to target as well. So I think it's fairly straightforward for somebody to join in and create a provider for these APIs. It's really a matter of how much effort it takes to talk to their actual hardware.

Okay, so you mentioned Q1 earlier for the API kind of settling. What is the path for libfabric? Should we start using it now if it's supported on our hardware? Should we wait? And what's your plan after that?

So the plan is, at the beginning of Q1, to be able to produce a package, like an alpha package, so application developers can look at these APIs, start coding to them, and see: is there a gap for their application?
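
The idea that a provider can pick and choose which interfaces it supports, with applications then matched to providers that cover what they need, can be sketched as a toy model in Python. This is not the libfabric API: the capability names, provider names, and the `find_providers` function are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

# Hypothetical capability names, loosely based on the services mentioned in
# the conversation (messaging, RMA, atomics). These are not libfabric constants.
MSG, RMA, ATOMIC = "msg", "rma", "atomic"


@dataclass(frozen=True)
class Provider:
    name: str
    caps: frozenset  # the subset of interfaces this provider chose to implement


# Toy registry: each made-up provider supports only what it supports well.
PROVIDERS = [
    Provider("vendor_a", frozenset({MSG, RMA, ATOMIC})),
    Provider("vendor_b", frozenset({MSG, RMA})),
    Provider("software_fallback", frozenset({MSG})),
]


def find_providers(required, registry=PROVIDERS):
    """Return the names of providers whose capabilities cover everything required."""
    need = frozenset(required)
    return [p.name for p in registry if need <= p.caps]


print(find_providers({MSG}))          # every provider implements messaging
print(find_providers({MSG, ATOMIC}))  # only providers that also chose atomics
```

In this sketch the application states what it needs and never names a vendor, which mirrors the point that the application codes to the interfaces rather than to a particular piece of hardware.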
So we can get that feedback in. And then the OpenFabrics Interfaces Working Group, which is the group that meets every Tuesday to discuss these APIs (we also call it OFIWG), defined basically a time-based release. So beyond the initial Q1 release, we're expecting new releases of libfabric every three months, basically once a quarter, with any bug fixes, new features, or newly defined APIs coming out at that fairly regular interval for the applications. Along with this initial release, obviously we're going to have some sort of test programs available, and we're also going to try to enable several applications, such as MPI, so that when libfabric is released there's at least an MPI available to show how an MPI would make use of these APIs and run over it.

Yeah, let me just kind of violate my interviewer abstraction here and throw something else onto your answer there, too. An interesting thing, to me at least, about this whole libfabric process is that it is being co-designed from both sides, right? Both from the network provider layer and the consumer layer, and the consumer, at least from my perspective, is MPI. For example, there are already two different flavors of libfabric usage in the development version of Open MPI, and there's at least one flavor of libfabric usage in MPICH as well. So I know that all the MPI vendors, or at least the portable ones, are looking very, very hard at libfabric and contributing back, saying things like:
Well, this API is okay, but this one over here, I think we need to tweak it a little bit to do this, and we need a new constant to do that, and after I tried to code it up and have MPI send run right over libfabric, I discovered a few more issues, and so on. It's been an iterative process that I think has genuinely made the end result better, because it's really the networking people talking to the MPI people, the SHMEM people, and the PGAS people. Although I think in this case the MPI community might be a little further ahead in terms of implementation using libfabric. And then we're all kind of meeting in the middle in this libfabric playground, trying to set up our toys and share well. And it seems to be working pretty well.

Yeah, I think that's actually a really good point, Jeff. I think this differs from a lot of the other API design that's been going on, in that this is really trying to bring together the application developers and the hardware providers underneath, and make sure that they get APIs that fit well together. What we've seen is that if you get an API that's too far from the hardware, you end up getting inefficiencies that way. But if you get an API that's too far from the application, you get inefficiencies up there. So it's really a balancing act to make sure you get an API that matches well to the application but still maps well down to what the hardware is capable of.

All right, so let me put my interviewer hat back on and ask: how do I actually get involved? I'm going to continue my question from before. I'm a hardware vendor, I've got a new hardware type, and you talked a little bit about how I would add a new provider or plugin. But how do I actually get involved in the community? Where is it hosted? How do I get the code?
How do I learn what I need to do for my code? Things like that.

So I think the first starting point for anybody trying to get involved is to go to the openfabrics.org website and join the OFIWG mailing list. Once you're on the mailing list, you can submit questions, or ask to join the meetings, if you have the capability to join the meetings on Tuesdays. There are a lot of people that join these meetings every week. If you have topics, you can talk to one of the co-chairs. I'm a co-chair myself, along with Paul Grun from Cray. You can send us information saying, "I want to bring up a topic at one of the meetings. Can we schedule that?" and we can try to schedule a time for you to bring up the topic. If you have questions about the code, you can post them to that mailing list. And then there's also a GitHub page where we have information about libfabric. All the source code is up on GitHub. There's a fabtests repository up there too, where we're building a test framework for libfabric. You can go there, look at the issues, look at the source code, even post comments on some of the issues. You can even open an issue saying, "Hey, I have some hardware I want to fit in here. How do I do it?" and somebody should be able to respond to you up on GitHub directly.

So you said something there.
Is libfabric part of the OpenFabrics Enterprise Distribution, or is it something else?

So libfabric is the output of the OFIWG, the working group. It's not part of the OpenFabrics distribution, the OFED distribution, yet, because we don't have a release yet. But yeah, the plan is to merge in and line up with one of the OFED releases, so eventually it should be included in whatever the next OFED release is after libfabric is released.

So what's the license for libfabric? Because that's actually quite important in our world, to make sure that all the licenses align with each other. What should people expect?

So libfabric is under the license that OpenFabrics uses, which is a dual GPL version 2 / BSD license. Again, it's an open source project, so anybody can contribute to it. They need to contribute under the same license. But we're not discouraging vendors, for example, from having proprietary plugins to this. Basically, libfabric ships with the providers. This is one of the changes as well: when you get libfabric, you should have access to the providers. All four of the providers I mentioned earlier are already shipped with the same package, the same library, but you could have external providers that somebody can just load and plug in.

So one question we kind of left out before is: what operating systems is this going to support?

It's targeting only Linux at the moment. Several people have brought up other operating systems, but the only one we're really looking at at the moment is Linux.

Is there any particular kernel version that's required?

No. Libfabric does not have any framework that talks to the kernel. This is again one of the differences from libibverbs or librdmacm, which use specific kernel interfaces to the devices. Libfabric doesn't define any kernel interface.
So it can work with any kernel, subject to whatever the provider's constraints are.

So a lot of us who run systems worry about counters and metrics and all these other things, and those are normally network or vendor specific, but sometimes they're abstracted out; Ethernet, for example, is pretty much the same everywhere no matter what vendor it is. Does libfabric provide a tools interface or things like that, or do we still have to depend on the vendor?

So currently it's still the vendor. We've had discussions about how to expose counters in a generic way and how to expose topology information in a generic way, and I think we're going to continue those discussions within the OFIWG into early next year. So I don't see those showing up in libfabric within, let's say, the next three to six months, but it is an objective to see where it makes sense to have common counters, common event registration mechanisms, and common ways of reporting topology-type information.

So I know you already mentioned it, but can you mention the libfabric website again, where people can find it and how they can get involved?

So again, openfabrics.org. That's the main website.
You should be able to find from there how to join the OFIWG mailing list. Also, if you go to github.com/ofiwg, that'll give you the main GitHub location for the OFIWG work on libfabric and the tests being developed, along with pointers from there to the website. And the main website is ofiwg.github.io/libfabric.

Yeah, and I think if you Google for libfabric, it'll probably be in the top page's worth of results. Right now, when I do it today, I see a couple of blog posts that I've written about libfabric, and then the GitHub, and things like that. So it's still suffering from that new-project Googleability type of thing. There aren't a million links to it yet, but it should still be in the first bunch of results that you see.

Okay, Sean, thank you very much for your time.

Thanks, Sean.

Thank you.