Okay, thanks everyone. Our next speaker is Todd Gamblin, right? Yeah. And he's going to tell us about Spack. Maybe you have some different opinions than Kenneth, and it's going to be a fun talk.

Okay, so I'm Todd Gamblin. I wrote Spack, so I'm biased about Spack. I'm going to be talking about our new binary packaging capability. Kenneth was talking about how the build-from-source tools are slow; we agree, and we're trying to add binary packaging to Spack, so that's what this is about. I'm going to give a little bit of an overview first. Who's familiar with Spack? Not many people? Okay, cool, so the overview will be worth it.

All right, so Spack is a general-purpose from-source package manager for HPC. You can think of it as kind of a combination of Homebrew and Nix. We're targeting HPC and scientific computing, and the community is growing. This is the contributions to packages over time by different organizations, and you can see that up until 2015 it was pretty much a Livermore-only project (I'm from Lawrence Livermore National Laboratory), and then we started getting lots of contributions after Supercomputing 2015 from many of the other labs.

The goals of Spack are a little different from some of the other tools. We want to allow people to experiment with performance options, so we're not opposed to you building something with Spack that no one's ever built before. We'd like you to be able to play with that and tune your software and get it working, and we want to make it easy for you to try different compiler versions and build options, change compilers and flags, and swap implementations of ABI-incompatible libraries like MPI, LAPACK, and BLAS. We run on laptops, on Linux clusters, and also on some of the largest supercomputers in the world. Spack is used at Oak Ridge, NERSC, Livermore, Argonne, and some of the other DOE labs, as well as at EPFL, which has contributed tremendously to the project since the beginning.
We've got some great collaborators there. From a command-line perspective, Kenneth showed a little bit of this already: the idea is that if you just clone Spack from GitHub, you can get going by just running the spack command. It should work out of the box on most systems. There are a few dependencies, like curl and Python obviously, but we try to keep them minimal, and we work back to Python 2.6. If you clone Spack and just type spack install and some package name, that should work. That's sort of unconstrained, and it should do something sensible on your machine.

If you want to get more specific than this, though, and you have special needs, you could say spack install mpileaks at a particular version. You could say spack install mpileaks at that version with a particular compiler. And then you can add options; the different packages can provide their own options. You can inject flags into the build. And finally, this syntax is recursive: essentially, any of these constraints that you can put on the top-level package, you can put on a dependency as well. The idea here is that the user just says what they need, and we work out the rest. You don't have to specify all of the parameters required for doing a build.

Spack packages, if you actually want to write them, are fairly easy. You can just say spack edit and a name, and it'll pull up the package for that thing. They are simple Python scripts, and they look like Homebrew packages, if anyone has seen a Homebrew package before, but they're in Python instead of Ruby. This is all metadata up here: this tells you the versions you can download, this is the home page for the package and the URL, and these are the dependencies here. You can use that same Spack spec syntax to constrain dependencies, so you can say this depends on Boost at 1.42 or higher with an option. And then in here, these are just simple build instructions.
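The spec syntax described above can be sketched as a toy parser. This is purely illustrative: the parse_spec helper, the field names, and the simplified grammar are assumptions for this sketch, not Spack's actual parser, which is far more complete.

```python
import re

def parse_spec(spec_string):
    """Parse a simplified Spack-style spec string.

    Handles: name, @version, %compiler, +variant/~variant, and
    ^dependency constraints. Illustrative sketch only.
    """
    parts = spec_string.split("^")  # '^' introduces a dependency constraint

    def parse_one(token):
        name = re.match(r"[\w.-]+", token).group(0)
        version = re.search(r"@([\w.]+)", token)
        compiler = re.search(r"%([\w.@]+)", token)
        variants = re.findall(r"([+~])(\w+)", token)
        return {
            "name": name,
            "version": version.group(1) if version else None,
            "compiler": compiler.group(1) if compiler else None,
            # '+' enables a variant, '~' disables it
            "variants": {v: (sign == "+") for sign, v in variants},
        }

    root = parse_one(parts[0].strip())
    root["dependencies"] = [parse_one(p.strip()) for p in parts[1:]]
    return root

# The recursive idea: the same constraints work on the dependency too.
spec = parse_spec("mpileaks@2.3%gcc@7.1+debug ^mpich@3.2")
```

The point of the sketch is just the recursion the talk mentions: everything you can say about mpileaks, you can also say about the mpich after the caret.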
And this is supposed to look kind of like shell, because all this is doing is saying: with this working directory, call cmake with some options, then make, then make install. And that's it. So it should look pretty simple to a typical user.

The other thing about Spack that we added, to address a lot of the problems we saw with HPC builds, is depending on virtual packages. If you look at MPI or BLAS or LAPACK, those aren't packages, those are interfaces, and you essentially depend on the interface and not on the package. A package that can build with MPICH can usually also build with Open MPI, so packages in Spack don't depend on either of those two implementations; they depend on MPI, and you can swap in an implementation. Essentially, the mpileaks package here would depend on MPI, maybe at 2 or higher if it requires the MPI-2 interface. We can version these interfaces, and the implementations declare which versions of the interface they provide. So you can swap any of those into the build using the same command-line syntax; you can tell it which MPI to use.

We build with compiler wrappers. This is somewhat important for the binary packaging aspect, because we use them to add RPATHs to the build. The idea with Spack is that we don't actually require you to use modules or any other environment-management system. When we build a library, we make sure the right RPATHs go into the binaries, which means they know how to find their dependencies. So if you build an executable with Spack, it should just run: it knows where its dependencies live, and the user shouldn't have to do anything. And moreover, they can't screw it up with LD_LIBRARY_PATH if they try, because RPATH takes precedence, and we actually like that. We get lots of user support calls from users who have put something in their LD_LIBRARY_PATH; they don't know that they've done it or what it means.
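The virtual-package idea can be sketched as a provider lookup: implementations register which version of the interface they provide, and a dependency on "mpi at 2 or higher" matches any of them. The PROVIDERS table, the version numbers, and the providers_for helper are hypothetical; real Spack resolves this during concretization.

```python
# Hypothetical registry: which concrete packages provide which version
# of the virtual "mpi" interface (illustrative values).
PROVIDERS = {
    "mpi": {"mpich": "3.1", "openmpi": "2.1"},
}

def providers_for(virtual, min_version):
    """Return implementations whose provided interface version is at
    least min_version, e.g. the 'MPI at 2 or higher' constraint."""
    def as_tuple(v):
        return tuple(int(x) for x in v.split("."))
    return sorted(name for name, provided in PROVIDERS[virtual].items()
                  if as_tuple(provided) >= as_tuple(min_version))

# A package depending on mpi@2: can build with either implementation,
# and the user picks one with the same ^ syntax on the command line.
candidates = providers_for("mpi", "2")
```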
And then every one of their packages is screwed up in some way or another. We could very easily use RUNPATH here instead if people have strong feelings about that, but for now we're using RPATH. We also inject include and library search paths for dependencies, so that you essentially don't have to modify the package build script very much to find your external dependencies. We try really hard to make the build work as though those packages were on the system. Note that this is not the same kind of sandbox that Guix or Nix use; this is just a separate process for the build where we use these compiler wrappers. We're not doing full isolation yet, and we find it somewhat hard to do that on systems like Cray, where there's so much vendor stuff in the environment that you have to rely on. We would like to sandbox more.

And then, Spack is designed to handle all the versions of these packages that you would want to build. Essentially, if any aspect of the configuration changes, you get a different hash, and in that way it's very similar to Guix or Nix. We take this whole DAG and the metadata on all the nodes, we hash that, and every configuration of a package gets its own unique install directory. And the libraries in there, like I said, know how to find their dependencies in other directories. The idea here, again, is that you probably knew what you were doing when you built the thing, or at least the package manager did, but by the time you get around to running it, you've probably forgotten all the things that your library with 50 dependencies relies on. And you shouldn't have to remember them at that time just to run the thing.

The other thing we do, which is maybe a little different from some of the other package managers that have been discussed, except for Conda, is something we call concretization; most people call it dependency resolution. If you give Spack this description, this is a set of constraints.
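The "any configuration change gives a new hash, and every configuration gets its own install directory" idea can be sketched like this. The spec dicts, the field names, and the 8-character hash length are illustrative assumptions, not Spack's actual hashing scheme (which hashes the whole concretized DAG).

```python
import hashlib
import json

def spec_hash(spec):
    """Hash a canonicalized description of a package's configuration.

    Any change in version, compiler, variants, or dependencies
    changes the JSON, and therefore the hash. Sketch only.
    """
    canonical = json.dumps(spec, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:8]

def install_prefix(root, spec):
    # Every configuration lands in its own directory, e.g.
    # opt/mpileaks-2.3-<hash> (layout is illustrative).
    return "{}/{}-{}-{}".format(root, spec["name"], spec["version"],
                                spec_hash(spec))

a = {"name": "mpileaks", "version": "2.3", "variants": {"debug": True}}
b = {"name": "mpileaks", "version": "2.3", "variants": {"debug": False}}
```

Flipping one variant is enough to give the two specs different hashes, so both builds can coexist side by side.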
We generate an abstract DAG from that, which tells you: here are all the packages you're going to build, with the constraints you cared about on them. And we run concretization on that to basically fill in all the blanks. What that does is that by the time you get to the install method in your package, Spack passes you a full description of what it is you need to build. So your task as the implementer of a Spack package is basically to translate this full description into build instructions. You don't have to do things like: is this installed here, is this installed here, which dependency am I building with, and things like that. And we store that on disk with fairly extensive provenance, so you could rebuild this configuration from that YAML file if you wanted to build the exact same thing again.

So source installs are great, but they're slow. I think most people prefer using a binary package manager for that reason. I know I use Homebrew on my Mac, which is pretty much a binary package manager these days. I use APT on Debian, and that's very nice: it's super fast and very reliable. So we'd like to have the best of both worlds. We would really like to have optimized binary installations, where if I'm running on, say, a Haswell chip, then I get that nice AVX-optimized version of FFTW when I say install, and I get it in a couple of seconds. Traditionally, binary package managers don't provide that, because they have to build generically so that you can run the binaries on most machines you would want to. So to make this happen, we would first need a binary packaging capability in Spack (that's most of what I'm going to talk about today), and we would need some metadata describing architecture-specific builds, so that we know a particular binary is for a particular architecture.
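The "fill in all the blanks" step can be sketched as merging user constraints over per-package defaults. The DEFAULTS table and the concretize helper are hypothetical; the real concretizer also resolves virtual dependencies, checks constraints for consistency, and works over the whole DAG at once.

```python
# Hypothetical per-package defaults that concretization falls back on
# when the user leaves something unspecified.
DEFAULTS = {
    "mpileaks": {"version": "2.3", "compiler": "gcc@7.1",
                 "variants": {"debug": False}},
    "mpich":    {"version": "3.2", "compiler": "gcc@7.1",
                 "variants": {}},
}

def concretize(abstract_spec):
    """Fill in every blank in an abstract spec from defaults.

    User-supplied constraints win; anything left unspecified (None or
    missing) comes from the package's defaults. Sketch only.
    """
    concrete = dict(DEFAULTS[abstract_spec["name"]])
    for key, value in abstract_spec.items():
        if key == "variants":
            merged = dict(concrete["variants"])
            merged.update(value)        # user's variant choices win
            concrete["variants"] = merged
        elif value is not None:
            concrete[key] = value
    return concrete

# User only said "+debug"; everything else gets filled in.
full = concretize({"name": "mpileaks", "version": None,
                   "variants": {"debug": True}})
```

By the time the install method runs, the package sees only fully specified descriptions like `full`, never the partial one the user typed.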
And then we would need good dependency resolution, so that we could actually select optimized or generic versions of a package depending on what people want. And I will say that while I would like to have these optimized builds, some of our users would not, because they run on clusters with heterogeneous hardware. In particular, the folks at CERN and Fermilab have clusters with all different kinds of architectures, and if they deployed an optimized binary, it wouldn't run on half their machines. So they like to deploy lowest-common-denominator, generic things, and we would need some settings for that.

So we recently released Spack version 0.11, and quickly released 0.11.1, thanks to Kenneth. That release has this binary packaging feature, and that's what I'm going to talk about. We have this buildcache command: spack buildcache create will take an existing Spack installation and create a binary package out of it. You can list available binaries, and you can install a binary, although people don't typically use that directly; the spack install command will just do it for you automatically. You can do spack install --use-cache, which says: if there is a binary available for something, prefer it to the source build. We don't make this the default yet, because we don't currently ship stable binaries; we don't have a repository out there with binaries in it yet. We're just providing the capability, so you could use this site-locally with your own binaries if you wanted to, but we don't currently have a repository of binaries that we're building. We're planning to do that by, I guess, September of this year, for certain OSes, and I'll talk about that later. And this was a collaboration between us, Fermilab, CERN, and Kitware, so we're very thankful to our contributors for stuff like this.
So we're happy that Spack's grown a community where we can actually get substantial core contributions from people. If you want to make a binary in Spack, essentially what you do is set up GnuPG. We actually use signing for our packages, and I'll talk a little bit about that later. We have some Spack wrapper commands that let you create a signing key pair, and then an init command that tells Spack to trust the key you just created. And once you're done with that, you just install something as you normally would. This will go and build from source, and then you can call spack find to see what's installed, and it'll say: okay, you installed M4; it depends on libsigsegv, so that got installed, and M4 is also installed now. And then you just say spack buildcache create, give it a path to where you want to put the binaries, and then the name of the thing you want to package, which is just the spec of something that's installed. This will write binaries and metadata into the mirror, and that's a build cache, and then you're done.

So let's take a look at what that looks like. We have this concept of a mirror in Spack, where you can basically take a directory full of source tarballs and host it wherever you want. The binary mirror sits inside a regular old source mirror, so you can have a mirror that contains both sources and binaries, and you point Spack at them the same way: you just say spack mirror add and a URL, and it can go and fetch packages from there. Inside the mirror, things are separated by the architecture, the compiler that was used, and then the particular package name and version, and then the package files are these big, ugly things; you don't have to worry too much about what they're called.
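The creation side of the workflow above can be sketched end to end: tar up an installed prefix, write a metadata file binding the spec to the tarball's checksum, and sign that metadata. The file names, layout, and the fake_sign stand-in are assumptions for illustration; real Spack signs with GnuPG and uses its own buildcache layout.

```python
import hashlib
import json
import os
import tarfile
import tempfile

SIGNING_KEY = b"not-a-real-key"  # stand-in: Spack uses a GnuPG keypair

def fake_sign(data):
    """Keyed-hash stand-in for a GPG signature over the metadata."""
    return hashlib.sha256(SIGNING_KEY + data).hexdigest()

def buildcache_create(prefix, mirror, spec):
    """Sketch of what a buildcache entry contains conceptually:
    a tarball of the installed prefix, metadata binding the spec to
    the tarball's checksum, and a signature on that metadata."""
    tarball = os.path.join(mirror, spec["name"] + ".spack")
    with tarfile.open(tarball, "w:gz") as tar:
        tar.add(prefix, arcname=spec["name"])
    with open(tarball, "rb") as f:
        checksum = hashlib.sha256(f.read()).hexdigest()
    metadata = json.dumps({"spec": spec, "sha256": checksum},
                          sort_keys=True)
    with open(tarball + ".json", "w") as f:
        f.write(metadata)
    with open(tarball + ".json.sig", "w") as f:
        f.write(fake_sign(metadata.encode()))
    return tarball

# Demo: package up a fake install prefix.
mirror = tempfile.mkdtemp()
prefix = tempfile.mkdtemp()
with open(os.path.join(prefix, "README"), "w") as f:
    f.write("pretend this is an installed package\n")
tarball = buildcache_create(prefix, mirror, {"name": "m4",
                                             "version": "1.4.18"})
```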
But this .spack file here is the actual binary package, and that .spack file is the binary package for M4, so we built two, actually: one for the package and one for its dependency. And then these down here are metadata that tell a client what's actually in the repository. To point Spack at the mirror, once you set that directory up, you can either host it on a web server somewhere or put it in the shared file system on your cluster. You just say spack mirror add, you give it a name (you can call it mypackages; that's just so you can refer to it again later), and then the URL of the thing you want to add. And once you've done that, you can verify it's set up if you see it in the list: you say spack mirror list (it actually goes into a config file), and then you have a mirror installed.

So here's what happens when fetching in Spack now, and this is fairly involved. Say you run spack install mpileaks, which is a tool we use for finding leaks in MPI programs: handle leaks and things. What happens is it loads the package file, that package.py I showed you earlier. It concretizes the spec and then tries to fetch; this shouldn't say "fetch source code", just "fetch". It looks at each of several places and checks whether a binary is available. If there is, it verifies the signature and makes sure it's signed by an authority that Spack trusts, then it installs the package and does relocation, and we'll talk about what that means in a second. If you don't have a binary available, we just do the same old build from source: you verify the checksum of the tarball, and then you configure, build, and install. Some of the packages have more phases than that, so that's why there's a "..." there.
But this is all handled transparently for you, so you basically just type spack install like you normally would, and the ones for which binaries are available should go faster.

So what's in this binary package? It's just a tarball. Inside it, there's a tarball of the installation, there's a YAML file with the metadata for the actual binary package (that's the spec and some other information), and then there's a signature on that YAML file. Essentially, we sign all the packages with a known key, and when we actually have a public binary mirror, we will ship Spack with our own public key, so that you know at least which packages are from us and can trust them, and someone else could set up a repo with their own key that you can trust. The signature here actually signs the metadata file, and what that does is associate the spec hash with the checksum of the tarball, so essentially that's how you know this is a valid package.

So why do we checksum source files but sign binaries? Typically, when we download a source tarball, we just put the checksum in the package file and say: that's what you need to check when you download this thing. Other systems do provide checksums for both sources and binaries in the package files in just the same way. In Homebrew, if you look at some packages, you'll see there's a section for bottles in there, with a bunch of SHA-256 checksums for the different binary images you can download of that package. The reason we do it differently is that in Spack, the number of binaries associated with a particular source tarball can be large. We are all about letting you configure all of your options, so there could be thousands of different binaries built from the same source, because we let you have all these configuration options.
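The verification side, what happens before a binary is trusted and installed, can be sketched as two checks: the signature on the metadata, then the tarball against the checksum recorded in that metadata. The fake_sign keyed hash is a stand-in for a real GPG signature, and all names here are illustrative.

```python
import hashlib
import json

TRUSTED_KEY = b"not-a-real-key"  # stand-in for a trusted GPG public key

def fake_sign(data):
    """Keyed-hash stand-in for a GnuPG signature."""
    return hashlib.sha256(TRUSTED_KEY + data).hexdigest()

def verify_package(metadata_text, signature, tarball_bytes):
    """Check that the metadata is signed by a trusted authority, and
    that the tarball matches the checksum the metadata records.
    Sketch of the idea only."""
    if fake_sign(metadata_text.encode()) != signature:
        return False  # not signed by anyone we trust
    meta = json.loads(metadata_text)
    return hashlib.sha256(tarball_bytes).hexdigest() == meta["sha256"]

# A well-formed package: metadata binds the spec to the payload's hash.
payload = b"fake binary contents"
metadata = json.dumps({"spec": {"name": "m4"},
                       "sha256": hashlib.sha256(payload).hexdigest()})
signature = fake_sign(metadata.encode())
```

Because only the one metadata file is signed, any number of tarballs can be published without touching the package files, which is exactly the scaling argument in the talk.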
And if we actually made you put checksums for all the different binaries you had generated into the package files, it would be a maintenance nightmare: you would have to keep updating the package files with new checksums every time you update your binary mirror. So, to scale this, we made it so that we sign the binaries instead. The plan is that we include our well-known public key with the Spack distribution, and then you could easily check at least the binaries downloaded from us. And if, you know, Fermilab and CERN want to contribute more binaries, then they could also include their public keys with the distribution, or there would be an easy way to register them, so we could increase the number of known signers. We've found that this scales pretty well: we just check things with GPG when we download them, and we don't have to update the package files with the checksums of all the different binaries.

So what's relocation? If you download a binary for an RPM-based distro like Red Hat, or a binary for APT, they're generally not relocatable. What that means is that the library paths in them are absolute, so if you have a dependency somewhere, the package basically relies on that thing being installed at exactly a particular location on the system. For Spack, we want users to be able to clone it into their home directory and do this on any machine, wherever they want. So we need relocatable binaries, which means that when we download a binary, we need some way to make it work in a new location. All that means for Spack, really, is that in the metadata for the binary package, which is in that tarball, we record which libraries we need to go and fix up on installation, and which scripts have shebang lines that need to be updated.
So if there's a shebang line that points to a Python in some other package that we depend on, then when we install into a Spack that's set up in a different location, we need to go and update that path so it still works on the new system. Essentially, the way we do that is: when we create the binary package, we traverse the tree of the package, we look and see what looks like a library and what looks like a script, and we write down the things that will need to be fixed up on install; then, when we actually go to install them, we fix them. We try to make relative RPATHs in our Spack packages, which means the libraries are known relative to spack/opt, the top of the Spack tree where all the packages live, so we don't actually have to fix up that much. But we do support customizing that directory in different instances of Spack, so sometimes we'll have to fix up paths within the installation if the dependencies' tree structure looks slightly different. And this enables you to get a binary and have it work in your home directory on an arbitrary system, as opposed to requiring Spack to be installed at a particular location.

One thing we're not currently doing is relocating compiler runtime paths, so the binary stuff will probably only work nicely if your compiler uses the system's standard glibc and standard C++ library. We're working on adding this, so expect it in a future version pretty soon: if you're using a fancy compiler with Spack, you'll still be able to relocate the compiler runtime paths to an instance of that compiler somewhere. And I'm not sure I know of any systems that do that, so that should be interesting.

So how do we decide which binaries to fetch? This is an interesting question, because it gets into the interplay between the way your package manager resolves dependencies and what's available on the binary mirror.
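The text-file side of relocation can be sketched as a walk over the unpacked package that rewrites the old Spack root wherever it appears, shebang lines included. This is an illustrative sketch only: real Spack records which files need fixing when the package is created, and uses patchelf or install_name_tool for binaries, which this sketch does not handle.

```python
import os
import tempfile

def is_text(path):
    """Crude text-file test: no NUL bytes in the first 4 KiB."""
    with open(path, "rb") as f:
        return b"\0" not in f.read(4096)

def relocate(prefix, old_root, new_root):
    """Rewrite old_root to new_root in every text file under prefix,
    so shebangs and embedded paths point at the new Spack tree."""
    for dirpath, _, files in os.walk(prefix):
        for name in files:
            path = os.path.join(dirpath, name)
            if not is_text(path):
                continue  # binaries need patchelf/install_name_tool
            with open(path) as f:
                contents = f.read()
            if old_root in contents:
                with open(path, "w") as f:
                    f.write(contents.replace(old_root, new_root))

# Demo: a script whose shebang points into the old install location.
root = tempfile.mkdtemp()
script = os.path.join(root, "tool")
with open(script, "w") as f:
    f.write("#!/old/spack/opt/python/bin/python\nprint('hi')\n")
relocate(root, "/old/spack/opt", "/home/me/spack/opt")
```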
Right now, what we're doing is taking that abstract DAG, concretizing it into this full spec, looking at the hashes on the DAG, and fetching those exact hashes from the binary mirror. So essentially we look at the graph and say: is that hash on the mirror? No? Okay, we'll go and build it ourselves. The problem with this is that if I build lots of binaries, and my distribution is changing over time, there are probably plenty of binaries out there on the mirror that I've built before that would satisfy what I need in my spec. If I just said I need mpileaks, and there's a binary for a slightly older version of mpileaks, I'd probably be happy with it. But because I concretize and get a full description first, I've bound myself to a particular hash.

So that's what we're doing now. What we're also working on is improving that process (this wasn't supposed to be animated, so ignore the order the arrows appear in). We would take the abstract DAG and download the available specs from the mirror; that's why the metadata is at the top level, so we can fetch it independently. And then we would actually integrate the available binaries into the concretization process, so that the concretizer would say: okay, I need an MPI implementation; which MPI implementations are available as binaries, and should I prefer those? And it would try to plug things into the DAG that way.

All right. So we can ship optimized binaries. Time's up, okay, well, I'll zip through this part. We can ship optimized binaries with Spack: essentially, the architecture description is part of the binary, so we can find out whether something is well suited to our machine. And there are a lot of things we're working on for detecting which binaries will actually work well on our hardware, so that we could prefer generic or prefer optimized builds if we want to.
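The current fetch strategy above, concretize first, then look for the exact hashes, can be sketched as a lookup against a mirror index. All names and hash values here are illustrative; the improved scheme described in the talk would instead feed the mirror's available specs into concretization itself.

```python
def plan_installs(concrete_dag, mirror_index):
    """Decide, per node of an already-concretized DAG, whether to
    fetch a binary or build from source: a node gets a binary only if
    its exact hash is on the mirror. Sketch of the current strategy."""
    plan = {}
    for node in concrete_dag:
        if node["hash"] in mirror_index:
            plan[node["name"]] = "binary"
        else:
            plan[node["name"]] = "source"  # near-miss binaries are ignored
    return plan

# The mirror has a binary for mpich but not for this exact mpileaks,
# so mpileaks falls back to a source build even if an older,
# perfectly acceptable mpileaks binary exists under another hash.
dag = [{"name": "mpileaks", "hash": "aaaa1111"},
       {"name": "mpich", "hash": "bbbb2222"}]
plan = plan_installs(dag, mirror_index={"bbbb2222"})
```

The limitation the talk points out falls straight out of the sketch: matching is by exact hash, so anything the concretizer didn't pick is invisible, which is why integrating the mirror's specs into concretization would help.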
Some issues with that: some architectures don't lend themselves to easy descriptions, so we may have to include some more data about the ISA in the binaries and do some more matching there. And selecting the correct tuning may be tricky, so we're looking at how we could prefer generically tuned or specifically tuned binaries. We're also trying to build a binary mirror for Spack that's publicly available, so that there are actually binaries out there and you don't have to build them yourself on site, and we're expecting to get that done by September. So that's what we're working on now. There are also Spack stickers in the room, so please take some if you want them. And I guess I'll open it up for questions. Thanks.

[Audience question, largely inaudible, about how relocation handles files that embed paths: scripts versus libraries.]

Yeah. So we actually look at generic text files; that was an oversimplification for the shebang thing. If we find a text file, we'll go and look for the path in there and try to relocate it. For the binaries, we use patchelf or install_name_tool.

[Another audience question, inaudible, about external dependencies.]

Yeah. So currently they have to be installed by Spack, but Spack does support external dependencies, so we could look at the metadata we have on externals. If you have a particular thing installed as an external, we could relocate for that. Currently it's just Spack-installed stuff. Any more questions?