Okay, so I'm Todd Gamblin, and I'm going to talk about compiler dependency resolution in Spack, or rather how compilers affect dependency resolution in Spack. For some context, I'm from Lawrence Livermore National Laboratory. It's a square-mile facility in Livermore, California, about an hour from San Francisco, with maybe 6,000 people working there. We do national security work for the Department of Energy, and we're pretty invested in high performance computing, which is where Spack comes from. This is Sequoia, a 1.5-million-core system. The cases are off, so you can see the network. These are the kinds of machines we have to build software for, and it's not pleasant.

Spack is a general-purpose from-source package manager, inspired somewhat by Homebrew and Nix, targeting HPC and scientific computing. It has a growing community. These are lines of code by different organizations in the packages over time: up until about 2015 it was all Livermore, and then we started getting contributions from all sorts of people. It's been going well.

The goals of Spack are focused around flexibility. We're not trying to be a stable distro yet; we're mostly focused on helping people get things built for high performance on big machines, and also running on laptops and the commodity Linux clusters that scientists need to run on. We want to make it easy to build packages with lots of different compilers, different compiler options, and different flags; to swap compilers in and out of a build; and to swap implementations of libraries, like MPI and other ABI-incompatible libraries, in and out of a build easily. The end goal is to help scientists in the Department of Energy and elsewhere actually get their software up and running on fairly exotic architectures.

How do we do that? We have a CLI syntax that's supposed to make it easy to install a package lots of different ways. In the base case it looks an awful lot like any other package manager: you might say spack install mpileaks (that's a tool we developed at Livermore for analyzing MPI programs), and Spack should do something sensible. If you want to get more specific, you can say spack install mpileaks@3.3 to pin the version. You can say what compiler to build with: %gcc@4.7.3. You can add build options that are exposed by the packages, and you can inject cflags into the build. The syntax is recursive, so you can apply those same constraints to your dependencies: ^mpich@3.2 %gcc says "I know this thing depends on MPICH somehow; build it with MPICH 3.2, and build that with GCC."

So how does this look from the package perspective? We're trying to keep that simple, too. Spack packages should look familiar to anyone who has seen a Homebrew formula: they're just Python classes, written in a simple Python DSL. Everything at the class level is metadata: the homepage and download URL, the versions you can build along with checksums for verifying the downloads (we can extrapolate the other download URLs from the one given), and the dependencies. Here CMake is a build dependency, and the others are link dependencies.
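What such a package file looks like, roughly: here's a minimal sketch in Spack's Python DSL. The version, checksum, and dependencies are chosen to match the examples in this talk rather than copied from the real mpileaks package.

    # A sketch of a Spack package: class-level directives are metadata, and
    # install() turns the already-resolved DAG into build commands.
    from spack import *

    class Mpileaks(Package):
        """Tool for analyzing leaked MPI objects in MPI programs."""
        homepage = "https://github.com/hpc/mpileaks"
        url      = "https://github.com/hpc/mpileaks/releases/download/v1.0/mpileaks-1.0.tar.gz"

        # Versions you can build, with checksums to verify the downloads
        # (this checksum is a placeholder).
        version('1.0', '0123456789abcdef0123456789abcdef')

        depends_on('cmake', type='build')        # build-time only
        depends_on('mpi')                        # link dependency on the MPI interface
        depends_on('boost@1.42:+multithreaded')  # link dependency with version + option

        def install(self, spec, prefix):
            # spec is the concrete DAG; no searching the system here.
            cmake('.', *std_cmake_args)
            make()
            make('install')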
And you can see that you can apply the same constraints to dependencies that you can apply on the command line: you can say that this package depends on boost 1.42 or higher with the multi-threaded option. As for the methods, an instance of a package is basically an installer, and we call methods on it to get things installed. All this install method really does is deal with CMake, then call make and make install. We pass the dependency DAG in through the spec, so you can refer to all of your dependencies, in some fixed configuration, from inside the package file. The idea is that by the time you get to the install method, we've already worked out what the DAG should look like, so you shouldn't have to do any searching around on the system. We tell you what to build, and you translate that into build instructions.

The other thing Spack does to enable that swapping is virtual dependencies. The idea is that you depend on an interface, something like MPI, or, maybe more familiar, JPEG, where there's libjpeg and libjpeg-turbo. Those are interfaces, not implementations. In a lot of package managers you can only depend on specific packages. Here, a package can declare that it provides MPI, and it can provide different versions of the interface at different versions of itself. So mpich can say: I provide MPI version 2 when I'm at version 1.9 or higher. The actual packages then depend on MPI at a particular version, and Spack does the job of looking at the DAG, asking what version of MPI each thing depends on, and pairing it up with one of the providers. You can also specify a particular provider on the command line (there's a small sketch of this below). This makes it really easy to type a few commands and get lots of different builds of the same package, so you can compare them and settle on a nicely tuned one.

To do the compiler swapping, we build with compiler wrappers, a bit like Homebrew's shims. These are little wrappers that add include, library, and RPATH flags to the things Spack builds. This guarantees that all the libraries and executables Spack builds know where to find their dependencies, which is useful if you're building lots of versions of things and don't remember, by the time you get around to running them, what you were doing at build time.

To manage all that versioning: we have the actual version of the package, but we have all those other options as well, so you can think of a Spack "version" as the entire dependency DAG for a project. That's the version of mpileaks, plus all of its dependencies, plus their metadata, turned into a document and hashed. We take that hash and give every package we install a unique installation directory, and the libraries in each of these directories know which other directories within the same install root to find their dependencies in. We like RPATH a lot. The rationale is that you knew what you were doing when you built the package, or at least the package manager did, and by the time you get around to running it, we don't want to put it on you to remember where all the dependencies for these packages live, which is what people have done manually for a long time. We're trying to fix that.
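Here's the promised sketch of a virtual dependency. provides and depends_on are Spack's real directives; the version numbers just mirror the MPI example above.

    # In the mpich package: declare that mpich implements the MPI interface.
    provides('mpi@2', when='@1.9:')   # provides MPI 2 when mpich itself is at 1.9 or higher

    # In a consuming package: depend on the interface, not on an implementation.
    depends_on('mpi@2:')

On the command line, something like spack install mpileaks ^mvapich2 then forces a particular provider into the MPI slot.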
The dependency model centers around something we call concretization, which most package managers call dependency resolution. Concretization is like dependency resolution except that, like I said, we're solving for more than just package versions: you can depend on a package plus an option, or on a package with a particular compiler, so the constraints throughout the packages can be more extensive. When you type something on the command line, say install mpileaks with this dependency at that version and this other dependency at another version, we build an abstract DAG with some of the constraints on it. Then we run it through the concretizer, whose job is to fill out all the details, so that by the time you get to install, you have all the information about the thing you're going to build. You don't have to ask any questions or go searching for anything; you just ask, okay, what did Spack tell me to build?

The other thing here, and this will be important later, is that the nodes can have different compilers. Currently we model the compiler as an attribute on the node; I'll get more into that later. The other constraint we impose is that there can be only one instance of a particular package in a DAG. If I do one build of an application, I enforce the constraint that there can be only one callpath library in it. That constraint is imposed by the native linking model. Languages like JavaScript support multi-version DAGs: their module system will actually allow one package to call one version of a dependency while another package in the same application calls a different version of the same dependency. We're dealing with native code, so we have to deal with ld.so, or ld, or whatever the linker is on the platform. You can link an executable where different libraries in your DAG depend on different versions of something like libstdc++, but you really don't want to, because it's a race: at runtime the first version of that library gets loaded, the second library gets invoked, it ends up calling a function in the first version, and if the ABI is incompatible you get a nasty error and things explode. One thing we have a lot of problems with is managing all the different versions of C++ compilers and standard libraries, given the recent proliferation of C++ versions. I think that's a good thing personally, because a lot of good things have happened in C++ lately, but it does make builds harder for users, especially on older OS versions like RHEL6, where the default libstdc++ is pretty old and you often want to build with a newer one that isn't the system version. So in general, you don't want two versions of one library in the same process space.

All right, so why aren't compilers proper dependencies in Spack right now? Mostly expedience. We wanted to mix compilers in one DAG, because we wanted to play around with lots of different compilers, and we can't do that with our one-instance-of-each-package-per-DAG restriction. So what Spack does, when you run it for the first time or when you run spack compilers, is search your environment, find all the compilers there, and set up a file like this.
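Something like this entry in Spack's compilers.yaml; the paths and versions here are illustrative:

    compilers:
    - compiler:
        spec: gcc@4.7.3          # the identifier that goes on the DAG
        paths:
          cc:  /usr/bin/gcc
          cxx: /usr/bin/g++
          f77: /usr/bin/gfortran
          fc:  /usr/bin/gfortran
        operating_system: rhel6
        modules: []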
Essentially this says: here's your C compiler, your C++ compiler, your various Fortran compilers; what operating system it's for; and what modules you can load it from, if it comes from a module. Spack uses this as the description of the compiler, and the spec identifier is what gets put on the DAG right now. So we just say "use that particular compiler," and we trust that if we detected it, it's the same version across different systems.

Why do we care so much about compilers? We want to use lots of different fancy compilers for high performance. We like the Intel compiler; it tends to get better performance than GCC on the Intel machines we run on. On a Cray we tend to like the PGI compiler or the Cray compiler, and there are other nicely tuned compilers for different systems; on the IBM machines, XL tends to get slightly the best performance, even though it takes forever to compile. And on many of our machines we actually have to use a special compiler because we're cross-compiling for the compute nodes. On our Blue Gene systems, the front end is not the same architecture as the back end, and that's the case on a lot of the Crays today that use Xeon Phis: they have a Xeon front end and a Phi back end, and you can't run back-end executables on the front end, which is where people log in to build. The other reason we care about compilers is that people want advanced compiler features. A lot of our programmers use things like OpenMP to do threading on the big fat cores. We care about C++ language levels and features like that, and about building with CUDA. All of those are features exposed by the compiler, or you might say their interfaces are exposed by the compiler, and we want to be able to experiment with them. It would be nice to have a proper dependency model for that stuff.

All of this poses challenges for the dependency model. I already said that the fancy vendor compilers tend to get better performance. The other issue is that most open source projects don't test with them, so if you try to build something like CMake with the XL compiler, it's unlikely to work, because they probably haven't tested that, and there's no performance reason to build it that way anyway. So for a lot of build dependencies we tend not to use the nicer compilers: we build the numerical stuff that does the actual number crunching with the fast compiler, and we build the build dependencies with something a little safer, because we don't care about their performance.

What does this look like? We have a DAG with a bunch of packages, where the B's are build dependencies, the L's are link dependencies, and the R's are run dependencies. I want to build this so that the build dependencies get built with the easiest compiler possible, where I don't care about performance, but anything that links into the thing I intend to run fast, even on the back end, gets built with the fast compiler. What we've tried so far is this: if anything is a pure build dependency, anything under it that only needs to run in the build environment, we build it with the easy compiler. So what ends up happening is this: here's a pure build dependency.
You get its whole subtree built with the easy compiler, say GCC. This one is a link dependency, so it forces that node to be built with the fast compiler along with the root, and so on down the link-and-run chain: anything that needs to run in the run environment we figure is computational, so we build it with the fast compiler too. This other node is a pure build dependency, so we make that easy again. And notice this node: it's a build dependency, but it's something fast that depends on another library as both a link and a build dependency. There we just prioritize the link dependency, because we can still reuse code built with two different compilers for the same architecture, so we get away with it. This seems to work pretty well for a lot of our numerical codes, and we're planning on merging it in.

Things get more complicated when you try to cross-compile. Why do you care about cross-compilation? Because your big machine has a different architecture than its front end. The user login nodes we tend to provision with nice big fat cores for building, like the Power cores on the front ends of the Blue Gene machines, and they run a standard Linux distro; it's a nice, easy environment to work in. But the compute nodes run a special kernel, a lightweight OS, not the same runtime that exists on the front end. That's for performance and for reducing OS noise in parallel applications that synchronize frequently: if you have a lot of noise, the synchronization gets a lot more costly, and we care about that when we're running on a million cores. So you build things on the front end, submit jobs to a workload manager, and it runs them on the back end. That's how we get machines this big: by packing the processors more densely and using simpler, specialized cores.

Why not build natively on the compute nodes? In many cases, building on the compute nodes is just slow. We have lots of cores out there, but that doesn't mean they're individually fast; they're fast when you use lots of them. Look at a Xeon Phi: it's one chip with 72 Atom cores at 1.4 GHz each. If you tried to build anything significant on that, it could take hours and hours. Those nodes are typically diskless, too, so they only talk to the network file system, which makes building out there a giant pain. And many of the compilers aren't even ported to the compute nodes; they only run on a straight Linux distro, so in some cases we don't even have the option of building there. We've also had people try make -j on these Xeon Phi chips and blow out their Intel license, because they started 256 copies of the Intel compiler and only had 64 licenses. So there are all kinds of reasons not to build in the exotic place.

So how do build dependencies actually work with cross-compiles? Recall that we have three dependency types: build dependencies are things you need to run at build time; link dependencies are things the package links against; and run dependencies are things you might invoke at runtime. The build and run dependencies are basically commands.
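In package files, those types appear as an argument to depends_on. A quick sketch, with illustrative packages:

    depends_on('cmake', type='build')            # must run where the build runs
    depends_on('boost@1.42:')                    # default type is ('build', 'link')
    depends_on('python', type=('build', 'run'))  # needed at build time and again at run time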
Now you have an issue, because you need your build dependencies built for the architecture where you're building, since they have to run at build time. Before, we got away with cheating and building some things that were both link and build dependencies for the back end. We can't do that anymore, because we're actually cross-compiling: if we build the build dependencies for the back end, they simply won't run on the front end, because it's a different architecture.

We can sort of use our build-dependency trick here, but we have to modify it a little. Here's that same DAG from last time. You can build the libraries that are reached only through pure build dependencies with the front-end compilers. But there's a problem: this build dependency here needs to run in the build environment, and it won't run anymore, because it was built for a different architecture. Same down here: this build dependency has a run dependency on this node, which won't run because it's for a different architecture. What we do here, or at least what we're planning on doing in the new model, is duplicate those nodes. If you have these kinds of dependency relationships, you have to make a separate version of those two packages to run on the front end, so we have to split the DAG. Now you have a violation of our original constraint: two versions of package 5 in the same DAG, and two versions of package 8. That's not so good, but we can handle it; we have to relax the model a little. At least now we can actually run on the cross-compiled machines. It's just a little more complicated, with a few more relaxed constraints to deal with.

There are some other interesting constraints that come up in this environment, including one I wasn't even expecting. Certain tools, like setuptools in Python, actually add code to the installed package. You might think you could do resolution separately for the front end and the back end, but you can't, because setuptools runs in the front-end Python and injects code that needs to run in the back-end Python. Now, Python is interpreted, so why does it care about architectures? It doesn't, but you do have a constraint that the two Pythons must be the same version; otherwise the code that setuptools injects to run on the back end will fail. So now we have a DAG that must contain two different builds of the same package, with constraints that run across architectures through the dependency problem we're solving. That one actually surprised me; it was pretty cool that we have to deal with it.

All right, the last issue: compiler dependencies. I said that compilers are a special case in Spack right now, and that we did that to get things working quickly. That's true. But what we'd really like is a more natural dependency model for our packages, where you don't have to remember all the versions of every compiler that support the level of C++ you care about. We don't want packages to say: I need GCC 4.9.3, I need Intel 17, I need this other compiler at that version. We just want them to say: I need C++17, or I need these particular features from C++17.
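As a hypothetical sketch of that model, language levels become virtual packages that compilers provide. This is a design we're considering, not current Spack syntax, and the version cutoffs are from memory:

    # In a compiler package such as gcc (hypothetical use of provides):
    provides('cxx@11', when='@4.8.1:')    # GCC's C++11 support is complete from 4.8.1
    provides('cxx@14', when='@5:')
    provides('openmp@4.0', when='@4.9:')

    # In an application package: depend on the language level, not a compiler.
    depends_on('cxx@14:', type='build')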
The other issue is that compilers in our current model can't easily have their own dependencies, and this actually comes up; you'd be surprised. Some compilers depend on other compilers. The Intel compiler depends on GCC, because it relies on GCC for its libstdc++. So there's the nastiest version synchronization problem you've ever seen when you try to build with new versions of the Intel compiler on systems where the system libstdc++ is very old: you have to tell it to use a newer GCC than the system one, and you have to remember to RPATH that in, so that when you run the binary, it finds the right libstdc++. Either that, or the user has to remember to source all the right files to get the application they linked to run.

So what we're planning on doing is factoring compilers out as actual dependencies. That would also let us have virtual dependencies for the different language levels, so we could have C++11, 14, and 17. OpenMP is another one that I think is really good: I could say we depend on OpenMP 4, and compilers could provide both C++ at different versions and OpenMP at different versions.

So suppose we do that. You've got a package here that depends on the Intel compiler at version 17, which in turn depends on GCC. This case is actually kind of fine: we don't have to do much special here, and you could even do it with conditional dependencies. You could just say Intel 17 depends on a specific version range of GCC, without introducing anything too special to the DAG. But suppose you now want to build against a second library that's already installed and was built with an older Intel, version 16, which was built against GCC 4.9.3. This comes up, because we have so many compilers running around. Now you have one Intel compiler depending on one version range of GCC and another Intel compiler depending on another version range, and that's still not constrained enough to synchronize the libstdc++ versions between the two compilers and ensure they're consistent.

So what we have to do is introduce another type of dependency. What is a compiler, really? It's a build dependency that imposes link dependencies on the things it builds: the runtime libraries, things like libstdc++, and glibc too, although I won't cover that here; all the libraries bundled with the compiler that you don't see on the link line. So what we have to do is add hidden dependencies to the DAG and normalize them like we did originally, so that there's only one instance of each for the whole DAG. The new dependency model looks like this: in that DAG from before, I need some way to get the two compilers' libstdc++ synced up, so I have each compiler impose a link dependency on the thing it builds. Now you get two implicit, hidden dependencies from the compilers on libstdc++, and we force the DAG to have a single libstdc++ node.
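A hypothetical sketch of what that could look like; the imposes directive is invented here purely for illustration and is not Spack syntax:

    # The Intel compiler as a package: a build dependency that drags
    # hidden link dependencies into whatever it builds.
    class Intel(Package):
        # Intel needs a recent-enough GCC for its C++ runtime.
        depends_on('gcc@4.9:', when='@17:', type='build')

        # Invented directive: everything built with this compiler gets an
        # implicit link dependency on the libstdc++ from that GCC.
        imposes('libstdcxx', type='link')

The point of the invented directive is that the solver then sees libstdc++ as an ordinary link dependency, so the one-node-per-DAG rule applies to it.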
And when those constraints get merged, you get a conflict if the libstdc++ versions aren't synced between the different compilers. All right, so this is the model we're planning to move to. I think it's pretty cool, because it lets us have virtual dependencies for compiler features, ensures consistency of the runtime libraries, and gets the RPATHs right, which will make a lot of people on our clusters happy.

So, to summarize: we're working on constraints for compiler integration, and it's not easy; weird, surprising things can happen. Cross-compiling can affect how your Pythons have to be synced between the front end and the back end, and it also makes you add additional dependencies to the DAG. We're trying to make all of this part of Spack's build configuration, and we're hoping it will make it really easy to write a single source package that you can compile lots of different ways on some pretty exotic machines. I have a bunch of Spack stickers over there if people want stickers, so come and get them. And that's all for my talk.

[Audience] For the sizes of DAGs that you make, how long does concretization take?

Oh, sorry, I should repeat the question. The question was: for the sizes of DAGs we're dealing with, how long does concretization take? Right now it takes no more than about five seconds for what we've got. It may take longer once we go to a full SAT solver; I'm not sure. We're a little greedy right now in the way we do the solve. But most of that time is Python being slow, not how fast SAT can run.

[Audience] When things go wrong, how is that exposed to the user? The cases I'm most interested in are when concretization can't resolve something, or, similarly, when you ask for something to be built with GCC 4.7 and it can only build with 4.8, but that isn't declared in the package because no one has tried that particular combination. What do each of those cases look like?

Okay, so the question was how failures are exposed to the user: when concretization fails, when the package doesn't declare all the right constraints, and when the build fails. The short answer is: not well. If concretization fails, I wouldn't say we have the best error messages in the world yet. For a simple constraint, you may get the two conflicting constraints you'd expect; it may say this thing depends on GCC and you're trying to build with Intel. But often it takes some thought to figure out exactly what it's trying to tell you, which is not great, and something we can improve. For the build failure part, we've tried pretty hard to make a big build give you intelligent error messages. We stole the magic regular expressions from CTest, so we'll actually go and parse the build log and print highlighted error messages with context for what happened. It's still an error message, but this is still an improvement over building by hand, which is the bar we're trying to get above. Yup?

[Audience] What's the time frame for the improvements you were speaking about, and would they be compatible with the old version? Would they introduce a new version of Spack that's incompatible with the old one?

Sorry, so the question was: what's the time frame for all this, and will it make Spack incompatible with old versions of Spack?
We try to stay backward compatible with the things we install, so that a new version of Spack can understand the old database. You'd still be able to see your old packages; it may just rebuild a lot of stuff. And the time frame would be September at the latest; we're hoping for earlier. Cool. Thanks. I think my time is up. Thanks.