Hi, I'm Eric, and I'm here today to talk to you about path-agnostic binaries, co-installable libraries, and how to have nice things. If those words don't mean anything to you yet, that's okay, because I just made them up. This talk is about introducing some terminology, and more generally it's about how software is packaged and how that could be easier. So the first thing I'm going to do is talk a little bit about how packaging is not a solved problem, just in case anybody thinks it is. Then in the middle I'll introduce some terminology to help talk about things we could improve. Then I'm going to look at a bunch of existing systems and package managers with that vocabulary, and last, scattered throughout, there will be some concrete techniques, like actual GCC flags, that might make your life better. Some of this is line-broken a little strangely because of the screen resolution. All right. So, packaging is not a solved problem. We really need the installation of packages to be easier, and we really need almost everything about the way we interact with packaging to be easier. There are a couple of big things that point this out to us. One of them, I think, is the rise of containers in our industry over the last couple of years. This is something I talked about more last year, in a whole talk that was wonderfully recorded and that you should go see. But one of the big points from that talk was that containers gave us the ability to install more than one version of a thing, and, surprise, people liked this. This is something we can now do with containers, and it's driven a huge surge in their popularity. That release of energy and enthusiasm, I think, indicates that we have room to improve the way we package things. I also want to ask whether containers are necessary to do that, and I think the answer is no. Containers made it easier for us to install multiple versions of a thing on a machine, and that's good.
But this form of easier might not be the only form of easier in the world, and containers carry some other baggage with them. The other thing that makes me think packaging is not a solved problem is something I've been meditating on a lot over the last year: there are a lot of distros in the world, and despite the fact that we're all people trying to work together, all open source nerds trying to make the world a better place in some way, we have great difficulty sharing things, especially the binaries that any of our distros produce. They're basically completely non-portable in any sense of the word. If I have an Ubuntu machine and a Debian machine, these have almost exactly the same tooling and almost exactly the same packages, but can I reliably copy a binary from one to the other? Maybe. Would I bet on it? No. If I have Fedora and CentOS, it's the same thing. They're both RPM and YUM, and some new acronym lately. Can I copy a binary? No. Even Nix and Guix, which are two relatively recent Linux distributions and are extremely similar, use all the same build tools and all the same linking conventions. Can I copy a binary? Absolutely not. This is weird, because we're all trying to work together. So there's some quiet force causing us to become balkanized, split into very small communities that are unable to work together, and it's happening without our intention. I think we need to ask a lot of questions about this, and I think these problems, strangely enough, have some shared root causes. Standards for composable installation are something we don't really have. I don't think we have enough language to talk about what we mean by portability and composability, and I think we should work on that. So here's an attempt, and here are some definitions.
I think we should talk about the ability to co-install things, and the definition I would offer is: any time I install one thing, installing a second version of the same thing should not be any harder than installing the first one was. And that includes using it, not just having it on disk, but being able to use it. This sounds trivial, but of course nothing's easy. The other term I want to introduce, again from the title of the talk, is path agnostic, and that means the user of a system, the person who is installing the thing, not the packager or the builder, should be able to decide where it goes. For any binary I have, I should be able to take the folder that binary is in, use the mv command, and then keep using the binary. This should not be hard. Path agnosticism is also a really nice property because it quite trivially gives you co-installability. If I have a binary that I can move, then I can take other versions of that binary and put them in any path prefix I want, and of course then it's trivial to install more than one version, right? And if we could do this, I think it would fix a huge source of that tendency towards balkanization that Linux distributions often find themselves in. Now, there are many ways you might try to implement path agnosticism, and something I want to introduce early is that things you can do and things you should do are not necessarily the same thing. For example, we already talked about containers, and containers, broadly speaking, are a form of cheating. They're a form of chroot, and this is something that works, but it carries a lot of additional baggage as well. If we use chroots as a form of packaging, well, chroots don't compose very well, right? I can package precisely one thing in a chroot, and then that's kind of it.
I have to package an entire Linux file system, the whole thing, all of the libraries, in one big monolith, and this is problematic for a lot of reasons. It's quite opaque. The tools I use to do this are going to have a large number of side effects, and all I'm doing is bundling them up in one chroot, and this doesn't help me understand anything, right? It doesn't help me diff. There are a lot of limits there. Another form of path agnosticism you might be thinking about is setting up some environment variables; somebody is probably thinking LD_LIBRARY_PATH. That's a thing you can do, but I think it's very questionable whether we should, because it causes lots of wrapper scripts to show up. It also doesn't compose very well, because if you set LD_LIBRARY_PATH, or any environment variable like that that's trying to make things path agnostic, all the child processes inherit it too, and that's probably not what you meant. It just doesn't compose very well; it has side effects you didn't expect. So the kind of path agnosticism I think we should chase is having whatever is in your binary, in your file system, explain itself. It needs to be context-free, working without any other environment, and this is the harder one. For some systems, this is easy. If you're statically linking a binary, you've only got one file, and making a single-file thing path agnostic is pretty trivial. It's not looking for anything outside of itself, so you're done. But let's say that, for some reason or another, we are convinced that we cannot statically link the entire world, so we're going to do some dynamic linking instead. Now that I have more files, things get a little more interesting, because if I have, say, one main binary in a package and some other files around it, I need them all to be referred to relative to that main binary if I'm going to keep the property of being able to mv the entire directory around. That's easy, right? No, not really.
So let's talk about this a bit more. What happens when you try to do this in practice with dynamic linking in the world as we know it? If I look at how bash is linked on my system right now, this is the readout I get. A lot of people might be familiar with ldd, but if you're not, it's a tool that shows which dynamic libraries get loaded when you execute a program. So on my system, this is what bash does. These are absolute paths. Right out of the box we can very quickly see, because there's a slash here, that this is not path agnostic. If I move bash, or if I move any of these libraries, it's not going to work correctly. So where does this come from? Here's a quick primer on how the dynamic loader works, for anyone who's not familiar with it already. These absolute paths don't come from anything in the binary itself. readelf is a tool that will read the ELF headers out of the binary and tell you what it thinks of them. Here it's showing me the same library names, but they're not absolute paths yet. The absolute paths came from somewhere further along. For me, they come from this lovely place, which is, of course, another absolute path. So now we finally hit rock bottom: these are all of the further absolute paths that the loader is going to look at when I run bash. That's how this all came to be. So if we wanted something to be path agnostic, we would want our loader to be able to load these object files from somewhere else, somewhere relative to the binary. Can we do that? Yes, it's just a little arcane. You might want to take a screenshot of this, because where do you find those docs? I don't know, they're somewhere. But take my word for it, this is a thing you can do.
And this gives you a binary in which, if you read the headers, you'll see the same requirement for shared libraries, and then this new flag appears: readelf is telling us it's going to look for this library runpath, an RPATH, that is relative to the path of the binary. And if I ask ldd what it actually resolves, it will do something relative. So we can have path-agnostic dynamic linking. It's not commonly done, but it works. This is a feature that's been in ld.so, the thing that interprets your binary's dynamic links, for ages, in every form of it ever. As far as I know, there are no Linux distros that use this commonly, but it's absolutely out there. Go run ldd on an Electron binary, if you've got one or three or more on your computer. It does this. So let's consider that whole problem solved. What I haven't talked about yet is how we should actually organize the sharing of objects again. We can have path agnosticism, and if we have path agnosticism, we can trivially have co-installability. Now let's raise the bar even further: we want path agnosticism and co-installability and the ability to share things. This requires us to do a little more organization, and there's more than one way to go about it, so I'm going to introduce more terminology. The word I'd like to use here is splay. This is a word for how, if you're selling something in a store, you spread things out for display. Here I want to use the word splay to describe the way we spread out shared objects, dynamic libraries, across a bunch of directories in some organized way that we can reference. There are, of course, more ways to do this than I can possibly count, but they can be grouped into some distinct categories. These are the three major ways I can imagine you would ever splay out libraries. The first one is what I'm going to call a precise splay.
This is simply when I have some library, I want to know what path to put it in, and I hash all the contents of the library and put it in a folder named after that hash. I'll probably use a cryptographic hash for this, because why wouldn't I? This is probably sounding pretty familiar; we also call this content-addressable. This is a nice way of organizing information because it's completely automatic and basically immune to conflict. And so, going back to the reason we're talking about any of this, a precise splay, something content-addressable, trivially satisfies co-installability. If I have more than one version of a library and I add however many more versions, I will never conflict. This means I can automate everything with this organization. You have one of these on your computer; it's called Git. We tend to like it for all the same reasons. Because you can insert an unbounded amount of stuff and it never generates a name conflict, this is also automatically decentralized, and since you're using cryptographic hashes, you get integrity checking for free. This is just a really good place to be. But it's not the only way you could imagine splaying libraries. Another way you could go is, of course, full manual: assign names to every file you need more than one version of. You can do this, but another way of saying manual organization is that you're always doing conflict resolution, so I think it's very difficult to call this co-installable. And this is, of course, kind of the norm. If you're thinking this looks and sounds like the libraries on your system: yeah, it probably does. If you do an ls in /usr/lib on your computer, you're going to get tons and tons of symlinks like this on most distros. If you're a Nix or Guix person, of course, you have a very different life. But on most distros, you're going to get this.
You're going to get this very manual organization, and if I were going to install a new version of a library in here, I could give it a separate name using my human brain. There's nothing automatic here. But remember, our definition of co-installable explicitly said not just having the files on my computer, but being able to use them. And these symlinks will, at this point, betray us: if we have a link named libname.so.4 that points to some more precise version, this is no longer co-installable. If I want to install a different version, I can give it a different name; I can have the file here. But can I use it as easily? No, not without performing active conflict resolution. So this is not co-installable. The other most interesting category is what I'm going to call a property-based splay. This is when you calculate some property of the libraries you're going to share and then use that property as your index. Whether or not this is co-installable can be an interesting question, so we're going to go over a couple of examples to try to figure that out. One common form of property-based splay you might have seen, if you're doing things with Docker images: it's very common to have a shell script which, when publishing an image, tags it with the source code hash. People do this because the hash is already there thanks to Git, so it's very easy to do. But this doesn't capture a lot of things, right? If I do my build again with a different compiler, that of course is not represented in my Git source hash, so it's not covered by my splay. So this, I would say, is again not co-installable: if I build with a different compiler and I want to install that build on the same computer as the one from the other compiler, I have conflict resolution to do. I'm going to skip this slide because I'm running out of time.
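To make the contrast between these categories concrete, here's a small shell sketch of a precise, content-addressed splay next to the manual, symlink-driven one; all file names and the `store/` layout here are invented for illustration:

```shell
# Precise splay: the directory name is the SHA-256 of the file's
# contents, so versions can never collide and no human picks names.
printf 'pretend this is libfoo, version 1\n' > libfoo.so
v1=$(sha256sum libfoo.so | cut -d' ' -f1)
mkdir -p "store/$v1" && cp libfoo.so "store/$v1/"

printf 'pretend this is libfoo, version 2\n' > libfoo.so
v2=$(sha256sum libfoo.so | cut -d' ' -f1)
mkdir -p "store/$v2" && cp libfoo.so "store/$v2/"
# Both versions now coexist; installing either one never required
# looking at what was already installed.

# Manual splay: precisely named files plus one human-maintained
# symlink that decides which version "wins" for everyone.
mkdir -p usr/lib
touch usr/lib/libfoo.so.4.2.1
ln -s libfoo.so.4.2.1 usr/lib/libfoo.so.4
touch usr/lib/libfoo.so.4.3.0
# Having both files on disk is easy, but *using* the new one means
# repointing the shared link -- exactly the active conflict
# resolution that breaks co-installability:
ln -sf libfoo.so.4.3.0 usr/lib/libfoo.so.4
```

In the precise case the two installs are completely independent; in the manual case only one version at a time is reachable through the well-known name.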
So what if I got better at this, and I came up with a description of a property that includes not just the source code hash, but all of the other executables on my path, all of my compilers, as part of my property description as well? This is what the Nix and Guix distros do, and it's really cool. This big hash in here includes not just the source code but all of the toolchains that were used in building it. This is still distinct from a precise splay, however, because that hash is not a hash of the contents. It can still conflict. It would be equivalent to a precise splay if we could assume that all compilers are pure, deterministic functions. That is, unfortunately, just not true. There are some people in the room laughing; yes, that's a whole other talk. There is a Reproducible Builds project and community out there who are working on this problem, and believe me, it's a problem. So if we want to share libraries, we can choose any of these categories of techniques, but if you ask me, please choose precise; it's by far the most correct. Now I want to get all of these properties back together: I want path agnosticism, and I want co-installability, and I want shared objects. If we could have some binaries that are path agnostic, and a splay of all of their dependencies that is path agnostic, then we could move both of these things around together, and they would still be path agnostic, we would still have shared objects, and everything would be awesome. But how? Since we just talked about Nix briefly, I want to use Nix as a further example, because they do some interesting things. They use RPATH, much like the relative-linking trick I mentioned earlier, but they don't quite do relative linking. This is what you'll get when you read the ELF headers on a Nix system. There are actually several library paths here, as you can see, and they're joined by colons.
This is cool because it's close to co-installable, if you're ignoring the whole compiler-determinism part, but it's also not path agnostic. It still starts with a slash, and any time there's a slash in a path, we've kind of lost. Some people say that Nix can in fact be installed in any path, and that's sort of true; can-versus-should questions come up a lot here. These paths, as we saw, are literally embedded in the binaries. So if you're going to install one of these binaries from Nix in a different prefix path, if you're going to try to make it path agnostic, you basically have to rewrite this header, either by recompiling the whole thing or by using some tool that patches headers. So this is not path agnostic. And going all the way back to the concept of auto-balkanization, this is a fascinating example, because the Nix and Guix distros are almost the same, except that this path on a Guix system is different: it doesn't have /nix at the front of it. So despite almost everything about these systems being identical, that slash really gets in the way. So in my last sixty seconds, because I'm not talking nearly fast enough, I have a proposal: what if we compiled binaries with this RPATH $ORIGIN, and put all of our libraries relative to the binaries? These could be symlinks; they can be the full contents or they can be symlinks, and to the binary it's exactly the same thing. You can switch back and forth between these arrangements and never need to recompile the binary. So this is path agnostic, and if you bundle all of the library contents and make a tarball of it, that's also path agnostic. If you need to patch things on a system arranged like this, if you want to replace the libraries separately from your package manager, go ahead; it's just symlinks. This is easy. We should try this. We might be able to make tarballs of software which we can distribute and run without needing a distro.
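Here's one way the proposal could look on disk, sketched with plain files and symlinks standing in for real libraries; all the names and the `store/` layout are invented, and a real package's binary would carry the rpath `'$ORIGIN/../lib'` described earlier:

```shell
# Proposed layout: a package carries bin/ and lib/ side by side, the
# binary is linked with rpath '$ORIGIN/../lib', and lib/ holds either
# the full library contents or symlinks into a shared,
# content-addressed store. The dynamic loader follows symlinks, so
# the two arrangements are interchangeable without recompiling.
printf 'fake library contents\n' > libfoo.so
h=$(sha256sum libfoo.so | cut -d' ' -f1)
mkdir -p "store/$h" pkg/bin pkg/lib
mv libfoo.so "store/$h/libfoo.so"

# Shared arrangement: the package just points into the store.
ln -s "../../store/$h/libfoo.so" pkg/lib/libfoo.so

# Bundled arrangement: dereference the link and you instead have a
# fully self-contained, path-agnostic package you can tar up.
cp --remove-destination "$(readlink -f pkg/lib/libfoo.so)" pkg/lib/libfoo.so
tar cf pkg.tar pkg    # mv or mount this anywhere; it still works
```

Patching a library on a system arranged this way is just repointing or replacing one symlink, independent of any package manager.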
If we want to share libraries, we can have a distro, a form of organization that makes sharing easier, but we wouldn't need it. One of the reasons I think this would be cool: if we wanted to dump a bunch of binaries into a content-addressable system for permanent storage and sharing, we could do that. If it's path agnostic, you can mount it anywhere and it'll run. If I wanted, say, to add a bunch of things to IPFS, and I work with IPFS a lot, I could do this. This would work, but only if the compiled output is path agnostic. This talk was a lot about C-style linking, and I'd like to apologize to anyone who doesn't do C things. Imagine this with Python's import path; imagine this with anything else. All the same principles apply. The thing I offered at the end is just one possible way of arranging some symlinks and stuff; you don't have to love that solution. But I'd like to talk about these terms more, and I hope they're useful concepts for exploring how we compose software. Thank you. Actually, no time at all for questions, I am so sorry. But one more quick mention: there will be a hackfest later in the week if anybody wants to talk about documenting what these terms mean, trying to make more concrete manifestos, maybe tools. Let's talk later. Thank you.