Okay. So, this is a talk about segregated dynamic linking, which is probably not hugely informative if you don't already know what it is, but basically it's about allowing libraries that would otherwise conflict, because they have the same soname or are different versions, to coexist in the same link chain. It's not actually on the slides, which it should be, but I should say the work is sponsored by Valve, for me to do in my day job at Collabora.
Right. So, the introduction. The problem is that these days applications and games and so forth are often containerized. They get their libraries from a runtime and they're kind of insulated from the host system. They have access to the hardware in one way or another, but they're basically not interacting with the software on the system. Well, I say that. It's kind of a lie. The libraries mostly come from the runtime, but some of them, for the awkward reason of hardware being hardware and needing drivers, still need to come from the host. Notably Mesa, the libGL drivers, kind of have to come from the host. There's no way a runtime can know in advance what your drivers are going to look like.
So far, so good. Peachy, you might think; this is something we can easily handle. Unfortunately, sometimes the host and runtime libraries are incompatible. To pick something completely at random: your GL drivers might be linked against libstdc++, and the libstdc++ on the host might be incompatible with the one in your runtime, which is going to cause a huge amount of sadness the first time you actually try to put graphics on the screen, because bad things are going to happen when you start talking to the wrong libstdc++.
So that's a description of the problem. This is sort of what it looks like. You have your runtime, and a library in the runtime. You've got a host. The executable needs that library from the host, and you have a dependency.
And at this point, it doesn't really matter which side the dependency comes from. What's important is that one or other side of this link chain is not compatible with that particular version of the dependency. It's kind of how the linker in the UNIX world works, which probably a lot of you already know. What we want to know is: can we do something about that? What might a solution look like?
What we have here is a very hand-wavy, lightweight example of what this looks like. You can see a very similar link chain, except that we have somehow, in a way I will describe in a moment, isolated one side of the dependency from the other. So you have two incompatible versions of the same library in your link chain, and the right libraries can see the right versions of the symbols. So hopefully everything works. Everything is peachy. Nobody knows there's another incompatible copy in your link chain somewhere.
So let's go through our objectives. We want to expose only the library that we want to isolate, the segregated-link library. We want its dependencies not exposed to us in any way. If it has further dependencies, we don't want to see what they are; we don't even want to be aware that they're there. We don't want to make code changes in the application, because obviously that's a non-starter: you have an application that someone has shipped in a Flatpak or a Snap or a Steam runtime or similar. We don't want to recompile, for the same reason. We can't ask people to recompile so that it will run on a particular host; that misses the point of having a runtime in the first place. And of course, no performance hit. It's all right to talk academically about, hey, we've isolated things and we've made technology, but if there's a significant performance hit, that's probably going to be a non-starter for people. We need the library that we're isolating to be loaded transparently.
We can't be asking users to do this sort of thing all the time, because they won't know in advance, or even at all, that this thing needs to happen. And we want minimal manual intervention.
Now, what do we mean by minimal intervention? In order of preference, our ideal solution would be a purely runtime isolation mechanism. We'd start up, there'd be some magic on the system, it would all happen, and nobody would have to intervene at all. That's the ideal-world scenario. Next scenario: some compilation required. Maybe you build some sort of shim and tell it what the targets are, but it's basically automatic, with very little actual work for a developer to do. And then finally, there's the manual intervention option, where someone has to go and figure out where the lines need to be drawn in the isolation, do all that work and make it happen. That would be sort of acceptable, but we're hoping for something towards the top end of the list that I've just gone through.
Right. So let's have a look at the various pieces of the puzzle that we're going to have to address to achieve this. First, private dependencies: libraries should have isolated dependencies. In order to understand how that works, we need to look at how it works today. Normally, all the dependencies in your link chain are in a single linked list. I'm slightly simplifying, but those of you who know better can hold your contradictions to the end. The linker will go along that chain until it finds a symbol it thinks you need, and then say, here you go. So even if we can get another library into this chain, the position matters. And because we've got different libraries with different requirements, the fact that it's a single linked list means that we can't just use this mechanism to do what we need to do. So how do we achieve this? It sounds tricky. It sounds like I'm going to have to write my own linker, which, I don't know about you, is not my idea of fun.
I really, really don't want to have to do this. It turns out that the nice developers of the C library have already provided something which does a big chunk of this work. It's a new library call, dlmopen, which is like dlopen, but it can create new namespaces. So you get a new link chain, which is isolated from the existing link chain. And it's more or less workable from about glibc 2.19. That's another one of those little lies I'm going to tell. The basic functionality works, but there are some patches still waiting to go upstream that make it work properly, the way we fully need it to. But they've done a lot of the heavy lifting there, which is great, because it means I don't have to.
So dlmopen, just like dlopen, automatically loads dependencies. And libraries are picked from the search path, which is slightly not what we want, because remember, we want to pick libraries from the host in order to isolate them, so that they can be parachuted into the runtime. And the first match wins. So it's doing a bit of the work we want. It's providing the isolated namespace, but we still need to do some work to work around it.
And what can you see there, slide-wise? Sorry, there you go. Sorry, my monitor keeps cutting out.
Right. So isolated libraries should come from a nonstandard search path, which means we want to divert the searches somehow. The linker loads all listed dependencies of a library, but it won't reload items that it already sees in the link map, whichever link map it's currently dealing with. If you ask for a library that's already there, it just says, oh, you want this? Here you go. And it gives you the already existing mapping. So by loading libraries in reverse order, and using the explicit full path to load them instead of just the short libfoo.so.x, we control exactly what gets into any particular link map.
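The namespace trick just described can be sketched in a few lines of C. This is a minimal illustration, assuming a glibc system; the helper names `open_in_new_namespace` and `open_in_namespace` are made up for this example and are not libcapsule's API. It only shows the two glibc calls involved: `dlmopen` with `LM_ID_NEWLM` to create a fresh link map, and `dlinfo` with `RTLD_DI_LMID` to learn which namespace a handle lives in, so later loads can target the same one.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* dlmopen, dlinfo and Lmid_t are GNU extensions */
#endif
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: open `path` in a brand-new link-map namespace.
 * On success returns the handle and stores the namespace id in *lmid_out;
 * on failure returns NULL (dlerror() then describes the problem). */
void *open_in_new_namespace(const char *path, Lmid_t *lmid_out)
{
    void *handle = dlmopen(LM_ID_NEWLM, path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL)
        return NULL;

    /* Ask the linker which namespace it created for us. */
    if (dlinfo(handle, RTLD_DI_LMID, lmid_out) != 0) {
        dlclose(handle);
        return NULL;
    }
    return handle;
}

/* Hypothetical helper: load one more library into an existing namespace.
 * Passing a full path rather than a bare soname bypasses the search path,
 * which is what lets the caller choose the host copy over the runtime copy. */
void *open_in_namespace(Lmid_t lmid, const char *path)
{
    return dlmopen(lmid, path, RTLD_NOW | RTLD_LOCAL);
}
```

Following the talk's recipe, a caller would open the deepest dependency first by full path (for instance something under a `/host` mount, which is only an assumed location here) and then work upwards with `open_in_namespace`, so that every later dependency lookup finds an object that is already mapped. On older glibc releases this needs linking with `-ldl`.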
So effectively, we're doing the lookup and the loading manually. But by doing so, we can trick the linker into doing exactly what we want, and we can mix and match host and runtime libraries in the new private link chain that we've created with dlmopen.
So we've achieved step one. We've loaded a library and we've isolated its symbols completely from the main program. If you poke around in the memory, you can find them, but you really have to know what you're doing. It's not going to happen by accident, which is great. But now that we've done that work, we need to undo some of it. Because remember, we want this to be automatic. Some libraries need to see these symbols, and they need to do it without us doing any further work. So can we do this? It turns out we can. And in order to do that, we need to understand a little bit about how dynamic library calls work.
Now, we're going to dive into a little bit of detail here, but don't worry, because you don't need to know it. What we're going to do is walk through a dynamic library call. And what we will establish at the end, having understood it fully, is that we actually don't need to know most of it. We will have proved that we don't need to know the details, so we can use a simpler method.
So this is what jumping to a foreign function looks like. A foreign function is one in a shared object, a library, that's not the one you're calling from. You've got your program text, which is going to push the args onto the stack. Then you're going to jump into something called the procedure linkage table, to a fixed offset; there's one entry in the procedure linkage table for each function. That entry is going to look at a relocation record, which is basically a slot in a big table of addresses of where things really live, which gets fixed up by the linker, and jump to it. Now, the first time you do this, it's a little more complicated.
What happens is you look up the name of the function: the relocation record initially doesn't point at the actual function. It points at special fix-up code which knows to ask the linker for the real address. The linker then searches the link map that it's interested in for the symbol that it's interested in. If it finds it, it writes the address into the relocation record that it was just called from. And then the fix-up code jumps to the address in the relocation record slot.
And then this is what the actual function call looks like. We jump to the function. The function pulls some arguments off the stack. It does whatever boring thing the function in the library is trying to do. It pushes the return value onto the stack. We return to the caller and we pop the return value off the stack.
So we learn a few things from this. One of them, which is quite important, is that although the arguments go onto the stack and their layout matters to the function that's being called, and the return value comes off the stack and it matters to the calling code, none of the intervening code knows or cares what the signature of the function looks like. As long as the caller is the same, it already knows, because the compiler has written it to do that, how to put the arguments onto the stack and then, when control returns, how to pull the result off. All the intervening steps just need an address to jump to. That's it. So the first thing we've established, which is great from the point of view of someone writing an isolating shim, is that we neither know nor care what the signature of the function looks like, which makes life a lot easier.
Right, so if we can control the relocation record, we can control where the call ends up. If we scribble on the relocation record before the first call, the PLT fix-up code will never be invoked. The linker will never resolve the symbol's address.
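The effect of scribbling on a slot before the first call can be modelled in plain C. This is a toy model under stated assumptions, not the real PLT machinery: a single function pointer stands in for the relocation record, `lazy_stub` stands in for the fix-up code, and every name here is invented for illustration. The real mechanism is signature-agnostic; the toy fixes one signature only because C needs a type for the pointer.

```c
#include <stddef.h>

typedef int (*fn_t)(int);

static int resolver_calls = 0;      /* how many times the "linker" was asked */

/* What the default lookup would find. */
static int default_impl(int x) { return x + 1; }

/* The isolated copy we would rather be called. */
int capsule_impl(int x) { return x * 2; }

static int lazy_stub(int x);

/* Toy relocation record: starts out pointing at the fix-up stub. */
static fn_t slot = lazy_stub;

/* Toy fix-up code: "resolve" the symbol, patch the slot, then jump to it. */
static int lazy_stub(int x)
{
    resolver_calls++;
    slot = default_impl;
    return slot(x);
}

/* Every call goes through the slot, as a call through the PLT would. */
int call_via_slot(int x) { return slot(x); }

/* Overwrite the slot; if done before the first call, the stub never runs. */
void scribble(fn_t target) { slot = target; }

/* Put the slot back into its initial, unresolved state. */
void reset_slot(void) { slot = lazy_stub; }

int resolver_invocations(void) { return resolver_calls; }
```

Calling `scribble(capsule_impl)` before the first `call_via_slot` sends every call to `capsule_impl` and leaves `resolver_invocations()` at zero; without it, the first call runs the stub once and later calls go straight to `default_impl`.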
We have total control over where the function call ends up, and at no point do we need to know the function signature. So basically, golden: if we can find this relocation record, we can control exactly where any given dynamic function call will end up. So the key question is, can we find the relocation record? Yes we can, otherwise this would be a very short talk indeed. The link map, which is available to you via some C library function, has the link to the mapped-in ELF library data, and libelf conveniently allows you to interrogate this, so you can find the relocation records and we can do what we need to do.
So, putting those pieces together, we can make a shim library with the same soname as the target, so effectively the system thinks it's the same library. If it finds that library first, and its soname is libGL.so.1, the linker thinks that's libGL. We put it on the search path before the place where it would find the normal standard library, and then during the shim's initialization we dlmopen the real library from within the shim. We also make sure to dlmopen all of its dependencies, to make sure they get into the link map so that the real library itself can actually see the symbols it needs to. As we said before, we do this in reverse dependency order, so we go all the way back down to libc and then we work our way back up, and we search the alternate library path in order to do so, so we find the host libraries instead of the runtime libraries that we would normally find.
Then, and I'm packing quite a lot of work into this sentence, we walk the link map and we scribble on the relocation records, so that the linker is never actually invoked to find symbols. We do all that work beforehand, and because the relocation record is the linker's guide to where to find everything, it goes, oh, this work's already been done, my work here is done, you want to go here. And that happens.
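Finding the relocation records can be illustrated with a read-only sketch. Note the substitution: the talk describes walking the link map and using libelf, whereas this example uses glibc's `dl_iterate_phdr` and the declarations in `<link.h>`, simply because that keeps it self-contained. It only counts the PLT relocation entries (from `DT_PLTRELSZ` and `DT_PLTREL`) of each loaded object in the caller's namespace and rewrites nothing; `struct plt_summary` and `summarize_plt_relocs` are names invented here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* dl_iterate_phdr is a GNU extension */
#endif
#include <link.h>
#include <stddef.h>

/* Totals gathered over every object the dynamic linker has loaded. */
struct plt_summary {
    size_t objects;             /* number of loaded ELF objects seen */
    size_t plt_relocs;          /* PLT relocation entries across all of them */
};

/* Callback: find the object's dynamic section and count its PLT relocations. */
static int count_plt_relocs(struct dl_phdr_info *info, size_t size, void *data)
{
    struct plt_summary *sum = data;
    (void)size;
    sum->objects++;

    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_DYNAMIC)
            continue;

        /* Load bias plus segment address gives the in-memory dynamic section. */
        const ElfW(Dyn) *dyn =
            (const ElfW(Dyn) *)(info->dlpi_addr + ph->p_vaddr);
        size_t bytes = 0;
        size_t entsize = sizeof(ElfW(Rela));    /* assume RELA unless told REL */

        for (; dyn->d_tag != DT_NULL; dyn++) {
            if (dyn->d_tag == DT_PLTRELSZ)
                bytes = dyn->d_un.d_val;
            else if (dyn->d_tag == DT_PLTREL && dyn->d_un.d_val == DT_REL)
                entsize = sizeof(ElfW(Rel));
        }
        sum->plt_relocs += bytes / entsize;
    }
    return 0;                   /* zero means: keep iterating */
}

/* Walk every loaded object and report how many PLT relocations exist. */
struct plt_summary summarize_plt_relocs(void)
{
    struct plt_summary sum = { 0, 0 };
    dl_iterate_phdr(count_plt_relocs, &sum);
    return sum;
}
```

A shim in the spirit of the talk would go one step further than counting: for each entry whose symbol belongs to the encapsulated library, it would write the capsule's address into the slot that the entry describes. That part is deliberately left out here, since it depends on details the talk does not spell out.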
So that's the brief summary. There are of course a few extra details to get something like this working. Some terminology before I go ahead: we call an isolated set of libraries a capsule. I say set of libraries because it turns out you can't always isolate a single library; occasionally, just because of the way they've been written, a group of libraries has to be encapsulated together. We assume that they are visible to us on a mounted file system. In my examples and in my test code I mount the external host system at /host, because that seems like an uncontroversial name, but that's a detail that can easily change.
A few other gotchas. dlopen can't be called from inside a private namespace. This is a limitation in libc; it can be lifted, but it requires more patches to go up to libc. Since we already know how to redirect symbols, we can cheat: we replace the capsule's dlopen with a wrapper that goes, actually, you kind of want to call dlmopen, because if you call dlopen you're going to segfault, and that's going to be bad. This will be fixed eventually as more patches go upstream to libc, but for the moment, because we already did all the work to do this, I cheat. The other thing, a minor detail: dlmopen doesn't currently support RTLD_GLOBAL. The semantics haven't been hashed out. Again, there's nothing fundamental about that; it's just that more work needs to go upstream.
Right, other things. dlsym now has a split personality. If you're outside and you happen to be using dlsym to find your symbols instead of simply linking against the library, we need to do a bit of extra work. We need to transparently search inside the capsule's handle, because again, remember, the outside world has no idea that any of this has happened. So if someone looks for a symbol, we need to go, oh, actually that one came from in there, and hand the right symbol across. So dlsym outside the capsule should find the inner symbols, but only the symbols from the libraries we
want to expose, so our shim needs to do a little bit of heavy lifting to do that. And again, we don't want to find any dependencies from, say, the incompatible library that we're actually trying to hide.
Other gotchas: dlopen outside the capsule must trigger the relocation record scribbling. If you dlopen a new library, you've done this after link time. A new library is opened, and if we don't realize that that's happened and scribble on the new set of relocation records that came with that library, it will find the default library, not what we wanted. So again, we need to notice that's happened, so we need to wrap dlopen with a wrapper that does that, so that for any new library that's opened we automatically divert the interesting relocation records into the capsule.
Right, extra problems. The next bit was true until three this morning; it is no longer true. Currently each namespace has its own glibc copy, and this works. I mean, my initial expectation was that this would be cats and dogs living together, total chaos, the end of the world. It turns out it sort of works if you don't pass any memory allocation or free across the boundary, so something inside the capsule doesn't allocate memory and expect it to be freed outside and vice versa, and you're not using threads. It basically works, which was a surprise to me, but there it is. But there was a deadlock in threads, which caused a bizarre symptom: if you called setuid and you had libpthread loaded, your entire program would livelock, not even deadlock, and just spin in an infinite loop inside libc for no readily apparent reason. That's fixed now in my patches, which are going to go upstream immediately after this conf. And again, I mentioned the alloc/free pairing problem. You need to deal with that somehow, or avoid it. What I have implemented is RTLD_SHARED, which allows you to say, I know I'm creating a private namespace, but actually this library needs to be the same in both
link chains. There was some infrastructure available to do that in libc already, because it uses it to make sure it's the same linker in both namespaces, and I've sort of abused it to make it available to callers so that that can be done. For now, in order to make my demos work, I use the mechanism I've already described for diverting functions, so that both glibcs get their alloc/free cluster, your realloc, your calloc, etc., to be the same on both sides of the capsule membrane, which seems to do the trick. Again, I didn't think it would work, but I tried it and it worked, so, whoo. And once these patches have gone upstream, that won't be necessary anymore.
So finally, does it actually work? Yes, otherwise this would be a terrible talk. These things have been tried and work. You'll notice the star next to Dungeon Defenders. That's because at some point in the last six months something not related to the work I'm doing caused it to stop working. It was really annoying, because it was a good demo: it was a proprietary game that I could demonstrate working with this. But unfortunately, even without any of my libcapsule work being anywhere near it, it crashes in exactly the same way. So I guess it works the same way with libcapsule as without, but yeah, there it is. The code is currently available there, and the source for this talk is available at that link there. And I guess we're at the point where I say, any questions?
Total silence. Show of hands, who sort of followed that? Is there anything you want me to go over again? Yes? No?
We still have about 20 minutes for questions.
Sure, don't be shy.
Yeah, I didn't 100% follow all that (okay), but what's the sort of performance overhead of doing this?
There isn't one. You pay a small price, because my currently very stupid pseudo-linker at the beginning, which scribbles on the relocation records, isn't optimized in any way, because I prefer to keep it simple so I can actually see what it's doing. So it doesn't
do any of the hashing that the real, smart linker does. But that's a tiny, tiny hit when you link the library, and after that it's actually faster, because the procedure linkage table fix-up never gets invoked and so you never search again. You may be familiar with something called RELRO linking. What happens there is that libraries are pre-linked when you load them up at the beginning: instead of being delayed until the function call, all the relocation records are populated initially. It's effectively that case. The address is already there, so you simply bypass all the dynamic lookup code.
Okay, cool, thanks.
So what's the future of this code? Are you proposing to... like, I noticed you were saying something about Valve (yeah) about this, so it's going to be in their store?
Yeah. I mean, basically, we're looking at improving the runtime isolation of games. There's a whole bunch of other stuff that's not actually libcapsule, which my colleague Simon McVittie is working on, which is to improve the runtime isolation. One of the things that's going to do is assemble all of the host library trees and the runtime library trees and make sure you have the same copy of libc, so you can load the newest one, because that is one thing you do require. You can't load the older libc, because otherwise something will say, oh, you don't have the libc symbols I'm looking for. And then once we've proved that it actually improves stability rather than making it worse, so there's going to be a lot of banging on games and things with hammers, once we've proved that it works, if they say, yes, that's good, we're glad we paid you all that money to do it, then it will be appearing in a Steam runtime near you at some point. Other things we're looking at: pushing the code up to things like Flatpak, for the same reason. Your Mesa drivers have to come from the host; they can't be shipped in a runtime. So we're going to talk to Flatpak and Snap or whoever and say, look, we've got this
library isolation technology, we think it could help. Another thing it could be used for, in a much simpler case, is as a much safer way to load modules, because you can load a module, expose only the symbols you find in the module, and anything else the module links against, you don't care about; you'll never see it. So if some software uses modules, it could use this approach to do that.
So just to follow up on your last remark about a safer way to load modules: at the risk of hype propagation, is this like a super lightweight kind of container concept?
I mean, ish. There is no actual memory segregation, but the mechanism that you use to find your functions, so the dlsym lookups and the dlinfo information, all of that, because it's built into libc, will find the symbols from the namespace that you are in.
Okay.
And then what libcapsule allows you to do is just export a little bit of that into the main namespace.
Right, okay.
So, I mean, yes, it is super lightweight. It's so lightweight you missed some of it. Nothing would stop you, if you knew it was there, from going and finding the other set of symbols and using them, but you won't bump into them by accident.
So how is feedback going, say with the glibc folks? Are they happy to accept that?
Yeah, they looked at my first set of patches and reviewed them, and I got some feedback to say this part's okay, that part's wrong because..., which is good, because I'm always a bit nervous about changing the linker, and with good reason. It got delayed a bit because there was some drama on the libc list, which I'm sure a lot of you remember, about a joke, which happened just after I landed my first patch set. It was literally seconds after I landed the first patch set on the list, and then this message turned up and I thought, three, two, one, and there it is: drama. So that delayed things a bit, but the second patch set is currently waiting for review. So that part is the
mechanism of doing the RTLD_SHARED thing that I talked about, so you can have the same libc in both namespaces. Once that's done, there will be another patch set which implements a little bit of policy, so that if you want complete isolation, you can have it if you ask for it; if you want total sharing of a library and all its dependencies, you can ask for that too; and the default behavior will be that the fundamental libraries, the libc cluster, libc, libpthread, libm, librt, the stuff that really, really needs to be the same on both sides of the capsule, will automatically be the same. The idea is that dlmopen will be very, very simple for a programmer to use. Currently I do quite a lot of heavy lifting in libcapsule to manually do the things that dlmopen should do for you, and that is what the patches are intended to fix.
Awesome. So I remember when you were talking about this earlier, you were suggesting you might even have a demo.
I do have a demo, and it might even work. Who knows? Let's see. Is this the stable demo or the risky demo? We'll do the risky demo first. No, we'll do the stable demo first. So, can we go back?
There we go. So you're going to see a lot of spew here, which is because I've set some environment variables. I guess you've got to take my word for it that there's no libGL inside this chroot, but you'll see a lot of spew, which is libcapsule saying, ah, you want libGL, I've loaded a shim, I'm going to do that. And then, yep. So if I do that, that's running inside a chroot which has no libGL in it. And if I go up a bit (I should have reduced the amount of debug before doing this demo), there we go, we can see it's dumped out some information. If you can see, it says DL handle capsule. That's dumping out the libraries that are in the capsule. I don't know whether you can see up there, but a lot of them are prefixed with /host, so they've come from outside the chroot, and some of them are prefixed with just /lib, as you'd expect; they're normal things. I'm using a chroot as shorthand for a runtime right now. And then if we go down a bit further (come on, monitor, show me what they're seeing), you can see DL handle default. That's a separate namespace, and you can see one thing, two things that begin with /shim. That's where the shim libraries that are using libcapsule come from. And then there's a whole bunch of information about where it's finding libraries. I'm happy to show that to someone closer up if you're interested.
And if I'm lucky and nothing has broken since three this morning, we will get... come on, run. You were running before. It's not running. Ooh, it's not working. Okay, that was working. It was OpenArena, which is obviously a bit of a better test than glxgears, but, oh, I think... do I know what that is? Let me try a couple of things, see if I can get that working. But yeah, sad times. The dark angel of demos has visited me. No joy, sorry about that. It will probably start working as soon as I've stopped giving this talk. Okay, anything else?
Just wondering, where does the word capsule come from, if there's already
an existing work with this name?
Yeah, naming things is hard. I had a few names for it that I went through. They all collided with other things or were overruled, quite reasonably, for being terrible names. Then I was looking for a name and I looked inside the Debian repositories and I didn't find anything called libcapsule, so I thought, great, I'm okay. And it turns out there's another piece of software related to games that's already called libcapsule. So yeah, if we think of a better name, I'm okay with calling it something else. For now, it saves me doing a search and replace across all my code.
Not sure if this question makes sense, but you said that your library doesn't have any impact on performance (yeah). When I load a C++ program which loads a lot of C++ libraries, which load in more other C++ libraries, which end up in a large lookup table for symbols, could your approach actually improve performance by just exposing those symbols that I actually want, instead of all of them that are possibly accessible?
So it depends on how the lookup is done. If the lookup is being done dynamically, yeah, I guess it might do that. The approach that people tend to take these days is to actually just take the hit at the start and force everything to be looked up right at the beginning, which is actually something I end up having to work around, because I need to unlock all the memory areas, go, no, actually I'm writing my own addresses into these, and then do that. So yeah, potentially it could do that, yes.
Okay, I guess that's the end of the talk.