Hello, can everybody hear me? Great. Everyone, thank you for coming here today. I'm very excited to have you here. I'm here to introduce Will, who'll be presenting on weld.so and beyond. It's going to be a great talk, and I'm really looking forward to seeing it come to life.

All right, hey everybody. I want to thank you in advance: due to some circumstances beyond my control, these slides are not finished, and my luggage arrived right before I got here. So I don't have slides, but I do have clean underwear on. In the end, I think you'll agree that was probably the better choice.

So I'm here to talk about weld.so and beyond. weld.so is a thing that we kind of made up, so let me give you a little background. I am a senior software engineer at Red Hat. I've worked on installation and upgrades for 15 years, and our team has a deep history of doing things that involve image construction and packaging and the weirder use cases of that. The stuff I'm going to talk about is fairly specific to RPM and the way that we at Red Hat build images and things like that, but I think the larger lessons apply to the entirety of the Linux ecosystem.

Oh yeah, and some quick disclaimers. Nothing that I say here is, like, software Red Hat is definitely going to do. A quick show of hands, who in this room works for Red Hat? Okay. For the rest of you: the people who had their hands up, we don't know what number comes after seven. We can't count any higher than that. So this has nothing to do with whatever numbers might or might not come after seven; I'm fairly sure there aren't any. And yeah, I wrote these notes like two hours ago, so thank you in advance for bearing with me as I sort of ramble at you.
So the promise of this whole weld.so thing is this: I'm pretty sure, looking at the way that we construct images (and by images I mean filesystem images, like containers or virtual machines, or even just doing an initial install on a system), that the way we do that today, using RPM, using packages, is holding us back. RPM and friends were designed in the late 90s, and they made sense at the time, but there's a lot of slack and stuff getting in its own way that makes everything we do harder than it needs to be. With some tweaking, we could have a system, basically a model for Linux distributions, for going from upstream projects releasing sources to code that you're running somewhere, that was insanely fast and reliable and just did everything really easily, to the point where I'm pretty sure we could build a container for every process as it starts up, using only the things that it actually needs in its filesystem.

So every process would have its own view of the filesystem, the same way we do with virtual memory, where at process startup the dynamic linker goes in, links in the libraries it needs, and then actually jumps into your process. We could do the same thing with the filesystem. And if you think about it, it's kind of weird that we don't. Why is it that every process gets its own private view of memory, which the kernel arbitrates, but we give them all the same view of the disk? It's as if it was designed in an era where disks were really slow and there weren't a whole lot of computers, and the idea of doing that was just crazy, so they said, we'll get to it later.
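To make that analogy concrete, here is a minimal sketch in Python of how a per-process filesystem view could be computed the way ld.so computes a library closure. This is a dry run and entirely illustrative: the `PACKAGES` table, the `/var/lib/weld/store` path, and the package names are all made up, not any real weld or RPM format.

```python
# Hypothetical package metadata: each package lists its files and the
# packages it depends on, like DT_NEEDED entries in an ELF header.
PACKAGES = {
    "myapp":   {"files": ["/usr/bin/myapp"], "deps": ["glibc", "openssl"]},
    "openssl": {"files": ["/usr/lib64/libssl.so.3"], "deps": ["glibc"]},
    "glibc":   {"files": ["/usr/lib64/libc.so.6"], "deps": []},
}

def closure(pkg, table):
    """Transitive dependency closure, the way ld.so walks DT_NEEDED."""
    seen, stack = set(), [pkg]
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(table[p]["deps"])
    return seen

def bind_mount_plan(pkg, table, store="/var/lib/weld/store"):
    """List (source, target) bind mounts for the process's private view."""
    plan = []
    for p in sorted(closure(pkg, table)):
        for f in table[p]["files"]:
            plan.append((f"{store}/{p}{f}", f))
    return plan

# Print the mounts that would populate an otherwise-empty namespace.
for src, dst in bind_mount_plan("myapp", PACKAGES):
    print(f"mount --bind {src} {dst}")
```

The point of the sketch is just that the resolution step is cheap and deterministic: a set walk over declared dependencies, with no arbitrary code running anywhere.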
If we can do that, if we can build images that quickly, milliseconds or less, we can actually do things like OS-wide CI and CD, which has been sort of troubling for Red Hat to do, or for Fedora for that matter. We don't do nightly builds of Rawhide, or we do sometimes and sometimes we don't; we don't do nightly installer image builds, because... we do? We try. We try. Ah. That's kind of my point: I'd love it if we succeeded. That would be cool.

So I'm pretty sure that we can just tweak a few things, and I say "tweak a few things," but it's actually a lot of small system-wide changes that we would need to make to make this sort of world possible. It's a lot like the shift from statically linked binaries in the late 80s and early 90s to dynamically linked binaries. We don't think too hard about dynamically linked binaries as being a crazy new thing anymore, but there was a time when that was new and controversial and people hated it, and there were Solaris admins who said "I will never have dynamically linked binaries on my system, ever," fist on the desk. So if you're getting a feeling in your chest like "this is crazy, it'll never work, why am I listening to this guy?", give me a minute, just clear your head.

So okay, what is weld? Just an acronym that we kind of made up for an experimental Linux distribution that we're sort of working on. The meaning of the acronym changes depending on my mood. I think it was originally Will's Experimental Linux Distribution. At one point it was the Wiggum Enterprise Linux Distribution, because of Ralph Wiggum. But yeah, we're basically figuring out new ways to do Linux distribution stuff, because the way we do it now is, I'm saying, gnarly.
Everything is a lot harder than it needs to be, and we spend a lot of time fighting with ourselves, building new tools to deal with problems inherent in the system rather than fixing the system itself. Think about how many features of the RPM ecosystem, if you're familiar with it, are like comps groups: every time we need to add something, we add another extra layer of metadata and another extra layer of code to parse that metadata. At this point, if you're dealing with, I think, modules, you have to fetch some YAML, and the XML tells you where the SQLite database is, and then you can start parsing the rest; there are like 18 different types of data involved, and it's a mess, and we could do a lot better.

So I already kind of went through this stuff, but I think the heart of the whole problem is that we're stuck on RPM, and anybody who has talked to me at all in the past ten years is sick of hearing me yell about RPM, and I'm sorry. But the problem with it is that we have so much of what we do encoded in it that everything we build is built upon it, and we keep building layers and layers on top of it, and nobody understands how it works anymore. Nobody actually knows the entire details of how the dependency system works, or what's in the dependency tree, or what's in all of the scriptlets. They're non-deterministic, and we don't know what they're doing, but we have to run them as root because they might need root. So we have this enormous mountain of code that we can't introspect, we have no idea what it's doing, it has to run as root, and that seems fine to everybody, which kind of makes me cry at night, but that's whatever. Sorry? Oh, the "it's fine" part.
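As a sketch of what an introspectable alternative to opaque scriptlets might look like: instead of a package shipping a shell script that runs as root, it could declare what it needs as data, and the build tool could decide when (and whether) each need is satisfied. Everything here is invented for illustration; the action names and the image-build versus first-boot split are my own assumptions, not anything RPM does today.

```python
# Hypothetical declarative replacement for scriptlets: a package states
# *what* it needs, not *how* to get it, so the build tool can inspect it.
DECLARED = [
    {"action": "create-user", "name": "sshd", "system": True},
    {"action": "host-key",    "path": "/etc/ssh/ssh_host_ed25519_key"},
    {"action": "ldconfig"},
]

# Classify each action: safe to bake into a shared "golden" image,
# or machine-specific and therefore deferred to first boot.
AT_IMAGE_BUILD = {"create-user", "ldconfig"}
AT_FIRST_BOOT = {"host-key"}

def split_actions(declared):
    """Partition declared needs into image-build and first-boot phases."""
    build = [d for d in declared if d["action"] in AT_IMAGE_BUILD]
    first_boot = [d for d in declared if d["action"] in AT_FIRST_BOOT]
    return build, first_boot

build_actions, boot_actions = split_actions(DECLARED)
# Users and caches get baked into the image; the host key is generated per
# machine, so it never gets replicated into every clone of the image.
```

The design point is that the whole list is data: a tool (or an auditor) can see exactly what will happen, nothing runs as root during resolution, and the same input always produces the same plan.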
Well, no, everyone just sort of looks at it and says, "I guess, I mean, I don't know what we can do about it," and I'm like, "we could not do that, maybe? Just throwing that one out there."

So people are working around their problems, and one of the things I think is happening with our big shift to containerization, and with Go, which by default builds statically linked binaries, is that it's because of the troubles we have with dependency resolution and things like that. You have your Fedora system or your RHEL system and you say, "okay, but I have this thing that needs these libraries, and I have this other thing that needs these other libraries. How do we do that? How do we get two sets of libraries that, oh my god, the packages have the same name but they're different versions, they can't possibly coexist on the same system?" I don't know why; they're just files. But okay, RPM says they can't. And so instead of addressing that problem, we as a community slash industry have said, "what if we just went back to statically linking everything? That was easier." And it works better, but it's not a better idea; it just works around that one problem.

And here's the thing I kind of went crazy on for the past while; I'm not going to subject you to all of it. We don't really have a good model for how RPM works. If you look at the Linux kernel, people have done formal memory models for how the kernel deals with memory, and what it does when it takes locks, and all of that stuff, to maintain the illusion that everything is safe and reliable. And it is safe and reliable, asterisk: as long as you trust your hardware, haha. We don't have anything like that for RPM or packaging in general. We don't have any sets of assertions like "a well-behaved build system needs to maintain these sorts of invariants," or "we can assume this is true, but we can't assume that is true." We just sort of make it all up as we
go along, and that's where things get weird, like "oh, we can't install two packages with the same name with different versions, because reasons." Okay, well, if we had a model, maybe we could deal with that. So I have a big, written-out mathematical model for how you do a Linux distribution, but the pieces actually aren't that difficult. It turns out, if you kind of take it all apart: when you build a system using RPM, what the installer does is that the installer is a mini-distribution. You boot a DVD or whatever, and it starts this mini-distribution. It used to be a full-fledged mini-distribution that had its own mount binary and an init system and everything, which we had to maintain ourselves, and that was kind of bonkers, so eventually we made it just a very, very stripped-down Fedora distribution that we boot and load into memory. Then we format your hard drive and start installing stuff into it. Which is still kind of bonkers when you consider that most of what people do after that finishes is remove all the stuff they didn't want, or make a bunch of changes afterward, because we don't know what happens in the middle; it's all mysteries to us.

And it's really weird: we take an empty box, and we have these little packages, and people talk about packages like they're bricks that just sort of stack up to make a wall. But they're more like tiny little robots with chainsaws and arms on them. You dump enough of them into a little room, they sort of fight it out for a while, and then they build Voltron, and you're like, "cool, that's neat." And we made it all work, and that's amazing, like, good job us, but maybe we could make bricks instead. That's kind of my whole point. I'm going to come back to this a lot, and I'm going to use a lot of dumb analogies like that, but my point is that as a community and an industry we need to
start looking at the places where behavior is not introspectable or not deterministic and start stamping them the hell out, because it propagates upward: if parts of the lower levels need to run unknown code as root, your upper level either has to work around that or just deal with it maybe happening sometime, and you can only make certain guarantees about what's going to happen in the middle. It's not great.

So we sort of did an experiment on our team. What if, instead of doing what RPM does, what the installer does to build systems, where we open the box, throw in all the robots, and let them build Voltron, we take all of the pieces, scrape all of the code off the outside, and just make them little bricks? We just put the bricks in place, and we do the same for every package: lay down all the contents and not run any of the scriptlets at all. And it works. To a first approximation, it works. It turns out there are some things that actually do need to be dynamically generated, but they're really well known; I have a different talk about what RPM scriptlets actually do, and there are only like six things. It's "create users," it's "generate a cache"; sometimes you generate a file of users, okay, fine. Sometimes you do things like create a host key. Well, you don't always want to do that: you only want to do that if you're installing on bare metal. If you're making an image for the first time that's going to be a golden asset you replicate everywhere, it does not need that machine-specific key. So we need to look at all of that, but that's not what this talk is about. The point is, you can kind of throw it all away and it all still works.

So when we did that, we built a thing that can put together a system, and Dave, you can confirm this, maybe a hundred times faster than our current stuff. We
can build, yeah, it was. So we had an internal team trying to do continuous integration stuff on the kernel, and their whole deal was: build a minimal VM, spin it up, do some stuff with it. It took them about six minutes to build that image before they could run their tests, and we could build the equivalent image in six seconds.

So, going back to my point: yes, what we're doing there is basically dynamic linking. At that point, in the same way you do with memory, we're taking an empty process and dumping in the pieces that it needs. This is the same thing as dynamic linking, and we can borrow a lot of those ideas and make image construction way easier and way faster. This is one of the things that ELF got right that RPM gets wrong. RPM is hard to extend: we've added a couple of tags now and then, and we got weak dependencies after ten years of fighting about it. We have BuildRequires; I don't think we have TestRequires yet. Yeah, we've been fighting about that one for my entire career at Red Hat. So RPM is notoriously hard to extend, and it also changes without warning. Fun fact: there is technically a specification for RPM in the Linux Standard Base, in that it's a de facto specification; they wrote down how RPM worked at the time. If you implement RPM from that specification, it won't do anything. We changed how we store file names in RPM headers and didn't even increment the version number of the file format. We just change stuff out from under other people all the time without telling them. It's not great.

So this is the other thing I'm going to hammer on: we as a community need to actually start documenting how stuff works and commit to not breaking it. Is that ten minutes left? Okay, cool. Commit to not breaking things without at least a warning, like incrementing a version
number. It's not that hard. RPM, I think, has a 32-bit version number, so they could do that a couple of times and we'd be fine.

But the point that this is dynamic linking is an interesting one, because there's a whole lot of fun stuff that would happen if we started treating building images like we treat dynamic linking. And I think this is where I run out of slides. Yep. So I can either show you my big outline, or I can just hand-wave at you, and I apologize for the hand-waving, but here we go.

One of the problems that we have with containers is that we ship them around as statically linked blobs. They're kind of not statically linked; there are a bunch of layers to them, in the same way that there are layers to statically linked binaries when you link in stuff, you know, your compression library and all that. So you can build them, and then they're built, and then you don't really know what's in them anymore. This is why we have things like container scanning: when there is a CVE of some sort, we have to go back and look at all of the stuff that everybody built, figure out which ones have the tainted code, and then rebuild all of those. Whereas if we were doing it dynamically, the image that you build, your container, would be like an ELF binary: your code and some headers that say "okay, I need this version of this symbol, I need this version of these Python libraries." The same sort of symbols we're using in RPM as dependencies, with some tweaking, because we want them to be deterministic. Because, as it turns out, and I have a talk about this tomorrow, I'm pretty sure RPM dependencies are now Turing-complete and you can use them to run arbitrary calculations, which is not the best thing you want out of a dependency system. It's more a fun party trick than actually worrisome, but it is a thing to talk about. Anyway, the point is, if you have a reasonable dependency system like ELF's, you can pull in
just the pieces that you need at process startup time. As I understand it, most container runtimes don't share memory when they build, or at least with the thin-pool stuff, and somebody correct me if I'm wrong: each container has its own block device backing its image, which means the block device is different for each of them, which means that if you have a thousand containers using a thousand copies of the same OpenSSL, you have a thousand copies of it in memory and on disk. It's not always like that? Okay, good, so there is some improvement, but the last I heard it was like that; it was sort of "well, you need a block device."

So I am stumping for a fairly significant change in how we expect a system to behave. We should expect systems to behave more like we expect memory to behave. The expectation that you should be able to write to any part of your disk should be silly, because we don't expect you to be able to write to any part of memory; that's obviously silly. The expectation that you can write to the disk and other processes will be able to read it by default, that's also kind of silly. We don't always want that, and it leads to a lot of fun problems. This is where we get temp directory attacks, where you have well-known file names; the whole reason temp directory attacks exist is that there is a well-known path in a shared filesystem space, and then you have problems. We can eliminate that entire class of problems. Container scanning: we don't need to do that anymore if we're automatically creating stuff. We can eliminate directory-name attacks by dynamically creating your filesystem just for you, so that other people can't actually look at your filesystem. So that's the weld.so concept: let's look at how we put together our dependency chains, let's try to winnow away all the parts of RPM that are not deterministic, let's try to make dependencies themselves sane,
and get to a place where we can just mash the package content together when we need it. To do that effectively: back when dynamic linking was invented, you needed mmap to make it work, because actually copying that much memory into place takes a while. If you don't know how mmap works, it's basically this: you tell the kernel, "put this library into this memory space," and it goes, "cool, okay," and whether or not it's actually in memory yet, you don't have to care; the kernel puts it there when it's needed. We could do the same thing with files. Rather than building the entire container when you start it up, we give you an empty namespace, and when you try to look stuff up, then we start putting stuff into it. In fact, we already have the capability to do this in the kernel: it's just bind mounts. We'd need to do some stuff with rearranging paths to make this work, but instead of decompressing a whole bunch of RPMs and copying all the contents in, you could just bind mount the contents of each thing that you need into the private space for your process. This should take milliseconds. It requires us to do some janky stuff with the paths that you look for libraries in, but we can do that; we have control over the entire system, and we know how to do all of these things.

So the point I'm making here, broadly, is that for all of the things we need to do to build a system where everything is reliable, deterministic, and shared in a way that isn't like it is now, we have control over all of it. We can just do this. I just need people to buy in on the idea of making a somewhat radical shift in how we put things together. And I don't know that that meets with a lot of resistance; most of the time what I get is what I call the MacGyver problem, which is sort of like this. If you've ever watched MacGyver, MacGyver's this super dude with ingenuity, using things in
unpredictable ways. A lot of times, if you go back and watch an episode of MacGyver, the entire thing hinges on, say, somebody needing to get a piece of information to somebody else or else the ambassador is going to explode, or whatever it is; you have to tell the ambassador not to say whatever it is. But if you have cell phones, the entire plot just falls apart. You just go, "hey, ambassador, don't do that." So the problem is, if I show up in an episode of MacGyver and everyone's like, "what are we going to do, the ambassador is going to explode," and I'm like, "just call him on a cell phone," everyone's going to look at me like, "what?" And then I'll be like, "okay, so all you need to do is build a worldwide network of radio towers, and then invent pocket supercomputers, and teach them to talk to the radio towers, and then you can just call him on his pocket supercomputer," and they're like, "we're going to go with MacGyver's plan; we're going to be leaving now." It doesn't mean that building this network is a bad idea, but it means it doesn't solve the problems that people are immediately facing. And I think this is the other thing that we as a community and an industry have been doing: MacGyvering the hell out of everything for years, and not really looking at the larger problem of what it is we're trying to accomplish.

And the larger thing about what we're trying to accomplish, when we do image creation and what we're doing with containers, is that we're trying to do for the disk what we already do with memory. We have a lot of stuff like interpreted languages, like Python, that want to do what C does: "okay, I might need this library, so make sure it's available for me," and the kernel will put it in your memory space if it needs it. We can do the same thing with the filesystem; it just makes sense to do it that way. But that's the MacGyver problem: it makes sense to do it that way, but it would require systemic changes
to everything we do. Huge changes, little changes, but it requires system-wide changes. And, you know, I can do parts of this, like how you store everything on disk: you want to do it OSTree-style, in a content-addressable store, so you get automatic deduplication, so it's efficient and all that good stuff. We have all of the pieces we need to build this sort of a system; we just haven't put them all together yet, and really it's about getting everybody to get the idea in their head. So that's what this talk was going to be about; it was going to have a lot more slides. And the weld.so thing is basically this: once we get to that point, once we have made these changes, we could have a system where your program, your thing, maybe has an ELF header on it, and it actually calls out to weld.so, which then dynamically constructs an entire filesystem for your thing, and dynamically constructs its memory space, and then it runs. We can build all of this; all of the pieces are already there. And that's about all I wanted to say. I think we're about at time, so, are there any questions about any of this? Or do you just want to hear me rant about RPM more, because I can do that all day.

"Is there anyone doing this today? Any of the other distributions, any other OSes?"

No, as far as I can tell, no, and I have a theory on this. I've asked around wherever I could find anybody. I've seen parts of it: making the building of packages more deterministic, you see a lot of that in NixOS, and atomic updating and quick generation of images, you see that from OSTree. But they're still sort of constrained by RPM or their current system. It takes an industry-wide effort, essentially, which is how ELF and dynamic linking got implemented in the first place, because the industry was a lot smaller then and you could throw stuff around a lot more easily. And there aren't a whole lot of companies making operating systems left, and so
there's a lot of MacGyvering and not a lot of "hey, we should all work together to do this massive system-wide change for the good of the industry." Containers sort of showed up because they scratched an itch, but I haven't seen a larger effort to attack the larger problem that I'm trying to describe. And I'm sorry, I should have repeated the question. The question was: is anybody working on this yet? And yeah, I've asked around; I haven't seen it yet.

"Have you given any thought to what a migration path would look like? I mean, hypothetically you get all this working; how do you get from here to there? Because it sounds kind of disruptive."

Yeah, and that's a big one. I think the way that we do that is not as hard as we might think, in that a well-designed system that adheres to a good model of how we want a distribution to work would, sort of by default, by eliminating some of the gnarlier parts of what we have now, be compatible with it in the abstract. So, what we did with our experimental image builder was just import RPMs. We strip out the parts that would not be allowed in our system, but we import content directly from RPMs, and I think we also extended it to work with NPM modules. Is that right? Okay.
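The import path being described, strip the disallowed parts, keep the payload, store it content-addressed, can be sketched very compactly. This is purely illustrative: the layout, function name, and paths are invented, and a real OSTree-style store checksums metadata along with file contents, not just the bytes as here.

```python
import hashlib
import os
import tempfile

def import_file(store, path_in_pkg, data):
    """Store file data under its SHA-256; return a (path, content-id) entry."""
    digest = hashlib.sha256(data).hexdigest()
    obj = os.path.join(store, "objects", digest)
    os.makedirs(os.path.dirname(obj), exist_ok=True)
    if not os.path.exists(obj):  # dedup: identical content is stored once
        with open(obj, "wb") as f:
            f.write(data)
    return path_in_pkg, digest

store = tempfile.mkdtemp()
# Two packages ship an identical license file; the store keeps one copy,
# and each package's manifest just maps paths to content IDs.
manifest = dict([
    import_file(store, "/usr/share/doc/pkg-a/COPYING", b"GPLv2 license text"),
    import_file(store, "/usr/share/doc/pkg-b/COPYING", b"GPLv2 license text"),
])
```

Because the manifest is plain data (path to content hash), it doesn't matter whether the payload came out of an RPM, an NPM module, or anything else; the scriptlets and other non-deterministic parts simply never make it into the store.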
Okay, yeah. So the idea is that any model that is sufficient to make this work is also sufficient that you can probably wedge existing things into it. And so the plan is to look, bit by bit, at how the whole system works. Linux distributions are a loop, basically, right? You have sources, and then you make builds, and you put a bunch of builds together to get an image, and then the trick is: when you did that build, it was inside a build environment, which is an image. So you've got this loop going, of sources to builds to images. You can take pieces of that loop and replace them one at a time with something that takes the same inputs and has the same outputs but maybe is different in the middle, and then you can start cutting out pieces. Spec files, for instance. Oh boy, spec files: there are like four different Turing-complete languages fighting for dominance in there, and it's horrifying. But if you had something that was, you know, data that you could compile into a spec file, well, now you've got something else you can write that's a little more reasonable, and we can still plug it into our existing stuff. And then: wait, what if, instead of doing an RPM build and then putting the RPM content into our weird content store, why don't we just build straight from this thing into the content store? And we're hopping over RPM at that point. So part of it is looking at the bigger model, identifying each piece of the system, and figuring out which ones we can most easily replace with something that's compatible but better. Which I know is an abstract, hand-wavy answer, but I hope that makes sense. "It seems feasible." All right, cool.

You may actually have touched on my question already. First I want to say: great, you've said it way better than I have, and I've been ranting about this for the last 15 years. In working with container builds, we're doing this thing most of the time where we're building RPMs and then running
inside containers, for which I need air-sickness bags, right? But one of the things in looking at ways to address that specifically was to take the RPM system and break it in half and say: all right, here's a build piece, which creates artifacts, and then we take them and we put them into an RPM, or we put them into a container image. You may have more sophisticated ways of thinking about that, but that seemed to me to be a relatively simple way of reusing one bit. But then I got to your point of "RPMs, no one knows how they work," so that part is hard. But sure, we do want to maintain backwards compatibility, so it's not so bad.

I think that's the right instinct: to say, all right, we're going to have something that can do things a new way but also builds the old way if you want it to. We didn't get as far back in the process as looking at the spec side of it. I mean, I have ideas about it, but that wasn't what we originally started attacking; we started attacking RPM as a storage medium, and the dependency resolution and image construction part of it. And yes, I also have concerns with that part; let's talk.

"The efficiency gain is mind-boggling, and it's a good solution, but, you know, we just keep getting more memory and faster, bigger computers. It sounds like there's a good security story from your viewpoint; that seems like maybe the biggest selling point, if that's the right direction?"

Yeah, it depends on who I'm talking to, but yeah, there's some interesting stuff about that. I mean, if you think about all of the memory protection in ELF over the years: you could do an equivalent to address space layout randomization, filesystem layout randomization, where you have marked in your container every place where you call, say, /bin/bash, and, just like we do when we start up an ELF process, we go through and relocate all of those. So instead of being /bin/bash,
it's some randomized path, and when we link /bin/bash into your image, we put it at that randomized path. So even you don't know where bash is, and an attacker can never run a shell, because they don't know where it is, and you don't know where it is. All of the protections that we have for memory, we can apply equally to the filesystem, which is really interesting when you talk about interpreted languages that need the filesystem to get their libraries. It's sort of a wonder we didn't do it before, from my point of view. I think for a certain crowd of folks that's very interesting; I have a friend who works for the DoD who is very interested in this exact problem. So yeah, I think that's something worth exploring further. And again, this is a general thing: any ideas this gives you about what you could do with a system like this, I'd love to hear, because I know the parts that I care about, having done installs and upgrades for way too many years, but I want to hear about the other stuff. I have vague notions about security things, but I don't know what specific classes of problems it would eliminate for your stuff, so please come talk to me; let's figure this stuff out. Anything else? We have time for one last question.

"So, you brought up bash, and actually this is where I'm trying to wrap my head around this. If all processes only had access to, I guess, the files that they sort of own, sort of how memory works, how would an interactive bash shell work? I teach an intro command-line course, right, and they learn about cd and all that kind of stuff. How would something like that exist in the environment you're thinking of?"

In what sense? Like, what part of that would be tricky?

"Would you get a bash that could go wherever in the directory hierarchy it wanted?"

So, your bash process is going to have its own view, the way it has its own memory view. Yeah, you're going to want to have, like we do with the sort of filesystem
containers, you're going to need a set of tools, and this is what I'm alluding to. Your system is going to have a root content store that contains all of the possible packages, everything on your system. You're going to want something that's your sort of hypervisor login, whatever, a standard workstation-type shell, and that one's going to have /bin/bash there in the normal place, your usual "this is how you log in." So that contained image is going to be a standard whatever, but all of your other ones can be funkier if they want to. When you log in, your shell sort of defines what things should be in its filesystem, but it's not going to get everything for everybody on the system, right? Like, if you don't use PHP at the command line, you're not going to have PHP in there. There's going to be another set of tools that you use to look at your system's content store, to say, "oh, okay, I do have this copy of PHP here," or whatever. So yes, exactly, it's a git-style content store: it's just content. You have a package, and your packages are all in some big heap, and when you start a process, you pull the right ones out of the heap and you run them. Your login environment can be in that heap, but it's not going to have everything in the heap unless you really want it to. And why would you do that? There are collisions and stuff; you can't really do that. But anyway, yeah, it is an interesting point, because it sort of disrupts the idea of logging into "the system," because there isn't a system. Each thing gets its own view of the larger whole, of the components that are used by all of them, but there isn't one canonical thing, unless you're directly looking at the store itself. Did that make sense? Okay, cool. And I guess that's all I've got time for. Thank you all very much, I really appreciate it. Our next talk is going to
start in five minutes, so stick around.