Wow, so many people, this is quite intimidating. Switch off your phones. Hi, I'm Tom. This is Lars. We work on OSBuild, and today we're going to talk about what we do and why.

Our starting point is that the tools we have available to build operating systems are not as good as they could be. Software development in general has gained a lot of cool new tools over the last 20 or 30 years — source control and so on — that have really improved how we develop things, the speed we can do it at, and the quality we're able to produce. We think that building operating systems has lagged behind. It's still very much ad hoc: either the tools are really, really old — people reusing old things that don't really fit together — or people make new things as one-off solutions for exactly the thing they want to build. We don't feel there has really been a general-purpose toolkit for making the operating system of the future, or for playing around to figure out how you want to do things.

So we figured out some principles that we want our tools to follow, and we'll talk about those first. Three things: we want the tools to be extendable and comprehensible, and the process to be reproducible. Feel free to interrupt, by the way. Cool.

First, extendable. What does that mean? Once we have a tool to build an operating system, it's not enough to be able to build just the operating system we have today. We want to move toward the future — we want to build Fedora 40 and RHEL 15 — so we want tools that can do that. But it's also not enough to have a tool that can only build the shiny new future a long time from now if it can't handle the thing we have today. We need something that can do the past stuff, the current stuff, and the future stuff, and can move gradually between them. So we need our tools to be modifiable in an easy way.

Right, and one of the big points I always make — it doesn't really fit under extendable, but it's being able to experiment. Right now, if you have some way to build an operating system, or to make an image, let's say, it's very hard to just switch out one of the parts. Say I want to try a different bootloader: so much depends on it that you need to understand the whole stack. I think that's a big thing we're trying to solve here. I just wanted to add that.

The next thing: comprehensible. We want the process of building operating systems to be as easily understandable as building software. With a Makefile, it's verbose and a bit complicated, but at least you can figure out the steps if you look into it and spend some time. So the input to our tools describes the process that will happen to produce the operating system, and we want that input to cover all the things that we do. There should be nothing our tool does that is random, for instance: no randomness that we just make up inside the tool should ever end up in the image. It should all be specified up front. If you need some randomness, then you need to provide it to our tool.
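As a concrete illustration of that last point, here is a minimal sketch of what "providing the randomness yourself" could look like in a stage's options. The stage name follows the reverse-domain convention described later in the talk, but the exact option schema is illustrative rather than the real one (JSON has no comments, so the "//" key stands in for one): the caller passes in a fully salted password hash instead of letting the tool generate a salt, so the same input always produces the same output.

```json
{
  "name": "org.osbuild.users",
  "options": {
    "//": "illustrative schema -- the salt is part of the input, never generated inside the tool",
    "users": {
      "root": {
        "password": "$6$examplesalt$exampledigest"
      }
    }
  }
}
```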
The same goes for policies: you don't want to encode policies in the tool, because when you later want to make a new thing with a different policy, you'd have to change the tool itself, and that doesn't make it very easy to experiment. This means that the input to our tools is going to be very, very verbose and not the easiest to quickly read, but it will cover all of the things.

A nice example that we always bring up is kickstart, which we currently use a lot inside Red Hat and Fedora. It has this command called autopart. Amazing thing, and it's very obvious what it does: it automatically partitions your disk. But what does that mean? What does it mean to do it automatically? Well, it depends. Depending on whether you're on RHEL or Fedora, and on which version, it automatically does different things. So just looking at the input, the kickstart file, you don't really know what happens. You have to look at the source code, and then you need to know which version you're looking at, and so on. We don't want that. If you want to do partitioning, you need to specify exactly all the things it needs — which probably means you want some higher-level tool to generate that input. But at least the lower-level tool that builds the thing doesn't apply any policy of its own.
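To make that concrete, here is a hedged sketch of what fully explicit partitioning input might look like, in contrast to autopart. This is a hypothetical schema, not the exact options OSBuild uses: every offset, size, and mount point is spelled out, so neither the tool nor the reader has to guess at a policy.

```json
{
  "//": "hypothetical, fully explicit partitioning input; sizes in 512-byte sectors",
  "ptable": {
    "format": "mbr",
    "partitions": [
      { "start": 2048, "size": 972800, "bootable": true, "filesystem": "ext4", "mountpoint": "/boot" },
      { "start": 974848, "size": 5316608, "filesystem": "ext4", "mountpoint": "/" }
    ]
  }
}
```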
And lastly, reproducible. This word conjures up lots of ideas in different people, and those people are very excited about reproducible builds and reproducible software for theoretical reasons — it's very elegant and cool. But we don't really care about that; we are on a much more practical level. Say I want to make a suggestion for how we build Fedora. On my machine, I build Fedora the current way using our tool, then I make some change and try it the new way, and I see that it works: the tests pass, and this is what I want. But if I'm not confident that when I hand this change off to Fedora's release engineering it has the same effect, then it doesn't really help much. I cannot really do development, I cannot really contribute to Fedora, if I'm not confident that what I do on my machine is the same thing that's going to happen in release engineering.

And with the tools we have now, that really isn't the case, because when you run them, the result depends on the environment you run them in. You must make sure that you exactly mimic the production setup in order to get the same results, and even then it's not really clear which things matter and which don't. So one of the main things we want to guarantee is that for a given input to our tool, the operating system image it produces is always the same.

Of course — the same — what does it mean to be the same? Bit-for-bit reproducibility would be cool, it would be amazing, but that's just not what we have. The tools we build on are in general not bit-for-bit reproducible, and they're not really aiming for it. And we don't really need that. What we really need is that two images produced from the same input are functionally equivalent. So we call it something like functional reproducibility: they behave the same, and you cannot detect a difference between them. That's the bar we are aiming for. And of course this is not mathematically well-defined, but that's the aim we are going for.

Cool. Let's talk about implementation. Those were the ideas behind the thing we are making; now let's talk about how we did it. The tool, as we said earlier, is OSBuild, and it builds from manifests. The input that we pass to the tool we call a pipeline. It's a JSON document, and it describes each of the steps that you want to do. We have a set of stages that each take a file system tree and modify it in some way. It starts off with an empty tree, and then we run a DNF stage on that tree, which populates it with RPMs. Typically, I mean — the whole point is that this should be something for experimentation, so you can do things differently: you can do a git checkout here, you can install RPMs directly, you can do whatever you want. But this is the typical way we are doing it at the moment.

So you fill a tree with RPMs, and then maybe you want to change the host name. You install some GRUB config, you set up the users, you maybe enable or disable some systemd units, you put in the fstab, you configure the firewall. And maybe you want to drop in some configuration that's going to run on the first boot, like your Ansible playbooks or whatever else you want. — You haven't implemented this; that's why it's gray on the slide. — We haven't implemented this yet, but that's the idea of it. And finally, of course, the SELinux step we all love: we apply all the labels to the file system.

So what you have produced after all of these stages is just a file system tree — we haven't actually made an image yet. For people who are used to building images, this is the other way around from how it's usually done: usually you first set up an image and then you fill it with stuff and then you finish. But we first create the file system. And each of these trees we can also cache — we can reuse them, or we can reason about them and run tests on them, just talking about files.

When that's finished, we have what we call an assembler, which takes the file system tree and puts it into some image format — a qcow2, say, which is what we often do, or other formats. And the point is that each of these pieces has a very distinct responsibility; they don't overlap, and they don't depend on each other in unpredictable ways. For instance, the assembler should only ever put the file system into the image: if you mount the resulting image and compare its contents with the input tree, they should always be exactly the same. That's the big picture.
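To give a feel for the shape of such a pipeline, here is a hedged sketch of a manifest. The reverse-domain stage names match the convention described in the talk, but the option schemas are abbreviated and illustrative rather than exact:

```json
{
  "stages": [
    {
      "name": "org.osbuild.dnf",
      "options": { "releasever": "31", "packages": ["@core", "httpd"] }
    },
    {
      "name": "org.osbuild.hostname",
      "options": { "hostname": "web-server" }
    },
    {
      "name": "org.osbuild.selinux",
      "options": { "file_contexts": "etc/selinux/targeted/contexts/files/file_contexts" }
    }
  ],
  "assembler": {
    "name": "org.osbuild.qemu",
    "options": { "format": "qcow2", "filename": "disk.qcow2", "size": 3221225472 }
  }
}
```

Each stage only ever sees the tree left behind by the previous one; the assembler at the end is the only step that produces an actual disk image.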
Go ahead. Should we talk about how the stages are separated from each other? I would love to. Well then, go ahead, please.

Many of the tools that we have — really, all of the tools that we have — are basically built for modifying a running system. DNF is made for installing and updating packages on the system you're on, and GRUB is made for installing the bootloader on your running laptop. Which is not what we do at all. So these tools make assumptions that don't hold in our context, and this is a problem that Lorax and Anaconda face at the moment as well. All of these things face the same problem: how do you make sure that DNF and GRUB and so on don't get confused by the fact that the machine you're running on is not the target machine you're installing for? Typically what happens is that they will pull in the kernel command line from your current host, or they will look into /etc to find some configuration — which repositories do you have? — and use that to get the packages to put into the target. What we of course want is for our tool to be completely independent of the host. If I want to build RHEL 8.1 and I'm running a Fedora machine, of course I should be able to do that. With some of the tools we currently have, you need to have the same operating system on your host as the one you're building — which is a bit odd, that's not how we generally make software, and we want to avoid it.

So the principle here is that each stage is separate from the host and from the other stages. There is no communication between the stages: they operate on a shared tree, one by one, but you don't pass any configuration between them. If, for instance, the GRUB stage needs some information about the file system, as does the fstab stage, as does the assembler that finally makes the image, then you have to pass the same information in the configuration to all of them. We are very strict that there can be no communication, no leakage of information between them. And we run each of these stages in nspawn containers — the point of the container is to make sure that this is really the case. Anything I forgot to say? What, am I just a flower here? Exactly — you're very good at it.

Right. So this is how it actually looks: just a JSON document, like the sketch above. We have a list of stages, and then one assembler at the end. And we have named them in reverse domain notation to make sure the names are unique. Because once we have made such a pipeline — well, we said it's reproducible, and we said that even the build tool made ten years from now should still produce the same image from it. So we can never change the behavior of any of these stages. That's why, now that we have org.osbuild.dnf, if in the future we decide we made a mistake and we want a DNF stage that behaves in a different way — hopefully we made it low-level enough that that's never the case, because it really just does what DNF does — but if we did make a mistake, then we must make a different org.osbuild.dnf2 or whatever. That's fine. That's why we named them this way, but hopefully we never have to.

Another little quirk that people working on these things are probably very used to: we treat the DNF stage as one fixed thing, but of course DNF itself may change its behavior over time. So one thing we always make sure of is that we save information about the build environment, so that we can reproduce the build as closely as possible. What we want to say is that only the manifest can affect the outcome of the image — nothing else. Which means that if the version of DNF you're running could affect the outcome, then you'd better encode that somehow in the manifest. That's why we introduced the notion of a build pipeline. It's a sub-pipeline: the manifest is a set of stages and an assembler, and before those, you have exactly the same structure again. You can put any pipeline you want in this build instruction. Typically it contains just a DNF stage, which installs the packages that the later stages need into a file system tree, and that tree then becomes the root file system of the nspawn container the stages run in. So you can precisely specify all the tools that you build with.
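Here is a hedged sketch of how such a build pipeline might be expressed in the manifest — again, the overall shape follows the talk, while the exact keys and option schemas are illustrative:

```json
{
  "build": {
    "stages": [
      {
        "name": "org.osbuild.dnf",
        "options": {
          "//": "illustrative -- installs the tools the main stages will run",
          "releasever": "31",
          "packages": ["dnf", "e2fsprogs", "grub2-tools", "policycoreutils", "qemu-img", "systemd"]
        }
      }
    ]
  },
  "stages": [
    { "name": "org.osbuild.dnf", "options": { "releasever": "31", "packages": ["@core"] } }
  ],
  "assembler": {
    "name": "org.osbuild.qemu",
    "options": { "format": "qcow2", "filename": "disk.qcow2", "size": 3221225472 }
  }
}
```

The tree produced by the build pipeline becomes the container root in which the main stages run, so pinning package versions there pins the behavior of the build tools themselves.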
Right, and that's the build pipeline. Those were the principles, and I hope we've given you some idea of how it works. Now, on to applications: how do we use it?

The first program we are working on now, as something deliverable to people, is Image Builder. The aim of Image Builder is to build images for cloud providers: for both Fedora and RHEL, we want to be able to generate specific images for specific cloud providers. If you want Fedora 31 running in Azure or in Amazon, Image Builder allows you to do that. Here you see a sample running on Fedora. You say that you want a specific version of Fedora running on a specific cloud provider for some specific use case, and you configure a blueprint. In it you can also specify how to customize your image — you can say that this, for instance, should be an HTTP server — and then our tool, using OSBuild as the back end, will produce an image with Apache installed and set up correctly for the target cloud provider that you want. You can also just download the image and run it locally in VMware or nspawn or whatever you want, and so on.

May I? Yeah. The whole point of using something like OSBuild here is that we need to take "this is what it means to be a Fedora 31 image" plus the kind of customizations you want on top of it — I want to install some packages, or, as you said, add SSH keys or something. The way this is typically done today is you take a base image, boot it somewhere, do all of the customizations I just talked about, and then say: wow, this is my gold image now. But now you've booted it, and you want to replicate it. That's not a very clean process, I would say, because once you've booted it, it's a little bit weird — you need to take some stuff away again, like the machine ID, for example, or the random seed, or things like that. So the idea here is that we have an OSBuild pipeline that generates the image without ever booting it, but with all of these customizations already baked in.

I think a very important point that we certainly believe in strongly is that booting is instantiation. You should never boot something and then instantiate it several times afterwards, because lots of our tools assume that on the first boot you set up some things that are unique to that instance, and then those get reused.
The random seed is one thing, but the machine ID is very important — that mistake has happened in the past, where you ended up with lots of instances with the same machine ID on them, and lots of software assumes that will never happen. So we absolutely want to avoid that: no image we ever produce has been booted before you give it to the customer to instantiate it.

Yeah, and it also makes things much easier. I have this OSBuild pipeline and it's reproducible, so if I have a problem with an image — what's going on here? this is what I built — I can say: Tom, please try it out. And he has a way to completely reproduce what I did without having to ask me about all the steps.

And I think that concludes our talk. If you have any questions — oh, so many questions. You're first.

Sorry — the question was: if you want to build an image with multiple partitions, how is the assembly of that done? If I go back to this picture: say you want /boot and the root file system on separate partitions. There are two things you need to make sure of, right? You need to tell GRUB about it in its configuration, and you need to tell the fstab about it — but that's just the content and configuration inside the image reflecting that these are separate partitions. Then finally, when you make the qcow2 — quite a lot of steps go into making the qcow2, and we are thinking about how to split them up to look more like separate stages — basically what happens is: you make an image file, you partition it, providing the configuration for how the partitions should look, just as you would normally partition a disk, then you copy the contents over and wrap it up in a qcow2. Did I answer the question? Yes — so you pass in the same information that is in the fstab, and using that we partition the disk, mount all of the sub-volumes, and copy everything over. Yes, we do.

Please go ahead. Good question, because I have a nice answer. We have a tool: we feed it an image, we mount the image, and we go through all of the files in it and run checks on each one — we check all the SELinux labels and all of that stuff — and we also check higher-level things, like which RPMs are installed, which users exist, and so on, and we spit out a huge report saying what's in the image. Then we run the build twice, and you can see what changes. Basically, the things we see changing are: your initrd is different every time, because the way initrds are produced is not reproducible, and your RPM and DNF databases are different, because the database format is just not stable. Apart from that, the images we are producing are exactly the same. We don't ship the tool, but it's called image-info and it's in one of our Git repositories. We might want to move that up. Yeah, sure.
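For a sense of what such a report might contain, here is a hypothetical excerpt — the real image-info output format may well differ; these fields only illustrate the kind of functional properties being compared:

```json
{
  "//": "hypothetical excerpt of an image-info style report",
  "image-format": "qcow2",
  "os-release": { "ID": "fedora", "VERSION_ID": "31" },
  "packages": [
    "httpd-2.4.41-1.fc31.x86_64",
    "kernel-5.3.7-301.fc31.x86_64"
  ],
  "enabled-services": ["httpd.service", "sshd.service"],
  "fstab": [
    ["UUID=0b8f1bf7-6e29-4dbe-b9c2-eb5062d42b2a", "/", "ext4", "defaults"]
  ]
}
```

Running the build twice and diffing two such reports is what "functionally reproducible" means here in practice.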
So the question is: how does this compare to something like Packer? Lars, would you like to talk about that? Sure. Packer works in the way I explained a little bit before: you boot up an instance somewhere, do all of the modifications, and save that as your gold image, as I think they often call it. We think that's the wrong way around — you cannot boot an image, then make that into a gold image and copy it to make many instances out of it. That's the main difference between the two.

What? — Sorry, the question was whether you can interrupt the process at any time and hash the result of one of the stages. Yeah, we do this all the time. What we do right now is we just give it a shorter JSON document; that gives exactly the same result, because we hash the document and we know we already saw it. There's no command line switch for it right now, but it would be very easy to add. Yeah, exactly — I very often just cut the pipeline off at the stage I'm interested in, if I want to see what my changes in that stage do, because otherwise it downloads all the packages and installs them and everything each time.

Sorry, which one? I don't know off the top of my head — I know we looked into it at some point, but it's a little too far back. Please come find us after the talk and we'll look into it again.

Yeah, as you can see, most of the stages are even named after the tools we use. Sorry — the question was whether we rely on tools that already exist or have our own implementations. We rely on tools that already exist; that's why the stages are named like this. The only thing is that we have little wrapper scripts around them so that they can run in the kind of confined environment that we have. But yes, we do call grub, and for the systemd one we do call systemctl — we just reuse what's already there. And the biggest challenge in writing these stages is that while we can just call the tools, the tools usually think, as we talked about earlier, that they are running on the system they are installing onto. So we just have to make sure they don't get confused, and we try as much as possible to reuse existing tools.

Thank you so much. Thank you very much.