Hi, I'm Andrew, and today I'm going to be talking about the CoreOS boot process. I call this a journey through time and space because we're going to walk through it both in time, from when you start booting up to a running system, and in space, through where in the disk image we actually pull the various bits of the boot process from.

A little bit about me: I used to work for CoreOS, the company, and there I worked mostly on Container Linux, Ignition, and the config transpiler. Then we got acquired by Red Hat, and now, in addition to maintaining Container Linux, I also work on the Fedora CoreOS equivalents: Fedora CoreOS itself, still Ignition, and FCCT, which is the Fedora CoreOS Config Transpiler. Recently I've been working a lot on setting up how we do our boot loading, so that's what I'm going to talk about today.

In this talk we're going to walk through the boot process, and as an example we'll walk through a boot where we configure /var to be on a separate partition and set up a systemd mount unit to mount it on every boot. On Fedora CoreOS the first boot is where we do all of our configuration, so booting and configuration are inherently tied together.

A quick note: this does not apply to RHEL CoreOS. The boot processes for Fedora CoreOS and RHEL CoreOS are very similar, but they are not the same, so do not take this talk as applying a hundred percent to RHEL CoreOS.

I'm going to start off by going over a little background on Fedora CoreOS. If you were at Benjamin and Jonathan's talk yesterday,
that'll be entirely review. I'll talk a little bit about Ignition and OSTree, and then, before getting into how Fedora CoreOS boots, I'll talk some about what a normal Linux boot process looks like for comparison. Then, when we actually get into how Fedora CoreOS boots, I'll talk about how GRUB gets loaded and what's actually in the GRUB config. The meat of the talk is about what goes on in the initramfs, because that's where we do all of our configuration. Finally, I'll cover what goes on in the real root, what's different on the first boot versus subsequent boots, what we still have yet to implement for the boot process, and where we need help there.

So, Fedora CoreOS: these are the super-high-level basics. It came out of Container Linux and Atomic Host, and it's only for running containers. Don't try to use it as a desktop OS; it's not designed for that. It follows the immutable infrastructure idea, where you set it up on first boot and do all of your provisioning then, and if you want to make a change, you should instead reprovision with that change.
So all of your configuration is set up once, and then you don't touch it.

It has automatic updates that are atomic. Because you're running everything in containers, you shouldn't really care as much what's on the host, so we should be able to update that without breaking your workload. It uses OSTree for handling these atomic updates and Ignition for the configuration, and we'll talk a little bit more about both of those.

Finally, it's set up so that your install is just a dd to disk and then injection of a config into your boot partition. The advantage of this is that it makes your bare metal case exactly like your cloud cases. On clouds, instead of injecting your configuration into boot, you'll grab it from something like your EC2 metadata endpoint, but in both cases you're starting with the same image, the boot process is the same, and your configuration is the same. So there's nothing special about bare metal in that case.

This is a diagram of a Fedora CoreOS disk. It's the same on clouds and bare metal, and it's eight gigs in size. Don't worry if this is a little overwhelming; we'll return to it and walk through everything in it. Basically, it's a GPT disk. You have your MBR, and on a GPT disk all the MBR says is that there is one partition and it's GPT. Then you have your actual GPT partition table. On this disk we have four partitions: boot; boot/efi, the EFI system partition; BIOS boot, which is just used to hold GRUB for when you're booting on BIOS; and root, where all of your actual content is. Finally, at the end of the disk there's a GPT backup header. Unlike MBR, GPT has a backup, so if your initial GPT header gets corrupted or deleted or something, you can recover your partition table.

It's worth calling out that normally, on a UEFI-booting system,
you'll have things like your kernel, initramfs, and GRUB config in your EFI system partition. That is not the case on Fedora CoreOS. The only things in our EFI system partition are your EFI executables and a small shim GRUB config.

So let's talk about OSTree for a second. Its tagline is kind of "like Git for your operating system." The idea is that you have commit objects that you can deploy. It lets you do atomic rollback, and unlike Container Linux, where we had an A/B partitioning scheme, OSTree uses hard links. In a given deployment, all of your files are hard-linked into the repo, so you get sharing between deployments, and that means you don't need twice as much space for two very similar deployments. Each of those deployments is basically just a checkout of a commit, like in Git.

So if you look at what's actually on an OSTree disk, this is what you'd see if you were to just pop the disk out of your computer and mount it. We have a directory for boot, and that's actually not used by OSTree.
It's just a convenient mount point to have. Then everything else is under /ostree. There's a repo directory, which you can think of as kind of like your .git directory, where it has all of the actual objects, and then you have deploy, which is where you check those commits out to. OSTree actually supports installing multiple OSes at the same time. On Fedora CoreOS there's just one, so this OS name would be just Fedora CoreOS. In there you have multiple deployments; in this case, it's hash one and hash two. Those are really full SHAs, but those don't fit on a slide very well. In each of those is where you'll find your normal /usr and /etc, so it looks like a normal Linux root. And because when you do an update you don't want to delete all of /var, since you want to carry state between updates, /var is shared within an OS, and as you'll see on the next slide, it gets mounted to its correct place on boot.

This is what the important mount points on a running OSTree system look like. You can see that your root is mounted to that deployment. You'll notice that /usr also has a special mount, and that's so that we can mount /usr read-only. This is to prevent people from making changes to the system, because it's an immutable OS. You can see that /var is mounted to the OS's var directory. There are a couple of other mounts here: /boot and /boot/efi are just normal mount points like you'd expect on other systems. Then there's also this mount point, /sysroot, and that's kind of an escape hatch back to the root of the actual block device.
So when you do something like a new deployment, you actually need to access that repo, and you do it through /sysroot. Finally, we want all of our state in /var, so all of the directories where you normally find state, like /home, /opt, or /srv, are symlinks into corresponding directories in /var. That means everything that changes on your OS goes in /var.

So let's talk about Ignition now. Ignition is how we do all of our machine configuration. It configures machines using a JSON config, and it's a declarative configuration. Instead of saying "partition this disk, then create a file system, then create files on it," you say "I want this partition to exist, I want this file system on it, and I want these files to exist on that file system." It's designed to run in the initramfs. The initramfs is very early in the boot process, before you've mounted anything, and because nothing is mounted we can do things like repartition the disk you're using, which you normally wouldn't be able to do on a running system. It handles things that are traditionally handled by an installer, like partitioning and creating file systems, as well as things that would normally be handled by configuration software like cloud-init: creating users, creating files, setting up systemd units, that kind of thing. Because we have this unified disk image, and because Ignition runs in the initramfs, all of your configuration on clouds and on bare metal can be the same, and depending on what you're setting up,
you might even be able to use the exact same config.

The configs themselves are not easy to write by hand. They're JSON files, and things like inline files get base64 encoded. So we have a tool called the Fedora CoreOS Config Transpiler that takes a human-readable format with sugar for common actions and transpiles it into your JSON configuration file.

Ignition runs in four stages, and we'll get into more of this later. The disks stage handles your typical installer-type tasks of partitioning and creating file systems. The mount stage sets up mount points, which the files stage uses to do all of your configuration, and then an unmount stage tears down what the mount stage did.

Finally, I actually lied earlier: there are actually two configuration files. One is what we call the base config, and that is all of your OS vendor configuration. On Fedora CoreOS there's a default user called core, and that's actually not shipped in the OS image but in the base config. The base config says that this user should be created, or rather should exist. Then the user config that you specify is merged into that config.

So this is an example of a Fedora CoreOS config. Yes? So, it only happens on first boot, so there is nothing to roll back to beyond that.

This is an example Fedora CoreOS config. In this case, when FCCT runs on it,
it'll actually just generate an equivalent JSON config. There's nothing magic happening. It specifies that on device /dev/vda there should be a partition with partition number five, labeled var. The start and size of zero are special values basically saying "find the largest available block of space and use it." It says there should be a file system labeled var on that partition, formatted as XFS, and that when ignition-files runs, we should mount it at /var. Finally, there's a systemd unit that gets put in the real root that says that file system should be mounted. And this is the Ignition config; it's the same thing, just as JSON.

Now I'm going to talk super briefly about how normal Linux booting works. In the simplest case you have no initramfs. Your firmware loads your bootloader, like GRUB; GRUB loads your kernel; and there's a kernel command-line argument called root that just says what device should be mounted as your root. Linux will mount that there and then exec /sbin/init as your PID 1. Typically that's systemd now.

But there's a problem with that: what if I'm using LUKS, or what if I'm using RAID, and I can't just mount a device as root? That's where the initramfs comes in. The idea is that you take a file system tree and bundle it with your kernel, and instead of directly mounting your root argument, the kernel unpacks that and mounts it as root. That then does whatever it needs to do to get your actual root file system available, mounts it, typically at /sysroot, and then switches root into it. You can think of that switch-root operation as basically just a chroot and exec. It's not a fork and exec; it's actually just exec.
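As a rough illustration of "chroot and exec, not fork and exec," here is a minimal Python sketch. It is a simplification under stated assumptions: the real switch-root operation does more (for example, moving mounts and cleaning up the initramfs), and the function name and injectable parameters are hypothetical, added only so the sketch can be exercised without root privileges.

```python
import os


def switch_root(new_root, init="/sbin/init", *,
                chroot=os.chroot, chdir=os.chdir, execv=os.execv):
    """Simplified sketch: chroot into new_root, then exec init in place.

    The chroot/chdir/execv parameters are hypothetical injection points
    so the call order can be observed in a test.
    """
    chroot(new_root)     # the new root becomes "/"
    chdir("/")           # do not keep a working directory outside the new root
    execv(init, [init])  # no fork: on success this call never returns,
                         # so the new init keeps the same PID
```

Because there is no fork, the process that was PID 1 in the initramfs simply becomes the new init, which is why the new /sbin/init replaces the original rather than running alongside it.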
So the new /sbin/init replaces the original.

Fedora CoreOS is a little bit more complicated, because we also do all of our configuration in the initramfs. Everything in bold here is in the initramfs. You still have your firmware loading your bootloader, and your bootloader loading your kernel and initramfs, but now we need to do a lot more. First, Ignition only runs on first boot, so we need to figure out whether this is the first boot, and if it is, we need to partition and format disks with Ignition. Then we need to find root, mount it, have OSTree set up all of its mounts, and mount any other file systems; in our example, we're also setting up a mount for /var. On OSTree systems, /var is unpopulated by default; it's just empty. Before running all of our configuration, we want to make sure that we're configuring things on top of what the base system looks like, so we need to populate /var. Then we can do all of our configuration with Ignition, like creating users, files, and systemd units, and then we need to tear everything back down to where we just have the root mounted, because we don't want the first boot to be different. You can imagine that if a boot only worked because we left something mounted that Ignition mounted, then when we go to reboot, we'd run into an issue where it worked the first time and didn't work the second. We want to avoid that. Then we do our normal switch root, and then you're running in the real root.
You can start all of your systemd services normally.

So let's talk about how we load GRUB on BIOS. There's nothing special about this; it's standard BIOS boot loading. The BIOS loads the first sector of your disk and starts executing it. That's this GRUB bootstrap, and you only have 446 bytes to work with there, so all it does is load the rest of the GRUB code, which is in the BIOS boot partition over there. That loads GRUB, and GRUB is configured to load your GRUB config from /boot. That's the GRUB config in red, and I'll show you that a little bit later.

Also in /boot we obviously have our actual kernels and initramfses, but we're also using the Boot Loader Specification. Instead of the GRUB config defining all of the entries in your menu, those are all defined by boot loader snippets, which are all of these entry .conf files. They typically look like this. It's a pretty simple format: you have a title that should be displayed, paths to your initramfs and kernel, and then all of your kernel options. By using this, we get rid of the need to ship something like grub-mkconfig and regenerate the GRUB config every boot, which lets us have a static GRUB config that we don't need to change. It's meant to be a bootloader-independent specification, so GRUB has an implementation of it, and all it does is look in /boot/loader/entries and create menu entries for all the things it finds there.
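To make the snippet format concrete, here is a small Python sketch that parses one entry into key/value pairs. The sample entry is hypothetical: the title, paths, and options are placeholders rather than the contents of a real Fedora CoreOS entry, and the parser assumes each key appears only once.

```python
def parse_bls_entry(text):
    """Parse one Boot Loader Specification entry into a dict of key -> value."""
    entry = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # the key, then the rest of the line
        entry[parts[0]] = parts[1] if len(parts) > 1 else ""
    return entry


# Hypothetical entry; every value below is a placeholder.
SAMPLE_ENTRY = """\
title Example OS
version 1
linux /vmlinuz-example
initrd /initramfs-example.img
options root=/dev/disk/by-label/root $ignition_firstboot
"""

entry = parse_bls_entry(SAMPLE_ENTRY)
kernel_args = entry["options"].split()
```

The point of the design is that the bootloader only has to read small static files like this one, so nothing needs to regenerate a monolithic config when an entry is added or removed.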
So if we look at the kernel options we use: there's a standard root option that says what our final root should be, and then there's this $ignition_firstboot. That's actually a GRUB variable that will get expanded, and I'll show how that works in a sec. There's a platform ID. This one is taken from a QEMU image, and it just tells Ignition, "Hey, you're running on QEMU; you should fetch your config using the QEMU method." And then there's this ostree parameter. That path is actually a symlink that points to where the actual OSTree deployment is.

So this is our GRUB config. The first two lines just find the boot partition and set that as both the boot and root variables. The root variable is used in the next section, where it checks whether the file /ignition.firstboot exists. That's relative to the root variable, so on a running system it would be /boot/ignition.firstboot. It's just a file that says this is the first boot, and at the very end of our boot process we'll delete it, so this only happens on first boot. If that file exists, we set the variable that was included in our kernel command line to say this is the first boot.
We also need to turn on networking, which is this rd.neednet and ip=dhcp. Finally, this last line, blscfg, says go and read all of those config files and generate the menu entries, and that one uses the boot variable that we also defined earlier.

On UEFI, things are a little bit simpler. UEFI knows how to find the EFI system partition, and if you don't have any EFI configuration set up, it'll fall back to loading this BOOTX64.EFI, which is our Secure Boot shim. That is signed, so Secure Boot can load it, and it will in turn load the GRUB executable, which reads this GRUB config file. That GRUB config is just a shim that loads the same one that BIOS loads, so we share the same GRUB config between UEFI and BIOS. The GRUB prefix variable is a special variable that says "this is where I expect to find my config," and when you switch to normal mode, that gets implicitly loaded. So BIOS and UEFI are basically the same path.

Now that we've got the kernel and the initramfs loaded, we get into the meat of things, and this is everything the initramfs needs to do. The things in red happen on every boot, and everything else happens just on the first boot. We need a way of deciding whether it's the first boot, and conveniently we have a kernel command-line parameter that GRUB passed us that tells us whether it is. We need to do anything required to prepare to run ignition-disks, which does all our partitioning and file system creation.
We need to run it. We need to mount the root device and set up all the mounts. We also need to mount /var; on a normal boot that would happen in the real root, but since we need to populate it, we need to mount it in the initramfs on first boot. Then we need to mount any file systems we defined in our Ignition config, populate /var, do all of our configuration, and then tear everything back down and switch root.

If you look at the systemd man page for bootup, there's a bunch of systemd targets and mount points. There's some basic initialization that happens, and once that's completed you've reached basic.target. Then there's a target for saying "I have found the device that you said was root," and a mount unit to mount it at /sysroot. There's a target for saying "OK, I've mounted it, and now it's ready as a normal Linux file system." That one is reached after we've run ostree-prepare-root and fixed up all of the mount points to look like a normal Linux file system. These two in gray are not used by Fedora CoreOS, so you can basically ignore them. This initrd-fs.target is for any other file systems that you might need to mount; if you were on a system that had a separate /usr and needed that mounted before you switch root, that's where that would come in. And then initrd.target says "I've done all of my initramfs configuration; I'm ready to do the switch root."

So this is what that looks like. Everything on the right is the stuff that only happens on first boot. We have a systemd generator; systemd generators run before you start any units, and they can do things like create units, create dependencies between units, and things like that.
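A generator of the kind described here could look roughly like the following Python sketch. It is not the actual Fedora CoreOS generator: the flag name `ignition.firstboot`, the choice of unit to pull in, and the unit directory are assumptions made for illustration. What it shows is the general mechanism, which is that a generator can add a dependency simply by dropping a symlink into a `.requires` directory before any units start.

```python
import os

FIRST_BOOT_FLAG = "ignition.firstboot"  # assumed name of the first-boot kernel argument


def is_first_boot(cmdline):
    """Return True if the kernel command line carries the first-boot flag."""
    return any(
        arg == FIRST_BOOT_FLAG or arg.startswith(FIRST_BOOT_FLAG + "=")
        for arg in cmdline.split()
    )


def generate(cmdline, normal_dir,
             unit="ignition-complete.target",     # hypothetical unit to pull in
             parent="initrd.target",
             unit_dir="/usr/lib/systemd/system"):  # assumed install location
    """Make `parent` require `unit`, but only on first boot."""
    if not is_first_boot(cmdline):
        return None  # normal boot: add nothing to the unit graph
    requires_dir = os.path.join(normal_dir, parent + ".requires")
    os.makedirs(requires_dir, exist_ok=True)
    link = os.path.join(requires_dir, unit)
    if not os.path.islink(link):
        os.symlink(os.path.join(unit_dir, unit), link)
    return link
```

In practice a generator is an executable that systemd invokes with its output directories as arguments, and it would read the command line from /proc/cmdline; passing both in as parameters here just keeps the sketch easy to try in a scratch directory.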
So we use that generator to pull in all of these units on the right if that kernel command-line option is set. If you break this down into groups of units: coreos-gpt-setup and that chunk up there is all the stuff to prepare to run ignition-disks; this section here, with the initrd-root-device target, is all of the stuff to set up your root file system; and the stuff over there is the stuff to populate /var and configure it. Finally, once you've done all of that, we're ready to switch root. We'll get back to this slide too.

So what needs to happen for ignition-disks? Well, we need a disk that has a valid GPT partition layout, and we actually don't have that by default; I'll get into that in just a sec. We need to find our base config, and we need to find our user config. If you're on a cloud, the user config won't exist on disk, because you're getting it from a cloud metadata service, but if you've installed to bare metal, you need to go grab it off of your boot partition. And finally, since Ignition can do things like fetch files over the network, we need networking up.

The GPT is invalid by default because it has a disk GUID, which is a unique identifier, and since you're dd'ing the same image everywhere, it's not going to be unique, so we need to scramble that. And the backup header should be at the end of the disk, but since you just dd'ed an 8-gigabyte image, it's not at the end of the disk; it's 8 gigabytes in. We can fix both of those with an sgdisk command, and that's all this unit does.

ignition-setup just copies some files around. It uses the platform ID that you saw in the kernel command-line arguments earlier to determine what platform you're on and grab the correct base config for that. It will also mount boot, grab a user config if it exists, and copy it to that path, and that's where Ignition will read these from. Finally, for networking, we just use dracut's legacy networking.
We want to move away from that and use NetworkManager. That'll let us drop these rd.neednet and ip= command-line parameters, because we can pull in NetworkManager as just another unit in our graph. But we can just depend on network.target, so when we switch to NetworkManager, we won't actually have to change those dependencies.

So now we're ready to run ignition-disks, and this is the section of our config that ignition-disks cares about. It'll go through and do our partitioning and then create the file system. The other thing it does is fetch both configs, the base config and the user config, merge them together, and cache the result to disk so that all of the later Ignition stages don't need to go through that process.

So this is where we are in our boot process. We've done all of our setup to run ignition-disks, and the root device has been available this whole time, so that's also ready. Now we can do our mounting: we can set up all the mounts for OSTree and then start populating /var. sysroot.mount is a normal mount unit; it just mounts the device. It's generated by another systemd generator that reads the root kernel command-line parameter and generates the mount unit for it. ostree-prepare-root.service is what OSTree does to fix up all of those mounts, so that when you look at /sysroot you don't see boot and ostree but instead see your /usr, /etc, and all the things from your actual deployment. Then coreos-mount-var is what actually mounts /var in. Again, on a normal boot you wouldn't need to do that; it would happen in the real root, but we need to populate it.
So we need it mounted now.

ignition-mount: all this does is look through your file systems section and say, "Oh, I need to mount this file system at /var, relative to the sysroot." You can see that before, our /var was mounted as part of our root partition at a different path, and now we have it mounted as the partition and file system that we just created. So now /var is mounted, all of our mount points are set up, and we can populate /var.

Because OSTree has a blank /var by default, if you have packages in your base OS that need to put files there, we need a way of doing that. rpm-ostree, which is the tool that actually creates the OSTree commits for us, will look at all the packages it's using, and if they have files in /var, it'll generate systemd-tmpfiles entries to recreate those. So it generates a bunch of configuration for systemd-tmpfiles, and then we can run systemd-tmpfiles and put all of those files back in /var. After we've done this, our system is in a state similar to what you'd have if you had just installed an operating system but done no configuration yet.

So now we're ready to do our configuration with ignition-files. Back to the graph: we're almost done. All we need to do is our configuration, and then we're ready to switch root. ignition-files is internally split into two parts: there's user and group creation, and then everything else, like files, directories, and systemd units. And if you're thinking, "Oh, we don't have to worry about that; this is our configuration here, all we have is a systemd unit, there are no users to create," remember that the base config has the core user, so that'll actually happen first. We'll create the core user, and then we'll add this systemd unit. Now that we've done that, all of our prerequisites for ignition-complete.target have been met, so that's done, and in turn all the prerequisites for initrd.target are done.
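To make the /var population step from earlier in this part of the talk a bit more concrete, here is a small Python sketch of what generating tmpfiles.d entries could look like. This is not how rpm-ostree is implemented; the helper name and the example directories are hypothetical. Only the line layout (type, path, mode, user, group, age, argument) follows the tmpfiles.d format.

```python
def tmpfiles_lines(dirs):
    """Format tmpfiles.d 'd' (create directory) lines.

    dirs: iterable of (path, mode, user, group), with mode given as an int.
    """
    return [
        f"d {path} {mode:04o} {user} {group} - -"
        for path, mode, user, group in dirs
    ]


# Hypothetical directories that a package might ship under /var.
EXAMPLE_DIRS = [
    ("/var/lib/example", 0o755, "root", "root"),
    ("/var/log/example", 0o750, "root", "root"),
]

lines = tmpfiles_lines(EXAMPLE_DIRS)
```

Entries like these describe the desired state of /var rather than carrying its contents, which is what lets an empty /var be rebuilt on first boot.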
We're ready to switch root, and when you go to switch root, that happens by isolating to a new target. When you do that, it's going to stop initrd.target. And if you've been wondering this entire time why those two units are in red, it's because they have a special thing called ExecStop. systemd units let you specify actions that should be taken when a unit stops, and that's how we do our teardown. When the ignition-mount unit gets stopped, we run the Ignition unmount stage, and when coreos-mount-var stops, we run its unmount. Again, this is to ensure that the first boot isn't special. We're not carrying things that Ignition did, in terms of setting up mounts, into the real root, so we don't run into a situation where it worked the first time and then we reboot and it fails.

Switch root happens pretty much like a normal switch root would. It basically just chroots into /sysroot and then execs /sbin/init in there, and in our case that's systemd again. The nice thing about doing all of our configuration beforehand is that if we want to configure things in the real root's boot process, we did that before we started it. We're not trying to configure our boot process during our boot process, which is problematic.

So now we've switched root and systemd has started up. It's a normal systemd startup, and eventually we reach this boot-complete.target, which just says, "Hey, we successfully booted." Now we can go and delete this /boot/ignition.firstboot file, so that the next time we boot, when GRUB looks for that file, it won't find it. We won't add the kernel command-line argument saying this is the first boot, we won't turn on networking in the initramfs, and our boot can proceed normally. So now we're done.

The final part is: what do we still need to do?
Live PXE is one case. Container Linux supports running out of RAM, so you can just PXE boot Container Linux and have your entire root be ephemeral. Typically, when people do this, they keep all of their data in /var, have that on persistent storage, mount that in, and just run Ignition on every boot, because they're starting from a fresh root every time. We want to be able to do that for Fedora CoreOS as well; we haven't had time to implement it yet.

We don't have automatic rollback yet. On Container Linux, if your kernel fails to load or something like that, it'll automatically fall back to the previous installation and boot that instead. We don't have that implemented yet; it's something we want to do. A lot of that is in the GRUB configs, where we need to keep track of "have we tried booting this, and did it work?" so we can use that to determine which entry we should boot.

The other thing we want to do is detect whether you're using Ignition to change kernel command-line arguments, because if you're changing kernel command-line arguments, you probably want them applied on your first boot. So after we've done all of our configuration, we want to detect that and, if that's the case, reboot so we can pick those up and then go through the boot process normally.

With this single image, where we're not doing a typical install, moving where root is becomes problematic. If you wanted to do root on RAID or an encrypted root, we don't have a place to pull all of that information from after we've created the new device, especially if you're replacing root. So we need a way of detecting: hey, are we going to
be moving root? If so, we should take all of the contents of root and load them into RAM, so that after we've blown away the old root, we can still put all those contents back. And finally, this boot process isn't quite worked out on PowerPC and s390x with their boot loaders yet, so we need support for that.

Do you guys have any questions?

How do the Ignition configs get fetched? So, I'm going to jump way back. If you look, these are our kernel command-line options. There's this ignition.platform.id, and Ignition has a list of different platforms, like QEMU, AWS, Google Cloud, all of those, and they each have an entry in there. That parameter gets passed to Ignition, and Ignition will then look up how to fetch the config. On AWS, it'll hit the EC2 metadata endpoint; on Google Cloud, it'll hit that endpoint. On QEMU, we use a kind of hacky method where we actually pass it in via the firmware config. But it depends on which platform you're on.

What protocols does it support? It supports whatever you need on that cloud; in most cases, it's just HTTP. Ignition also supports chain-loading configs, so in your config you can say "I want to fetch another config and merge that into mine," and for that we support most common protocols. I think we support HTTP, HTTPS, TFTP, and I think one other, but I don't remember off the top of my head.

Sorry, to repeat the question: does it use Matchbox? Matchbox will serve its configs over HTTP, and there's actually something I didn't cover: there's an optional kernel command-line argument where you can specify a URL that Ignition should use instead of its normal config. I think it's ignition.config.url. If you specify that kernel command-line argument and give it a URL, then instead of doing its normal fetching, it'll fetch the config from there.

When things go wrong, how have you triaged it? What are the techniques you use?

When things go wrong, how have we triaged it?
So Ignition, and Fedora CoreOS in general, follows the philosophy that if you fail, you should fail hard. Part of Ignition's whole philosophy is that you should either get the system you specified or nothing at all; the worst case is that you get a system that comes up partially provisioned. Unfortunately, this means that if you fail, you're going to be dumped into an initramfs shell, and this does make debugging harder. We try to make it obvious when that has happened and how to get logs, but on a cloud platform that becomes harder, because your console access is typically not great. One of the things we do want to do is support forwarding Ignition logs to various cloud endpoints or things like that, but that's still in the works.

Any other questions? Yes?

So the question is: we talked about being able to move the rootfs in the future; does that also include being able to further split up the rootfs into more mount points? To some degree we already support that. Everything under the OSTree section, everything that's not under /var, does need to stay together. But within /var, you can divide things up however you want; that can already be arbitrarily complex.

Any others? Yes?

So the question is: I mentioned that this does not fully apply to RHEL CoreOS; what are the differences there? I'm going to go to another slide. RHEL CoreOS is using an older version of Ignition, which does not have this ignition mount and unmount section, and because of that, and because we need to populate /var on first boot, what RHEL CoreOS can do with /var is less flexible. A lot of this is the same: pretty much everything up until this point here, the initrd-root-fs.target, is pretty much the same. It's this stuff here, where we deal with mounting /var and populating it, that's different.

Any other questions? Sure.
So the question is: is Fedora CoreOS the upstream of RHEL CoreOS? Mostly. Things do move from Fedora CoreOS into RHEL CoreOS, but there are also some things that move the other way. It's not a strict upstream and downstream relationship.

Yes? Right, there was this flavor.

So the question is: in the Fedora CoreOS config there's a variant label that says fcos; what is that about? We try to make our configs descriptive about what they are, so there's both a variant and a version. Because writing Ignition configs is hard, we wanted to make our transpiler easy for other people to extend for their own OSes, but at the same time we don't want a bunch of configs floating around that aren't descriptive about what OS they're targeting. So that variant is saying "this config is for Fedora CoreOS." That way, for other people who want to use Ignition and write their own similar tool, there's a standard way of saying "this config conforms to this version and targets this OS; don't try to use it on other OSes."

The question was: a lot of the boot process here doesn't relate to containers; where does that come in? Yeah, so if I go to our list of everything that needs to happen,
that's way down at the bottom, at "start normal services." There's nothing special about how we launch containers on Fedora CoreOS. Where this would come in is if you wanted to set up a systemd unit to, say, run podman run or start Docker or that kind of thing; you'd do all of the setup for that in this boot process to get it ready to run.

The question is: there's nothing in here that prevents us from doing a different kind of boot, maybe? Yeah, so the boot process here is not prescriptive about what kinds of things you could set up. Fedora CoreOS is not designed to run on a desktop, for example, but there's nothing in our boot process specifically that prevents people from trying. Just don't. We're trying to keep the boot process generic and not limit the scope of what people can do with it.

Any other questions? All right. Thank you.