Hey, everyone. Thanks for showing up. Let's get started. This is the early boot provisioning talk. Today, I want to talk to you about provisioners that run during the boot process. To do that, I'm going to first lay out the provisioning problem we're trying to solve. I'll talk about how cloud-init initially aims to solve this problem. I'll then introduce Ignition, talk about how it's different from cloud-init, and walk through a typical boot process on Container Linux. I'll go through a few examples of how you can use it, and I'll wrap up by covering the future potential for early boot provisioning as well as Ignition. A bit about me: I'm Dalton Hubble. I worked at CoreOS from 2015 until a few months ago. While there, I built various services and installers for doing cluster provisioning, and I worked on our Tectonic enterprise offering of Kubernetes. These days, I maintain a separate Kubernetes distribution, and that's informed part of this talk, because it all uses Ignition and early boot provisioning. Provisioning itself is a super broad domain. It generally covers how to provision many Linux hosts for some particular use case, for some definition of "many". That might involve anything from how to manage and version configurations, to how administrators are authenticated, access controls and auditing, cataloging inventory, stuff like that. But I'm scoping this down to just the actual provisioner tool: the thing that gets the right bits in the right places on disk, when they're needed, in user space. Among provisioners, there's a whole slew of different categories as well. There are the sysadmin-y tools that just run arbitrary commands on hosts in parallel. We're going to skip over those. Dynamic configuration management, which runs arbitrary commands in either a push or pull fashion — we're going to skip over that, too.
Maybe there's a DSL involved, but generally they still do iterative mutations of machines. The hero of our talk is going to be the distro-supported mechanisms, cloud-init and Ignition. These are systems where you provide a configuration document to a machine when it's booting, and that's enacted sometime during the boot process to actually produce a fully provisioned Linux host. So to define our use case: it's going to be large-scale clusters. For infrastructure cases with clusters, it's increasingly this last category that we're interested in. And it was also this last category that CoreOS was interested in, having a minimal OS that was focused on containers, being package-less, where every host was a dumb node in a higher-order cluster. Concerns like simplicity, speed, and scale were part of why this last category was important, whereas concerns like dynamically reconfiguring applications were not as important, because Kubernetes and clusters handle those types of things these days. So let's start with cloud-init. Cloud-init's pretty ubiquitous. It's an initializer. It's installed on most major Linux distros. It lets users write a cloud-config YAML document that describes what the system will look like, in a mix of declarative and imperative sections. People can set users and SSH keys, install packages and repos, and do a little bit of network configuration. Cloud-init works by fetching a cloud-config, or user data, from some metadata service — a network endpoint, a config drive, or some other pluggable data source. It's implemented as a pretty large collection of Python modules, and it runs on nearly every boot in user space. Here's an example cloud-config document showing that mixture: a declarative write_files section that says, hey, Python module, write this systemd unit to a particular path, and an imperative section, runcmd, a list of arbitrary commands that are run. The order of these sections doesn't matter.
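A cloud-config along those lines might look like this (a minimal sketch; the unit name and commands are illustrative, not from the talk's slide):

```yaml
#cloud-config
write_files:
  # Declarative: a Python module writes this systemd unit to the given path.
  - path: /etc/systemd/system/hello.service
    permissions: "0644"
    content: |
      [Unit]
      Description=Example service
      [Service]
      ExecStart=/usr/bin/echo hello
      [Install]
      WantedBy=multi-user.target
# Imperative: arbitrary commands, run at a later stage.
runcmd:
  - systemctl daemon-reload
  - systemctl enable --now hello.service
```

Note that nothing in the document itself says write_files runs before runcmd — that ordering lives in the docs.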
You generally go and read the cloud-init docs to figure out which sections run first. In this example, it's super important that write_files happens before runcmd. So you just read the docs. To initialize a host to a desired state, cloud-init runs in user space, both before networking and after networking. Specifically, it runs in four separate stages, in four separate systemd units, using a systemd generator. For those who don't know, a systemd generator is a small executable that generates systemd units before any units are ever loaded on the system. It's usually a bash script. The cloud-init systemd generator checks for a cloud-init disabled kernel arg or a cloud-init disabled file. If it finds either of those, it doesn't run. But by default, it does: it enables the cloud-init target, which wants four different services. There's cloud-init-local, which looks for a local cloud-config — usually some minimal thing that just sets up networking. There's cloud-init. At that point, you have networking; it goes and fetches another cloud-config, probably from your cloud provider, and if that contains a network configuration, it has to re-initialize networking. There's cloud-config, which runs other modules. And there's cloud-final, which installs packages and runs arbitrary commands, or at least the runcmd ones. There are also two flavors there: bootcmd and runcmd. Container Linux shipped with coreos-cloudinit, a Go implementation of cloud-init. It supported a subset of the cloud-config document we just looked at. It was implemented in Go because there was really no interest in shipping Python in the OS — it would bloat the image — and Go was pretty popular. It only had to implement a subset because a lot of concepts in cloud-init just weren't relevant. Managing packages was not relevant on a package-less operating system. And user management wasn't as big when pretty much everything in a server context uses the core user.
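Roughly, the four stages hang off the target like this (a sketch of the dependency chain as I remember it, not the exact upstream unit files):

```
cloud-init.target
└─ wants:
   cloud-init-local.service   # before networking; local datasources only
   cloud-init.service         # after networking; fetches remote user data,
                              #   may re-initialize networking
   cloud-config.service       # runs the bulk of the config modules
   cloud-final.service        # package installs, runcmd scripts
```

The ordering between these four — and which config sections each one handles — is exactly the thing you end up reading docs to learn.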
This was at a time when configuration management systems were in their heyday, and CoreOS was pushing this idea that you should just use a cloud-config document. More than that, we were pushing never changing that cloud-config document. It was an early form of immutability: just don't change it. Fast forward to July 2015. CoreOS was vaguely popular. People were writing cloud-configs to describe their clusters. Infrastructure patterns were evolving. People were starting to use, initially, fleet, and then Kubernetes. And the scope of what somebody wanted to provision a single host to do was shrinking, because it was just a member of this bigger cluster. So I would say cloud-init was successful in getting people to provide a configuration document as part of the boot process. It enabled that idea. But there were a number of pain points with cloud-init. It ran on pretty much every boot. They've made some improvements in this recently — I'll give them some credit for that — but at the time, it slowed down boot times a lot. Having it run every time disincentivized people from using it, because the more you wrote, the slower your boots were. It also ran pretty late for various types of provisioning tasks. I mentioned networking earlier. Network configuration is tricky because cloud-init starts without networking, has to load an initial cloud-config, and then has to set up networking. Your cloud-config might specify some fancier networking, so you have to restart networking. Simple tasks, like the example of just starting a systemd service, were way more cumbersome than they had to be, because you had to go write the unit and then run a sequence of commands. You were mixing declarative and imperative just to start a service — the most basic thing. You also had to be really familiar with the different stages of cloud-init. You had to go read the docs for every section to know what ran before what. So it's complex.
And for a lot of users — a ton of users — cloud-init became this mechanism for running arbitrary commands. You'd see cloud-configs that look like this. Basically, it accepts actual bash scripts, which a scary amount of people end up actually using, or glorified bash, which is what I call this one here. This is really, really problematic, because it turns out poorly written scripts will prevent your system from coming up for very unimportant reasons. Just the other day, there was a case where someone had mkdir instead of mkdir -p, and that caused an outage related to nodes not being able to actually join a cluster. It's that simple. Also, in that example I showed earlier of starting the systemd unit, I don't know if you noticed the --no-block flag? That turns out to be super important, because if your service waits on cloud-final — like Docker — and you don't have that flag, then, again, your system will hang. So authors also have to be cognizant of the fact that their cloud-config is going to be run again on every boot. mkdir versus mkdir -p is a great example of that. This was also really problematic because your system would seem to run fine. It would reboot after seven days or something — maybe some auto-update — and that's when you would discover an issue. There's this disconnect between a bug in your cloud-config and the incident that it ends up causing. Developers start writing defensively, because they start realizing this is a problem. So you end up with partially configured machines. People make things optional. Within orgs, these scripts tend to grow out of control. There are multiple ways of doing things, declarative versus imperative. It also just increases the learning burden on a team, because suddenly people can call anything. We've reverted to a world where bash is configuring systems. So, enter Ignition, in 2015.
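Both of those failure modes fit in a few lines of runcmd (illustrative commands, not the actual config from either incident):

```yaml
#cloud-config
runcmd:
  # mkdir fails on any boot after the first, because the directory
  # already exists; mkdir -p would have been idempotent. The author
  # has to remember this section reruns on every boot.
  - mkdir /var/lib/example        # should be: mkdir -p /var/lib/example
  # Without --no-block, systemctl waits for the unit to start. If the
  # unit is ordered after cloud-final itself, that's a deadlock and
  # the boot hangs.
  - systemctl start --no-block hello.service
```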
So Ignition was designed at CoreOS as this replacement for coreos-cloudinit. It was added into the OS, implemented in Go, and it runs from the initramfs. It's in early user space. That allows Ignition to fully configure the sysroot before it becomes the real root, and before PID 1 and user-space networking have actually started. Like cloud-init, it accepts a configuration document, but a very different one: it only allows you to do things declaratively. It can partition disks, format filesystems, assemble RAID arrays, write files, write systemd units, mount units. But these are all writes, you'll notice. There's no mechanism that allows you to run arbitrary commands from the initramfs. You can write a systemd unit that runs a script in user space, but that's a separate problem — a thing that happens in user space. Ignition happens in the initramfs, writing files to a disk. Any failure to fulfill an Ignition configuration intentionally drops the host into an emergency shell in the initramfs. In large-scale clusters, this is useful because it provides a guarantee: if a system gets into user space, Ignition successfully fulfilled everything in that document; otherwise you'd never get there. It prevents bad hosts from ever joining a cluster. The way CoreOS integrated Ignition into the distro — so that it would only run on the first boot from disk — forced an immutable style, which is something we were just encouraging before. Let's walk through a Container Linux boot process to show how this works. So Ignition runs in early user space. Why is that? Okay, we're going to skip that in the interest of time. On Container Linux, GRUB acts as the bootloader. It's responsible for picking our kernel, loading it into memory, and actually starting it with the right kernel args. But GRUB does two super important things on Container Linux related to this.
First, it has to pick between two USR partitions, A and B — partitions three and four — based on GPT priority attributes. Ignition is totally fine with this system where the USR partition can be swapped all the time. This is how auto-updates work. Second, GRUB is the integration point that supports Ignition's mechanism of running only on the first boot from disk. Let's hop into an example of that. I actually did a screen capture beforehand, which I hope works. Yep, it does work. Thank God. Okay. I'll boot a Container Linux instance as if it were a bare-metal host. I'm doing it in KVM because it's easier to screen capture here. So we're creating a blank qcow2 disk image, and I'm going to start the VM from the ISO. I've sped this up a little bit; I figured you didn't want to actually wait. Once it comes up, I download a simple Ignition config. That config is just going to say to add my SSH public key to the core user's authorized keys. We can look at it — yeah, just add that public key to the core user's authorized keys. I then run the coreos-install script. It's just a script that does the actual install to disk: it writes the production Container Linux image bit-for-bit to the target disk. And as part of that install, I specify the Ignition config. This is how you'd do it by hand if you were walking up to a bare-metal machine with a little USB stick. But in the real world, for large-scale clusters — basically the PXE world — you're going to PXE boot all of your bare-metal machines in this exact same way into a live Container Linux environment, and you'd just script out the same install process. Let's take a look at the two partitions that are interesting, partition one and partition six. Partition one is the EFI system partition. Inside of it, you'll find a coreos/first_boot file. That first_boot file is actually read by GRUB. Yeah, we see the file's present after a disk install.
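An Ignition config that does nothing but add an SSH key is tiny (a sketch against the Ignition v2.x spec; the key material is a placeholder):

```json
{
  "ignition": { "version": "2.2.0" },
  "passwd": {
    "users": [
      {
        "name": "core",
        "sshAuthorizedKeys": ["ssh-rsa AAAA... user@example"]
      }
    ]
  }
}
```

It then gets handed to the installer with something like `coreos-install -d /dev/sda -i ignition.json`, which embeds it on disk for the first boot.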
If we inspect the GRUB config on the target here, GRUB is actually looking for that first_boot file and using it to set a kernel argument: coreos.first_boot=detected. So this is the mechanism that ensures a kernel arg is always going to tell the subsequent systems that this is the first boot from disk. The other partition we were going to look at was partition six. It's the OEM partition. If you're wondering where that Ignition config ends up, it's partition six. This is the exact file that we just added. And before I reboot the system, I'm going to modify the GRUB config to set an initramfs breakpoint, which will be useful for the next demo. We're setting a breakpoint so that we can halt in the initramfs. Yeah, you get the point. Let's talk about the initramfs. The Linux kernel introduced the initramfs in 2.5.46. It's a cpio archive that gets paired with the kernel image, and it contains initialization code to initialize all your devices and mount the real root filesystem. This all happens before starting PID 1. Originally, most of this code lived in the kernel itself, but it started to grow and it had too many responsibilities: it has to go find all the block devices, whether those are local or remote — you start getting into NFS root filesystems, getting DHCP leases, setting up networking — knowing about RAID arrays, LVM, decrypting partitions, how smart cards work. It's a lot of stuff to put in the kernel, so it was moved out, and it runs in this early user space environment. It has memory protection, syscalls, and glibc, and it runs before PID 1. It doesn't have to be POSIX compliant. It's a weird early-space world. On Container Linux, we used dracut to prepare the initramfs image in the CoreOS SDK. We can quickly — we have to do this again; maybe this wasn't the best idea — we can quickly cork enter into the CoreOS SDK. It's just a chroot.
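The relevant piece of the GRUB config looks something like this (paraphrased from memory of Container Linux's grub.cfg, not copied verbatim):

```
# If the flag file exists on the EFI system partition, tell everything
# downstream that this is the first boot from disk.
if [ -f "/coreos/first_boot" ]; then
    set linux_append="$linux_append coreos.first_boot=detected"
fi
```

On first boot, user space deletes the flag file, so the kernel arg never appears again and Ignition stays dormant on subsequent boots.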
We see a set of scripts that help build the distro, and there's a source repo called bootengine that keeps all the dracut modules and configuration. Dracut's a pretty common tool — many of you have probably used it before. It's a tool that takes a set of modules that specify how to copy files into the final cpio archive. If we look at the Ignition module, this 30ignition one, we see it calls the inst_multiple directive. It says to copy in the ignition binary and a couple of other useful ones, like useradd, usermod, mkfs. Further down, we see it copies in a few systemd services and an Ignition generator, and that's really all it is. It's a couple of binaries; it's not a large set of tools. I think it's actually 10 binaries, if you count them, that it's calling. Let's go ahead and boot that system I showed earlier. This is going to be the first boot from disk of our bare-metal instance that was set up by doing a disk install somehow, whether PXE or by hand. GRUB's going to select the default menu entry. It's going to find the right USR partition based on GPT attributes. We drop into the dracut initramfs right before all the interesting stuff happens, because I put that breakpoint in last time. I can show the OS release; we see it's dracut. In this world, we have an early systemd available to us. We have journald, and we'll soon have systemd-networkd — it's not turned on yet. We can see at this point the real root filesystem is mounted at /sysroot. We can check out the systemd generator for Ignition. It's called ignition-generator, in /usr/lib/systemd/system-generators. If we look through it, we can see it's actually looking for that kernel argument, the first_boot kernel argument, and it's going to add the ignition-disks and ignition-files services, as well as an ignition-setup service that's needed for PXE and ISO boots. And at the very bottom, it writes the OEM to a file under /run/ignition.
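A dracut module along those lines is just a module-setup.sh that declares what gets copied into the archive (a simplified sketch, not the actual bootengine module):

```
#!/bin/bash
# dracut module-setup.sh sketch for an ignition-style module.
install() {
    # Copy in the ignition binary plus the handful of tools it
    # shells out to.
    inst_multiple ignition useradd usermod mkfs.ext4

    # Copy in the systemd units, and the generator that decides on
    # each boot whether the ignition stages are wanted.
    inst_simple "$moddir/ignition-disks.service" \
        "$systemdsystemunitdir/ignition-disks.service"
    inst_simple "$moddir/ignition-generator" \
        "$systemdutildir/system-generators/ignition-generator"
}
```

inst_multiple and inst_simple are standard dracut helpers; $moddir and the systemd path variables are provided by dracut when the module runs.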
The OEM is like ec2 or — wow, I've lost my place. We can take a look at that ignition-setup. It's going to look for a device that's labeled OEM. You can look at the script: it's going to actually mount the OEM disk and copy the Ignition config off. This is how the Ignition config gets in place. And shortly after this, I'm going to manually invoke ignition-setup, which will trigger it to run. It'll bring up networking, and then it'll run the two stages of Ignition, ignition-files and ignition-disks. I wish this were a little more speedy. So it runs those: sets up networking, runs the two stages. The two stages will actually go and configure the sysroot. So let's review what we've seen. There's a systemd generator. It looks for the first_boot kernel arg to decide whether the Ignition services are wanted. It copies the Ignition config off the OEM partition, and it runs the two Ignition stages. Those can happily have networking, because we're not modifying the actual running system; they won't have an effect until user space starts. That covers bare metal. For cloud instances, how do we fetch the Ignition config? Well, Ignition is passed the root location, the stage to run, and the OEM name. It declares an engine, and the engine is run. The fetcher is actually specific to the OEM. It loads a base configuration, which is like an empty struct, loads a system configuration, which is usually just enough to set up a root filesystem, and then it acquires an Ignition config based on that fetcher — so it's specific to the OEM — and merges them all together. The fetchers — there's a different one for each cloud provider — fetch from the canonical source. You can imagine all the cloud providers fetch from a network endpoint, except for Azure, which is using a config drive, I believe. The interesting one is QEMU/KVM.
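For QEMU, passing a config into a guest looks something like this (a sketch; the drive flags are illustrative, but opt/com.coreos/config is the fw_cfg key the QEMU fetcher reads, as I recall):

```
qemu-system-x86_64 \
  -name demo -m 1024 \
  -drive if=virtio,file=container-linux.qcow2 \
  -fw_cfg name=opt/com.coreos/config,file=config.ign
```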
You can use the firmware config (fw_cfg) flag, which allows you to pass a blob of data into the guest VM, and there's a canonical location that is used for Ignition. There's a fetcher that will go and read from that. So that covers bare metal, cloud, and hypervisors. Let's change gears a little bit and look at the configuration document. You can see that Ignition itself is a JSON configuration document. It is not very human-friendly at all. That's by design. The idea was that people would use tools to actually evaluate this, to get validation errors and things out. It also simplified a couple of type conversion issues. In practice, people end up using a YAML document called a Container Linux Config. (That'll have to change names with the OS.) It has a couple of top-level sections and some niceties, like allowing you to put files inline and check these into source control. There's a section like systemd that writes a systemd file. There's nothing really advanced here: it's just writing this file into the sysroot in the correct location. Storage does similar things to set up filesystems and write files to a particular location; it can partition disks as well. Passwd just invokes useradd and usermod with --root set to the sysroot — so again, it's a file modification. Here's a network configuration: again, this just writes a networkd config file to the right location. The theme of all of these is that we're not actually contending with a live running system. We're just writing files that will be read once PID 1 starts. Over the years, there were a bunch of tools built to work with CoreOS's Ignition. There was ct, a command-line tool that converts from that YAML format to the JSON format; terraform-provider-ct, the Terraform version of that; Matchbox, a thing that serves files to PXE-booting bare-metal machines; and Typhoon, my own Kubernetes distribution, which has Container Linux and Flatcar support via Ignition.
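A Container Linux Config exercising those sections might look like this (a minimal sketch; the unit, file, and key contents are illustrative):

```yaml
systemd:
  units:
    - name: hello.service
      enabled: true
      contents: |
        [Unit]
        Description=Example service
        [Service]
        ExecStart=/usr/bin/echo hello
        [Install]
        WantedBy=multi-user.target
storage:
  files:
    - path: /etc/example.conf
      filesystem: root
      mode: 0644
      contents:
        inline: |
          key=value
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - ssh-rsa AAAA... user@example
```

You'd transpile it with something like `ct < example.yaml > example.ign` and hand the resulting JSON to the machine; ct is where you get validation errors, at build time rather than at boot.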
Ignition enables a pretty unique feature there, actually. Earlier, I mentioned that Ignition documents are merged together — there's an additive merge. That turns out to be a really useful concept for customizability in a Kubernetes distro. You can define a Container Linux Config that sets up a Kubernetes cluster, but then you can also accept user-defined configurations and merge them all together. So in this example, there's a large cluster, but one particular node might have some separate network setup. Maybe it's a piece of hardware that has a separate set of NICs. What's in store for Ignition and early boot provisioning, to wrap up? Ignition has this concept of filesystem reuse, and there's a lot you can read about the semantics around it. There's ongoing interest in getting that same type of reuse for RAID arrays, for partition reuse and disk reuse. This is useful if you want to fully re-provision a cluster but preserve some RAID array that's on some set of disks — often the case in Kubernetes clusters. There's work to get SELinux-enabled hosts working. No correlation to Red Hat, I'm sure. I think that's actually already in there. There's work to get rpm-ostree support — Red Hat's influence again. I think that's related to installing RPM packages in the sysroot, and you can imagine how that would be a similar pattern. There's work to get LVM and encrypted volumes in there. Since CoreOS was acquired by Red Hat, Ignition is actually coming to more distros; that was confirmed in blog posts. I drew this amazing diagram here to show it. So CoreOS Container Linux is continuing on for two years — this is my understanding of it; I no longer work for CoreOS or Red Hat, just a clarification. It continues on as Container Linux. Flatcar forked it, and they also have Ignition support. And Fedora and Red Hat CoreOS will also have Ignition support, because that was announced.
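The additive-merge idea can be sketched with a toy function (hypothetical Python, not Ignition's actual Go implementation — real Ignition merges keyed lists more carefully, e.g. files by path):

```python
def merge(base: dict, child: dict) -> dict:
    """Toy additive merge: dicts merge recursively, lists append,
    and scalar values from the child win."""
    out = dict(base)
    for key, value in child.items():
        if key in out and isinstance(out[key], dict) and isinstance(value, dict):
            out[key] = merge(out[key], value)
        elif key in out and isinstance(out[key], list) and isinstance(value, list):
            out[key] = out[key] + value
        else:
            out[key] = value
    return out

# A cluster-wide config plus a per-node snippet for a box with bonded NICs.
cluster = {"storage": {"files": [{"path": "/etc/kubernetes/kubelet.env"}]}}
node = {"networkd": {"units": [{"name": "10-bond.network"}]},
        "storage": {"files": [{"path": "/etc/modprobe.d/bonding.conf"}]}}

merged = merge(cluster, node)
```

The cluster-wide files survive, and the node-specific network unit and file are layered on top — that's the shape of the customization Typhoon exposes.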
If you want to compile Ignition yourself and put it in your own initramfs, it only uses these 10 tools. There are compile options where, if you have a different path in your personal initramfs, you could just change these, shove Ignition in there, and see how it works. To recap: cloud-init pushed people towards this idea of having a configuration document and doing the provisioning as part of the boot process. It had a couple of pain points. Ignition tried to solve those pain points, and there were a few key insights — three of them. Running from the initramfs in early user space is really advantageous. We integrated with GRUB so that Ignition would only run on first boot, to enforce this immutable style. And we permitted only declarative elements in the configuration document, so no arbitrary scripting. And finally, Ignition and the whole concept of early boot provisioning are going to become available on more distributions. I think that's pretty exciting, and maybe you can try it out yourself. So, with that, thank you all for attending. And I think I'm the last talk for the whole thing. Yay.