All right, it's 4:35, so I think we'll go ahead and get started. Thank you for sticking around late in the day for my presentation. I'll be presenting on my group's experience in switching from an asymmetric to a symmetric software update model. A little about me: I work at NI. NI is focused on making hardware and software for test, measurement, and automation applications. For the last five years, I've been a member of the real-time OS group at NI, where we maintain a real-time Linux distribution for our hardware systems. By real-time, I mean primarily that it has a kernel with the PREEMPT_RT patch set applied. We generally develop for 32-bit ARM systems and x86-64 systems. The distribution we build is based on OpenEmbedded and Yocto. So, we're all familiar with software updates. We see them regularly on our desktop computers, on our phones, and on a million other devices. Today we'll be peeking a little behind the scenes to see the mechanics behind those updates. We'll start with some background on asymmetric and symmetric software updates. Then I'll talk a little about our use case and the background for it, since that drives some of the interesting aspects of our design. I'll cover our motivations, why we did this in the first place, and what our requirements were, and then we'll look at the implementation. There was a ton of work that went into this, but we'll focus on the areas that are more interesting from a technology standpoint, or more applicable to something you might do yourself. So, asymmetric updates involve two different environments. You generally have your main OS, which is your normal operating environment, where your system spends most of its time. Then there's also a recovery environment. This environment is usually pretty minimal.
It's used almost solely to update your main operating system, and because it's so small and has only limited functionality, it rarely needs to be updated itself. A common approach is for this system to use an initramfs to run a temporary, non-persistent file system straight out of RAM. Once the recovery OS has done its job, all that state information is simply lost when power is cut. If you've seen devices that apply an update by booting into some strange-looking minimal environment, where maybe a progress indicator goes by and then the device boots back into the familiar OS again, generally speaking those devices are using an asymmetric update model. If there's a problem while applying an update, like the battery dies, power is cut, or something else interrupts the update, usually you can just reboot the device. It will fall back into the recovery OS and try again until it's successful. Symmetric updates, on the other hand, are characterized by having two copies of the main OS. These are sometimes referred to as A/B update models, because the two copies are often called A and B. Generally one of the two copies is active; that's the copy you're booted into at the time. Updates are done while you're running in that environment and are written to the inactive copy. If an update fails for some reason and the system can't boot properly, generally speaking you'll fall back into the unmodified copy, so you haven't bricked your device. If the update is successful, and we'll talk about what successful means a little later, then either the system retains the old copy, so that if after some amount of runtime you notice a problem, you can revert to the version you were running before the update,
or the system might replicate the newly updated copy to the other slot, so that if the first one gets corrupted you can fall back to another copy of the same update. You'll sometimes see this version called seamless updates, because unlike the asymmetric case we just talked about, these updates usually happen in the background. The system will download the update and may even apply it. Sometimes you'll get a prompt asking whether you want to start the update now; other times it happens entirely in the background. Then generally you'll just see a prompt that says restart now, or maybe it'll automatically restart and come back up in the new environment, and you might not even notice it happening. In both types of update, there's some interfacing with the bootloader to decide which OS to boot. In the asymmetric case, we're almost always booted into just the main OS, except for the rare occasions when we have an update and do a one-time boot into the recovery OS and then back into the main OS. Things are a little more complicated in the symmetric case. After we write an update image, we need to tell the bootloader to load that image just one time on the next boot. We don't want to permanently change the boot order, because there might have been a problem, a corruption, something that causes that new image not to boot properly. We don't want the bootloader to keep trying to boot it again and again, so we tell it: just once, boot the new environment. Then, if that boot is successful, we mark it afterwards and say, we know it's good, now change the boot order and make it the first choice for booting. If we fail to boot the image for some reason and never get to that step of changing the boot order, nothing has actually changed.
The bootloader just goes back to what it was doing before and boots the OS you've been running all along. Now I want to take a little time to talk about our use case, because some of these details drive decisions that we made. Our recovery OS is a little different from what we described before. Earlier I said a recovery OS typically has a fairly minimal job, just applying updates. Ours is a little more complicated than that. We use it to configure network settings and passwords on a newly minted system that you bring up. We use it to respond to network discovery requests: we have management software that broadcasts out to the subnet to discover devices for you, so you can see them and configure them with a click interface. We also have operations to erase an OS or reformat a user's data partition. And this recovery OS also serves as a fallback when something goes wrong in the main operating system; we'll talk about how things can go wrong in our use case in a little while. Our recovery OS was intended to update infrequently, as I said is commonly the case for most systems. But the reality has been that it ends up changing about every one to two years. That's been driven by bugs, because we have them; by security fixes, when we find a component included in that OS that needs to be patched; by new features; and by compatibility requirements. Sometimes we want to make a change in the main OS, but our recovery OS and main OS share a data store where we keep configuration information. If we change something on the main OS side, we might have to retrofit something on the recovery OS side so they can continue to pass data back and forth to each other. Our systems are often used in industrial applications, so if a failure while updating the recovery OS leaves the system unbootable, it can be problematic.
Our customers might have these systems in remote locations where physical access is difficult. If one of them gets bricked, somebody has to go hands-on with a USB key, plug it in, reboot, and reprovision the system. That's what we're trying to avoid, and because we've had more recovery OS updates than planned, the likelihood of that affecting a customer goes up. Another difference about our systems is that our main OS image is not fixed. On a lot of embedded devices, you might have a single firmware image; you apply it and it just runs, with only limited configuration or changes a user can make. That's not true of our systems. Our systems are somewhat general purpose to start with. Our customers take them, configure them, and write custom code and applications, and that becomes the fixed personality of the device. So it may look embedded and fixed to their end user, but to our customer it's more of a general-purpose device. Our users can install software via a package manager, like you're used to doing on a desktop system. They can install expansion hardware; there are slots they can plug hardware into, which means we have to install drivers for those pieces of hardware. They can also install their own applications. I mentioned that in our case it's a real-time OS, so they may be running something at real-time priority. It's possible for them to write an errant real-time application that starves the rest of the system of processor time, thereby making the system unbootable. They can change all sorts of system configuration settings, and they basically have root access to these devices, so they can do anything they want. So it's possible for the user to put that OS into an unbootable state. So what were our motivations in doing this? Up until now, we've been maintaining the recovery OS and the main OS separately. And like I said, the recovery OS wasn't that simple.
There's actually a fair amount of logic going on in there. So we had two builds, two sets of configurations and source. That means twice as many builds to run, and twice as many builds to debug when something goes wrong. We have two images to validate, so every time we run our validation, we've got two sets of validation to do. We were also looking for any opportunity to reduce our own code. Part of the benefit of open source software is that we're leveraging the work other people have done, and contributing back when we can. So we were trying to get out of the business of maintaining our own logic for something the community had already solved. And especially, as I hinted before, we were trying to make our updates fail-safe. Our main OS wasn't such a problem, but our recovery OS doing those updates was a point of failure. So, our requirements. As we just said, we want the updates to be fail-safe. We also want to preserve some configuration information across an update. It's not a case of simply erasing what was there and putting down a completely new copy of the OS; we want the system to still know its host name, its network settings, and the passwords used to log into it. Some of that is to keep workflows working, and to spare the user from manually restoring their settings every time they do an update. This is what we're used to with a lot of our devices: when your phone installs an update, it doesn't forget its network settings or your password. We want to maintain the same kind of behavior. We also want to, within reason, keep the system always bootable. I say within reason because our users have admin access, root access, and they can literally format the entire disk, at which point there's nothing we can do to keep the system bootable.
So that's a key difference versus, say, our phones, where generally speaking, unless you've rooted it yourself, you don't have root permission even as the owner and user of the device. You're limited in the kinds of destructive changes you can make, but that's not true on our systems. We also wanted a way, if we could, for the system to be able to reset back to its original state. Some of our customers are doing research and development work: they work on one project and then want to reset back to a starting point and start over on something else. Or they've got a system in a weird state and they just want to start over. So one of our goals is to give them an easy way to undo all the extra software they've installed and the configuration changes they've made, and reset back to the state the system was in when they last updated the OS. One of the first steps we had to consider was how to restructure our partitions to make symmetric updates possible. We started by collecting all the files needed to boot the main OS and putting them together on one partition. That included the bootloader and the kernel. We have an initial RAM disk, a small root file system; we talked about this earlier. It's commonly used in the recovery OS case, where we boot into RAM and do some early-stage boot work. And then there's the initial root file system. This is the contents of the root file system as the OS originally installs it. In our case it's in a compressed, read-only format called SquashFS. We put all those files together on one partition. The reason is that we can then have two copies of that partition, and if we want to do an image update, we just mount one partition and replace all its files, and now we have a new copy of the operating system. The contents of those boot partitions won't change while we're running.
There's no reason to modify the bootloader or the kernel. The initial RAM disk is temporary; there's nothing to change there. And the initial root file system, as I just mentioned, is a compressed read-only file system, so nothing's going to change there either. This allows us to mount those boot partitions read-only, which decreases the likelihood that a user will accidentally make a change that renders the system unbootable. It also means the inactive partition, whichever one we're not booting from, normally doesn't get mounted at all; we only mount it during the process of doing an update. I thought it might be interesting here to show what the partitions looked like before and after. The before isn't that critical, but it's interesting. Commonly you might have had the recovery OS on one partition and the main OS on a separate partition. That's actually not the way our system historically grew. We had a bootloader partition, and then one partition containing all of our boot files: the kernel for the recovery image, the kernel for the main image, and a recovery RAM disk. We had another partition dedicated to hosting that shared configuration information I said could be accessed from either environment. And then we had a rootfs partition. This is where the root file system of the main image used to be deposited when you installed the main OS, and it's also where all the changes the user made went. All right. You'll remember that one of our motivations was to replace our own logic with an open source solution. The open source solution we settled on is called RAUC, which stands for Robust Auto-Update Controller. That's the tool we use to manage our boot partitions and their contents.
The tool not only allows us to install updates, it also allows us to build the update artifacts, packaging our OS image into something RAUC calls a bundle. In RAUC terminology, you install a bundle into a slot. In our case, a slot is an entire disk partition, but it doesn't have to be that way; RAUC will let your slot be just a file, a device, or a volume. It's up to you. RAUC itself interfaces directly with your bootloader. In our case that's EFI, but RAUC can just as easily talk to GRUB, U-Boot, Barebox if you're familiar with that, or a custom interface. Generally speaking, in all these cases, RAUC is talking to the bootloader from user space, and it generally boils down to setting some variables that the bootloader will read at boot time to decide which partition it's going to boot. In its interface to EFI, the two most critical operations are: telling the bootloader to mark a slot to boot once, just as a test to see if it's going to work; and then, once we know it's good, modifying the boot order more permanently going forward. RAUC is an open source project, which fit our requirement not to be in the business of maintaining our own code. rauc.io is the website, and there's a lot of good documentation there if you're interested in looking further. In order for RAUC to work properly, it needs some information about our system, which we provide in a configuration file. The key things you'll see here are that we tell RAUC our bootloader type, so it knows we're EFI rather than GRUB or U-Boot or something else, and then we define our slots. By declaring slot.niboot, it knows there's a slot class called niboot and there are two of them, .0 and .1. These are basically our A and B boot copies.
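As a rough sketch of what that configuration file looks like, a RAUC system.conf for two EFI boot slots might read as follows; note that the device paths and bootnames here are hypothetical, not our actual values:

```ini
# /etc/rauc/system.conf -- sketch; device paths and bootnames are hypothetical
[system]
compatible=nilrt-efi-ab
bootloader=efi

[slot.niboot.0]
device=/dev/sda2
type=vfat
bootname=BootPartition1

[slot.niboot.1]
device=/dev/sda3
type=vfat
bootname=BootPartition2
```

The bootname is what ties each slot to an entry the EFI bootloader knows about, so RAUC can manipulate the boot order on the slot's behalf.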
We tell it what device, what partition on the system, those slots are located at; we tell it the file system type and the bootname for each. That slot information is what RAUC uses to figure out where to install updates. We also need to create our bundles. The RAUC tool has a command line interface; you can run it directly, point it at a directory of files, and it will build that into a bundle it knows how to process. But since our entire distro is built using OpenEmbedded and Yocto, we don't explicitly call the RAUC tool to make them. There's a Yocto bbclass provided by RAUC; we just declare a recipe that inherits from that bundle bbclass, give it some information, and it automatically generates our bundles for us. Some of the critical information you'll see in there: there's this compatible string, nilrt-efi-ab. This is just a name RAUC uses so that for any particular bundle it sees, it can look at the bundle's declared compatibility, look at the system configuration, and make sure the two match. So if you inadvertently give a system an image that doesn't match, RAUC is smart enough not to try to install it and bork your system. The other thing down here is that you tell RAUC the name of your slot class, niboot, which you'll remember from the configuration on the previous slide, where we had slot.niboot.0 and slot.niboot.1. That tells RAUC that the niboot slots are where this bundle is supposed to be installed. What RAUC does is take this metadata we're providing and build what it calls a manifest, which records the type of bundle this is and how it's supposed to be installed. It makes that manifest part of the bundle, and then reads the metadata back out of the bundle when it's time to install an update.
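A bundle recipe of this kind is quite short. Here's a sketch of roughly what one might look like using the bundle class from meta-rauc; the recipe and image names are made up for illustration:

```bitbake
# nilrt-bundle.bb -- sketch; recipe and image names are illustrative
inherit bundle

# Must match the compatible string in the target's system.conf
RAUC_BUNDLE_COMPATIBLE = "nilrt-efi-ab"

# Name of the slot class this bundle installs into
RAUC_BUNDLE_SLOTS = "niboot"

# The image whose output gets packed into the niboot slot of the bundle
RAUC_SLOT_niboot = "nilrt-boot-image"
RAUC_SLOT_niboot[fstype] = "tar.bz2"
```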
Once we have a bundle built, we can install it by calling the RAUC tool directly: we call rauc install and give it a path to the bundle file. That call installs the bundle into the slot; RAUC has built-in logic to know which slot is running and which slot it should install to next. Once it does that, RAUC automatically makes the call to the bootloader to tell it: do a test boot of this slot, one time. After we successfully boot an updated image, an init script in the new image calls RAUC again and says the boot was successful, so mark this slot as good. We do that by calling the RAUC binary with a status command and marking the slot as good. You can also mark it as bad if you want to tell the bootloader not to try booting it again. Once we do that, RAUC basically tells the bootloader to permanently change the boot order and make this the preferred boot slot. That brings up a good question: how do we determine that a boot was successful? That's up to you. We're currently treating a boot that gets all the way through init as successful, so our init script just always marks the slot as good. But you can do whatever makes sense for your own application: verify the contents of the file system, look at the list of running processes, try a couple of test operations and make sure you're happy before you mark it as good. This is also a good place to note that you probably need some mechanism in place to make sure a failed boot is detected and doesn't halt your system. That probably involves the bootloader recording when there's a failed attempt to boot a slot, and some code that inquires of the bootloader whether there was a failure, so you know not to try booting that slot again. Another important piece might be a watchdog timer.
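In command form, the lifecycle just described is only a couple of invocations; the bundle path here is illustrative:

```shell
# On the running system: install the bundle. RAUC picks the inactive slot
# and asks the bootloader for a one-time test boot of it.
rauc install /var/cache/updates/os-update.raucb

# From an init script in the newly booted image, once we decide the boot
# was successful: promote this slot to the permanent first boot choice.
rauc status mark-good

# Or, if something is wrong, tell the bootloader never to retry this slot:
# rauc status mark-bad
```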
That's a hardware timer that says: if no one has told me the system booted successfully within, say, the next 30 seconds, reboot automatically. Because we only did a one-time boot into the new slot, on the next pass the bootloader will go back and boot the slot that worked before. So you need some mechanism in place to look for those failures, so the system doesn't get hard-deadlocked trying to boot the new update, leaving you stuck. If the image fails to boot, then as we said before, the EFI boot order is left unchanged, and the system automatically falls back to booting the old working slot the next time around. When RAUC does update a slot, it records some metadata about that slot in what it calls a slot status file. You can configure where RAUC stores that, but that file is how RAUC knows, when you tell it to install an update, which slot to install to automatically: it knows where you last installed, so it knows where to install next time. You'll note that when I called rauc install and gave it the bundle name, I didn't tell it which slot to install to, because RAUC figures that out automatically. Having looked at how we install an update, I thought it might be interesting to look at how we initially set up these partitions on a blank drive. When we first do our provisioning, we do it from a USB thumb drive running a small custom Linux image. That image contains all the necessary tools as well as the main OS image, which is a RAUC bundle. During init, when we're booting this little provisioning environment (I shouldn't call it a recovery environment, that's a loaded term we talked about earlier), the init script basically calls the provisioning code.
The provisioning first creates the three partitions: the A partition, the B partition, and the user partition. The boot partitions are of type EFI, and it creates EFI boot entries for those two partitions. There's a tool called efibootmgr that you can call to set up those entries in the bootloader, and this is the syntax for making those calls. You basically tell it you're creating an entry (that's -c); -d tells it the disk we're talking about; -p tells it which partition; -L gives it a label; and lowercase -l tells it the location of the bootloader to pass control on to. Once we've done that, we run RAUC to install our OS bundle onto both of the partitions. When we're first provisioning, we just have one image and we install it to both boot slots, and this is the code that does that. It's kind of interesting, because we're not running in the normal context, where we're running in one slot and RAUC automatically finds the other. We're not running in any slot, so we need to give RAUC some hints about what to do. We call RAUC with this override-boot-slot parameter: first we tell it that the current boot slot is A, which causes RAUC to install the payload to B. Then we call RAUC and tell it our boot slot is B, and mark it as good. Then we do the reverse: we tell it the boot slot is B, so we trick RAUC into installing into A, and then we tell it we booted from A and mark that as good. This gets RAUC set up with all the correct status information to know that both slots are good. And then finally we mark the A slot as active, to make sure it's the entry listed in the EFI as the boot-next.
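Putting those provisioning steps together, the calls look roughly like this; the disk, partition numbers, bootnames, and loader path are all hypothetical stand-ins, not our real values:

```shell
# Create one EFI boot entry per boot partition:
#   -c create, -d disk, -p partition, -L label, -l loader path
efibootmgr -c -d /dev/sda -p 2 -L "BootPartition1" -l '\EFI\nilrt\bootx64.efi'
efibootmgr -c -d /dev/sda -p 3 -L "BootPartition2" -l '\EFI\nilrt\bootx64.efi'

# Install the same bundle into both slots. Since we aren't booted from
# either slot, we override RAUC's boot-slot detection each time:
rauc install --override-boot-slot=BootPartition1 os-bundle.raucb  # fills B
rauc status mark-good --override-boot-slot=BootPartition2         # B is good
rauc install --override-boot-slot=BootPartition2 os-bundle.raucb  # fills A
rauc status mark-good --override-boot-slot=BootPartition1         # A is good
```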
The main reason for that is that in case a boot-next of some kind was set previously, we want to start over at a known starting point, so the bootloader knows to just start from A and ignore anything set up previously. Having discussed the basics of how we implement symmetric updates, I want to focus on some of the interesting aspects of our design. We mentioned previously that the initial root file system is a compressed SquashFS and that it's read-only. That's great for some systems where you don't need to store any state information: you can boot from it, everything during execution is temporary in RAM, and when you turn off the system you just fall back to the original read-only copy. But on our systems, the user needs to be able to make persistent changes. They need to be able to write new files. We want persistent logging; we want Linux logs to be present if something goes wrong, so users can interrogate them and find out what happened. They also might want to change files that are in that initial root file system. There are files in there that configure network settings; if a user changes the network settings, we need to persist those changes so the system knows its host name and its IP address settings. Our solution here is to use a Linux overlay file system, called OverlayFS. OverlayFS is a union mount file system implementation, which is a fancy way of saying you can take two sources of files, combine them, and present them to the system as if they were one coherent file system. The way this works is that it combines what we look at as a lower layer with an upper layer and presents the merged result to the rest of the system. In our case, the lower layer is that read-only initial root file system, the SquashFS. That's depicted down here in the bottom part of the diagram as the lower layer.
For the example here, let's say we've got a .profile file in our home directory and a .bashrc file there. The upper layer is just a directory tree; there's nothing super sophisticated about it, it's just a directory tree full of files. OverlayFS basically merges the two together. In this case, our upper layer has two new files, file1 and file2, and it also has changes to .profile: if we needed to edit it and make a change, that copy would be persisted in the upper layer. The merged result is up here on top, and this is what the system sees. The system doesn't see the bottom two boxes; it only sees the coherent merged result. And like I said, this is all transparent to the applications; they don't know anything about the OverlayFS being there. So the upper layer holds the new versions of files that exist in the lower layer; only if we edit them does OverlayFS create a copy in the upper layer. But now we've got a problem, because if files in the lower layer change, we can have conflicts. Imagine there was a file in the lower layer that stored one setting, and let's say the initial value was five. You change it and say the new value is six, and then you install an update and the new compressed SquashFS says no, the value is seven. Now we've got a problem. The system thinks the value should be seven; you've changed it to six. Which one should it be? There's no way for OverlayFS to resolve that conflict. So the solution is that we just reset the overlay when this kind of thing happens. The way we implement this is that every time we create an overlay, we record a checksum of the compressed SquashFS containing the initial file system. Then on each boot, we recalculate the checksum of that file and compare the two checksums.
If the checksum of the file we find on disk doesn't match the checksum recorded when we created the overlay, we know there's a potential conflict, and since we can't resolve it, we just wipe the overlay and start over. We communicate to our users that when they install a new image, which probably contains a modified root file system, they need to pause, back up any important data they have, and transfer it back after the system has been updated. The result is that when we create a new version of the overlay, we need to update the checksum: we're now using a new base, so we take a new snapshot of that checksum and record it. Also note that when we say we reset the overlay, we don't actually delete it immediately. The reason is that the whole setup here is: you install an update, that changes the checksum, and that means we need a new overlay. But we haven't actually booted that updated image yet. If it turns out the updated image doesn't boot properly, we're going to fall back to the old image, which has the old root file system and the same old checksum. In that case we haven't actually invalidated the overlay, so we keep it around until we've successfully booted the new update, so we can keep using it in the case where we fall backwards. This overlay handling is done mostly in an init script in that initramfs we talked about, the initial small root file system used before we load the full file system in the SquashFS. This is where we do most of the overlay handling: where we calculate the checksums and compare them, where we create an overlay if we don't have one yet, and where we reset the overlay if a reset is necessary.
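The checksum bookkeeping itself is simple enough to sketch in a few lines of shell. This is not our actual init script; it's a self-contained toy that uses a scratch directory and sha256sum to show the keep-or-reset decision:

```shell
# Toy sketch of the overlay keep-or-reset decision; paths are illustrative.
demo=$(mktemp -d)

# Stand-in for the compressed SquashFS of the initial root filesystem.
echo "rootfs v1" > "$demo/rootfs.squashfs"

# When the overlay is first created, record the image's checksum.
sha256sum "$demo/rootfs.squashfs" | cut -d' ' -f1 > "$demo/overlay.sha256"

check_overlay() {
    current=$(sha256sum "$demo/rootfs.squashfs" | cut -d' ' -f1)
    recorded=$(cat "$demo/overlay.sha256")
    if [ "$current" = "$recorded" ]; then
        echo "overlay kept"
    else
        # Potential conflict: wipe the upper layer (not shown) and
        # record the new baseline checksum.
        echo "$current" > "$demo/overlay.sha256"
        echo "overlay reset"
    fi
}

check_overlay                               # same image: prints "overlay kept"
echo "rootfs v2" > "$demo/rootfs.squashfs"  # simulate installing an update
check_overlay                               # image changed: prints "overlay reset"
```

The real script additionally defers deleting the old upper layer until the new image has booted successfully, for the fallback reason described above.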
This is also where we mount the overlay, and then, once we've got all the overlay setup done, we call switch_root, which is our way to change the mounted root file system and point it at the overlay, which is now ready for the system to consume. The overlay is reset, as we just discussed, every time we boot an updated image, because the checksums don't match. The overlay is not reset if we install the same image we already had, or if we boot the same image again and again; in those cases the checksums still match. It turns out one of our requirements was to be able to easily reset the system back to the state it was in when we first installed the update, and this is our technique for doing that. While the system is up and running, the user can say, I want to reset back to the last update. We create a little marker, and on the next boot that marker instructs us to create a new overlay, which erases all the changes the user had made: all their installed software, all their config changes, anything else they've stored on disk. I thought I'd show a little of the code we use to actually set up the overlay. Again, this runs in the initramfs before we load the main OS. First, you'll see we create a few directories: overlay/lower and overlay/upper, which are just what you'd think they'd be, the lower and upper layers. There's a work directory for the overlay; this is just a requirement for how overlays are created, we have to pass an empty working directory for it to operate out of. And then overlay/image is where we mount the completed overlay, which has the unified view of the upper and lower layers. Here you can see where we mount the lower file system.
We give it the option ro for read-only, we tell it the type is squashfs, we point it at that squashfs file, and we tell it to mount that at overlay/lower. Then we create the overlay image. Here we tell it we're creating a mount of type overlay, we give it the options that the lower directory is overlay/lower and the upper directory is overlay/upper, and we give it that temporary work directory to operate out of. And we tell it to mount the new overlay at overlay/image. Then we've got this interesting line here that I had to think about for a while when I was preparing this presentation. I think when we first mount the user partition we mount it synchronously, which means that all changes will happen synchronously and you'll have to wait for the operation to complete before the call will return. We want that because we're setting up the overlay and we don't want any calls continuing in the background past the point where we're trying to have the overlay available. So we call sync to tell the system to finish any pending operations, and then we remount the partition. And then finally, I told you that we do a switch_root. We basically tell the system we no longer want the root file system we had before, that small temporary initial RAM disk; we wanna switch over to the new overlay that we just created and then continue our init process from there. So switch_root says: use this overlay image as your new root directory, run /sbin/init as the init of the system, and then we pass along some init options to the true init of the OS. So what we just talked about was how the overlay gets reset whenever we're installing a new update or when the user just says they wanna rewind the system. One of our requirements earlier was that we wanted to not lose all of our settings when we did an update. Well, when we reset that overlay, we are losing all of the changes the user has made to the system.
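Pieced together, the sequence just described looks roughly like the following. The exact paths, option strings, and the user-partition mount point are my guesses, not the verbatim NI script:

```shell
#!/bin/sh
# Rough reconstruction of the initramfs steps described above.
# Paths and options are illustrative assumptions; this fragment
# needs root and real block devices, so treat it as a sketch.

# Directories for the overlay pieces.
mkdir -p /overlay/lower /overlay/upper /overlay/work /overlay/image

# Mount the read-only squashfs base as the lower layer.
mount -o ro -t squashfs /boot/rootfs.squashfs /overlay/lower

# Combine lower + upper into the unified view at /overlay/image.
# The work directory must be an empty directory on the same
# file system as the upper layer.
mount -t overlay overlay \
    -o lowerdir=/overlay/lower,upperdir=/overlay/upper,workdir=/overlay/work \
    /overlay/image

# The user partition was mounted synchronously while setting up the
# overlay; flush pending writes, then remount it async for normal use.
sync
mount -o remount,async /mnt/userfs    # hypothetical user-partition mount point

# Make the overlay the new root and hand off to the real init.
exec switch_root /overlay/image /sbin/init "$@"
```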
So we need a way to preserve those settings, and we're doing that using some configuration scripts. When we update an image, we wanna preserve several things: the target's identity and its settings, like its host name, network settings, login credentials, SSH keys. So what we do is we run a script that saves all that data off. The main script actually calls a bunch of subscripts, and each subscript is responsible for saving one group of settings. So there might be one subscript for the host name, another subscript for network settings, another subscript for login credentials. And then before we start the update to a new image, or before we erase the overlay, we basically save that state off; we run the script and save all that data. Then we create the new overlay, and on first boot of that new overlay, we restore all that information back into the overlay. And for all that fancy talk, what it really boils down to is usually just copying files. We copy /etc/hostname, or we copy the contents of the SSH keys directory. But that subscript structure makes it easy for us to keep the scripts separate from each other, and it makes it easy for us to add one more in whenever we need to. And then finally, I wanted to revisit our requirement about keeping the system bootable as best we can. The system could boot normally but end up being messed up by changes the user has made. Because they can break the system, we want to give them a safer state to boot back into. This might be the state we would use to apply software updates; maybe we don't want the user's real-time application running while we're trying to actually update the other boot slot. We also want this to be a fallback in case the system doesn't boot normally. So maybe the boot loader or a watchdog timer detects that, hey, this slot that used to boot normally is not booting correctly anymore. We want a way to boot back to something safe.
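Going back to the save/restore subscripts for a moment: the pattern really is mostly file copying. Here's a minimal sketch; the SAVE_DIR location, the $ETC override, and the function names are my own for illustration, and the real scripts cover more settings groups:

```shell
# Sketch of the settings save/restore subscript pattern.
# SAVE_DIR and the $ETC override are hypothetical names.

ETC=${ETC:-/etc}                           # overridable root, e.g. for testing
SAVE_DIR=${SAVE_DIR:-/var/lib/config-save}

# One pair of functions per settings group; adding a new group is just
# adding another pair and calling it from save_all/restore_all.
save_hostname()    { cp "$ETC/hostname" "$SAVE_DIR/hostname"; }
restore_hostname() { cp "$SAVE_DIR/hostname" "$ETC/hostname"; }

save_ssh_keys()    { cp -r "$ETC/ssh" "$SAVE_DIR/ssh"; }
restore_ssh_keys() { cp -r "$SAVE_DIR/ssh/." "$ETC/ssh"; }

save_all() {
    mkdir -p "$SAVE_DIR"
    save_hostname
    save_ssh_keys
}

restore_all() {
    restore_hostname
    restore_ssh_keys
}
```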
So we're basically gonna manufacture a way to do this using the overlay and the configuration transfer scripts we just talked about. This could also be triggered directly by a user; they could, from a running OS, say, hey, I want to fall back to safe mode. We also have a physical button. You know, a lot of times there's the thing where you hold the button down while you apply power, or you hold down one button while you press another, to trigger a recovery. We have the same thing. During boot, the system can detect that you held that key sequence, and it can start a safe boot. So we're gonna leverage those features to do that. Our logic here, real quick, is: are we triggering a safe boot? If the answer is no, just mount the overlay and boot like always. If the answer is yes, then mount the overlay, run the script to save off the state, unmount the overlay, create a new empty overlay, mount that new overlay, create a little marker file in there that says, hey, this is a special safe boot, and then switch over to that overlay and continue booting. And once we're in the main init script in the main OS, we check for the existence of that safe boot marker file. If we don't see it, we just exit and boot normally. If we do see it, we locate all that saved config information and restore it, and then we skip all the other special first-boot logic and continue the user on to the shell. So this gives us a clean environment, but it still has the configuration information, so the target knows who it is, it has its network settings, the user's login password still works. And we could even use this to go in the other direction: take any config changes they make in safe mode and push them back into the main OS. That's my presentation. I hope you found it interesting. Here's kind of a summary of what we talked about today.
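As a small appendix to the safe-boot flow above, the main-OS side of the marker-file check could be sketched like this. The marker path, the save directory, and the hostname-only restore are stand-ins of my own, not the actual NI implementation:

```shell
# Hypothetical sketch of the main OS init checking the safe-boot marker.
# Restoring just the hostname stands in for the full config restore.

ETC=${ETC:-/etc}                           # overridable root, e.g. for testing
SAVE_DIR=${SAVE_DIR:-/var/lib/config-save}

safe_boot_init() {
    if [ -f "$ETC/safe-boot" ]; then
        # Safe boot: the initramfs built a fresh, empty overlay, so put
        # the saved configuration back and consume the marker file.
        cp "$SAVE_DIR/hostname" "$ETC/hostname"
        rm "$ETC/safe-boot"
        return 0    # caller skips the other first-boot logic
    fi
    return 1        # normal boot: nothing to do
}
```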
I think we're probably just about already done with time but if you have a few questions let me know or come talk to me afterward. Thank you guys very much for your time.