So, I think we can start now. Everyone is seated, everyone is in a comfortable position. Welcome to my talk. It's called "RAUC: Behind the Scenes of an Update Framework", which should give the impression that we will hear something about how RAUC behaves, how we develop the update framework, and some of the internal design decisions we made when developing RAUC as an update framework.

My name is Enrico Jörns. A short note about me: I'm an embedded software developer and the co-maintainer of the RAUC update framework, so hopefully I know what I'm going to talk about in the next 35 minutes. I have been working at Pengutronix since 2014. For those who don't know Pengutronix: it's an embedded Linux consulting company. We have been doing embedded consulting and support since 2001, we do a lot of open source work, and over the years we have contributed over 5,000 patches to the Linux kernel.

So, a short introduction, or a general description of updating, for those who are not that familiar with it. From the overall perspective, we have an update system, and it probably looks similar to this one. Somewhere here we have our build server, where we generate our images, the artifacts that we want to deploy later. Then we pack them into a format for updating, where they are bundled somehow. We can sign it, or rather should sign it, and then we upload it to a network, a cloud or something like this. The devices in the field then download this update, either by polling or because they are notified somehow, and then the update is installed on each individual device out in the field.

But there's not only this, I would say, modern network approach. In many scenarios we also still have this approach: we don't have a network connection, or we are in an area where we can't have a network connection, or it is ruled out because of security concerns. In many cases it is still a valid scenario to instead use simple USB sticks or SD cards or something like this for updating.
What we're going to talk about today is a little about this side, the update artifact generation and signing, and, as the main part, how to install the update on the target, that is, the fail-safe installation of the update on the target.

So what does fail-safe installation mean? The key to fail-safe installation is atomicity. This means that the update is either fully valid or fully broken, and there's nothing in between. So if someone accidentally pulls the cable, the update is either fully operating or fully broken. There's no state in between where you can break the device by unplugging a cable or something like this.

In short, how this works in general is that we have redundancy. The simplest case is A/B redundancy, where we have an active system here, the A system, and an inactive system here, the B system. These are normally partitions of equal size, and we have the bootloader here as the switching point. When I now start the update while running from the A image, the first thing I do is deactivate the B slot in the bootloader so that we don't boot it accidentally. Then we write the update to the B slot, and when we have finished writing and have verified the update, as the very, very final step we notify the bootloader to switch to this B partition. This way we don't end up in a situation where we have an active system that is 50% installed or something like this.

So what I'm going to talk about today is mainly image-based updating, because RAUC is an image-based update framework. In the past four or five years, some open source update frameworks came up. Some to mention are, for example, Mender, RAUC of course, SWUpdate, and there are also some more.
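
As an illustration of the A/B sequence described above (deactivate the inactive slot, write, verify, and only then switch), here is a minimal Python sketch. It is not RAUC code: the Bootloader class, install_update and the in-memory slots are hypothetical names made up for this example, and a simulated power loss stands in for someone pulling the cable.

```python
from dataclasses import dataclass, field


@dataclass
class Bootloader:
    """Toy stand-in for the bootloader's persistent slot state (hypothetical)."""
    bootable: dict = field(default_factory=lambda: {"A": True, "B": True})
    primary: str = "A"

    def next_boot(self) -> str:
        """Return the slot that would boot next: the primary one if it is
        bootable, otherwise the other slot."""
        if self.bootable[self.primary]:
            return self.primary
        other = "B" if self.primary == "A" else "A"
        if self.bootable[other]:
            return other
        raise RuntimeError("no bootable slot left")


def install_update(bl, slots, booted, image, chunk_size=4, fail_after_chunks=None):
    """Install `image` into the slot that is not currently booted."""
    target = "B" if booted == "A" else "A"

    # Step 1: deactivate the target slot so a half-written system is never booted.
    bl.bootable[target] = False

    # Step 2: write the image chunk by chunk (optionally simulating a power loss).
    written = bytearray()
    slots[target] = written
    for n, start in enumerate(range(0, len(image), chunk_size)):
        if fail_after_chunks is not None and n >= fail_after_chunks:
            raise OSError("simulated power loss while writing")
        written += image[start:start + chunk_size]

    # Step 3: verify what was written.
    if bytes(written) != image:
        raise OSError("verification failed")

    # Step 4: only now reactivate the slot and make it the next one to boot.
    bl.bootable[target] = True
    bl.primary = target
    return target
```

If the write is interrupted, the target slot stays deactivated and next_boot() still returns the old slot; the switch in step 4 is the single point at which the new system becomes active.
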
And on the other hand there are also update systems that do not use these image-based updates, so they do not replace the entire file system, such as OSTree, or Balena, which replaces Docker images.

So let's hear about RAUC: what RAUC is and what we designed RAUC for. For those who wonder what RAUC stands for, if it stands for something: yes, it stands for Robust Auto-Update Controller, the best name we had in those days. It's basically an open source update framework, and it was developed by us because we ourselves had many projects in our company where we developed update systems over and over, based on scripting and specific solutions for each customer, and reinvented the wheel every time. Back in 2015 we decided to develop a framework for this that handles the basic things and allows us to go beyond the point that you can reach by reinventing the wheel every time with scripting and so on.

Nowadays we have arrived at RAUC version 1.2, which was released two days ago, just before ELC-E. RAUC itself is licensed under the LGPL version 2.1, which allows you to freely use it in your project, potentially also as a library if that's required. We have a steadily growing community of now about 50 contributors, who in total made 1,450 patches. And although RAUC basically works, of course, there's still ongoing development: we have to fix things, we have to adapt to new technologies, and we also add cool new features that are required for new storage, new scenarios and so on.

One of the most important things to mention are the design goals behind RAUC. The key idea of RAUC was to develop a generic, declarative framework that solves many use cases. We saw that many use cases are very individual and depend on the actual board that is used and on the storage that is on the board.
Sometimes you have A/B redundancy, but on constrained devices you need a recovery scenario instead, or, if you want to pay more attention to safety, you have for example A/B plus recovery. The idea was to develop a framework that covers all these scenarios, not with scripting but just with configuration. We also wanted to limit the complexity; it's complex enough to handle all these cases.

So what RAUC focuses on is actually installing images on the target. It's not an update server, not a bootloader or something like this; it limits itself to installing images in the right way. One of the initial design goals was also security, so it's mandatory in RAUC to use signed images, and RAUC always verifies these images during installation so that you don't have unauthorized installations on your target. And of course, as it has to be fail-safe, it also has to have a robust design in general, so we take care of error handling and use shared libraries wherever it's useful, to not reinvent the wheel for each and every functionality. Another design decision we made is to prefer subprocess calls over library functions whenever possible, because then we can be sure that if the writing or something like this crashes, we get an exit code and error information from the subprocess that crashed, but we don't crash the updater itself, and so we can handle it.

So when you get the code of RAUC, you can compile it, first of all, as a host tool (I have to check where the laser pointer is). The host tool takes care of creating, signing and inspecting update artifacts. On the other side, when you compile it for the target, you get a target service and a target tool. This is the actually interesting part that we will now focus on: the part on the target that actually handles the installation.
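
The design decision mentioned above, preferring subprocess calls over library functions so that a crashing helper cannot take the updater down with it, can be sketched roughly like this. This is an illustration in Python only, not RAUC's implementation (RAUC itself uses GLib); run_helper is a hypothetical name, and the signal reporting via negative return codes applies to POSIX systems.

```python
import subprocess
import sys


def run_helper(argv, timeout=60):
    """Run an external helper and return (ok, message).

    Whatever happens to the helper (missing binary, non-zero exit,
    crash by signal, timeout), the caller gets an error description
    back instead of crashing itself.
    """
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError:
        return False, f"helper not found: {argv[0]}"
    except subprocess.TimeoutExpired:
        return False, f"helper timed out: {argv[0]}"

    if result.returncode < 0:
        # On POSIX a negative return code means the child was killed by a signal.
        return False, f"helper killed by signal {-result.returncode}"
    if result.returncode != 0:
        return False, (
            f"helper failed with exit code {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return True, result.stdout.strip()


if __name__ == "__main__":
    # A child Python process stands in for a real helper tool here.
    print(run_helper([sys.executable, "-c", "import sys; sys.exit(3)"]))
```

The point of the pattern is that the failure of the helper is turned into data (an exit code and an error message) that the calling service can log and react to.
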
So this is a rough overview of the structure of RAUC. We have this update handler core here, which I will describe a little in the following slides, and an interface to the bootloader below. For the integration with the application, as we are a user space tool, we have user space inter-process communication. We use D-Bus for this, and the command line interface of RAUC itself also uses D-Bus to communicate with the service. That is worth noting because it is sometimes unexpected: people change something and expect that the changes take effect when they call the command line tool. They don't, because the background service is still running and hasn't noticed the changes. As a utility library we use GLib, which is already present in most modern systems anyway. And for installing, for copying, and for interacting with the bootloaders we use, as I said before, subprocess calls instead of libraries.

So one of the key tasks of RAUC is to have an idea of what the system looks like, that is, what the redundancy layout of the system is.
So for this we have a configuration for RAUC in the root file system that translates this device view, where we have devices and partitions, into a setup that describes slots. In RAUC, everything that is updatable is a slot, and we turn this into a description of a redundant update scenario. For example, here we have two root file systems and two application file systems. And here you have a so-called slot class: classes describe slots of the same purpose, and we have instances of this class. This description allows us to set up very flexible scenarios, for example an A/B setup here, but also recovery, or, as this actually is here, A/B plus recovery. We can also specify bootloader slots that are handled outside this scenario.

This gives us the ability to specify in the update artifact itself not exactly where we want to install the image to, but just what kind of slot class it is for. So in the update we can say, okay, we have an image for a root file system, or we have an image for an application file system, and it's now the task of RAUC to find out to which of these slot groups I have defined here the update should be installed. One key requirement for this is that we first of all have to know which slot is the active one, because we don't want to overwrite the active slot. When we know that one of these slots is the booted slot, we know that this slot group is the active one, and then we can be sure that we can write our image to the inactive slot group. So this is the key concept behind the slot description and the slot handling in RAUC.

To show just shortly how the slot detection works: there are two basic techniques. The first is that the bootloader explicitly says, okay, I've booted this slot or that slot. This is done via the kernel command line: the bootloader just says here, okay, it's the slot for system zero, and the configuration has the matching part, the bootname: when the bootloader selects this exact slot, this is the name that
it gets. Whenever this is possible, whenever you are able to script this in the bootloader or so, this is the preferred approach. In some scenarios it is not possible: we cannot hand over this information from the bootloader because we can't adapt it or something like this. Then the fallback is that RAUC examines the root= argument in the kernel command line and tries to match it to one of the slots described in the system configuration.

What this also enables us to do is introspection on the target. We have this rauc status command. It prints some base information at the top that is left out here, but the interesting part is that you get an overview of the redundancy setup of the system. You see which slots are currently active and which are inactive, and also some more, like the status from the bootloader, information about which device this actually is, and so on. So this is what the descriptive view of RAUC enables us to tell you.

Now let's have a short look at the update bundle format. Inside an update bundle, as we call it, we have the images that you want to install and a manifest file that describes the purpose of each image: it describes that this is the root image for the slot of class root, and so on. The bundle itself is a SquashFS. We decided on SquashFS for two main reasons. The first is that it is mountable, so you don't have to extract your bundle when it comes, for example, from a USB stick. The second is that you gain compression: SquashFS is normally compressed, so you don't need to compress the images inside. It also allows us to simply append information without the bundle becoming unmountable. So for verification we append the signature here, and the signature size for finding the start of the signature, but that's just a detail. As signature we use the CMS format, which is similar to what is used in S/MIME for emails. And the basic cryptographic method
behind RAUC, the one it uses, is X.509, so we have the full capabilities of a public key infrastructure if you want.

Here's a short view of how this actually looks on the target. On the right side we have a system configuration, and on the left side there is the matching bundle manifest. In this generic part here we specify some base information like a compatible, which we also have in the manifest. This is more a kind of sanity check so that we don't install the wrong images on the wrong target. Then we have the keyring information, where RAUC finds the keyring that it uses to verify the bundle to install. And then down here follows the slot configuration that was shown initially in the picture.

So now RAUC knows where to install; the question is how to install. How do you actually copy an image to a slot? We have a technique for this, or handling code, called update handlers. An update handler basically results from the combination of two things. First, there is the slot type we give; the slot type can be, for example, UBI, UBIFS or FAT, and this is, as you saw before, in the system configuration. Then we have an image type; the image type is derived from the file ending in the bundle. This can be, for example, ext4, a tar archive or something like this. Based on these two pieces of information RAUC has more or less a matching matrix where it can say: okay, if I have an ext4 image and an ext4 slot, how do I install that? It's simply a raw copy of this image to the actual device.

Although RAUC is basically an image-based updater, it can also handle tar archives, which are more or less images, just not real file system images. If we have the information that our target slot should be ext4 and we know that we have a tar archive to install, then the matching update handler is: okay, I have to call mkfs.ext4 on the target slot, mount the target slot, unpack the tar archive to the slot and then unmount it again. This also works for FAT
partitions with tar, and potentially for UBIFS, where you call mkfs.ubifs, and so on. This gives some flexibility between what's in the bundle and how it's installed.

One other important thing for handling redundancy, as you saw in the beginning, is interacting with the bootloader. There are two main reasons to interact with the bootloader. The first is switching the atomic region on or off, which is what you saw in the initial description: when RAUC starts installing, so actually writing the image, it deactivates the slot it writes to in the bootloader, and when it's done it reactivates the slot and gives it the highest priority so that it is the next slot to be booted. This is one use case. The other is mainly fallback handling, which you see down here. Normally you do fallback handling in the bootloader: it decrements a counter, then it starts the system, and then at some point you invoke RAUC to say, okay, I've booted successfully. If you boot several times and you don't get this reset of the attempts counter, then you probably hit a watchdog or have a broken system. The boot counter in the bootloader then reaches zero at some point, and the bootloader switches back to the other redundant slot so that you are still in an operating state.

So these are the basic things we need bootloader interaction for, and we've abstracted them in RAUC in a bootloader interface with these basic methods: mark good and mark bad for activating and deactivating a slot, and mark primary, where you say, okay, this is the next slot to be booted, for example after an update. Then we have an implementation, or a mapping, of this for several bootloaders. At the moment we support barebox, U-Boot, GRUB and UEFI. In the end this basically boils down to interfacing with some storage of the bootloader, which is for example the environment in GRUB or U-Boot, the barebox state framework in barebox, or simply UEFI variables in UEFI. On each of these bootloader sides there is either some
predefined logic for doing the actual slot selection, or you have to do scripting, as is required for GRUB or U-Boot. And basically what RAUC uses to interact with it are the user space tools provided by the bootloaders: for example barebox-state for barebox, fw_setenv, I think it's called, for U-Boot, grub-editenv for GRUB, and so on.

Now that we know how to install, and how to install fail-safe, we come to authentication. Basically, when you want to install an update from a verified source, you have two options. You either have a trusted transport, like TLS for the network, where you can say, okay, the source it comes from is trusted, or I know that the path to the source is trusted. The other option is that you have a signed artifact: the artifact is signed, you have an untrusted channel, and you perform the verification of the update on the target. Well, the trusted transport is valid for the network case, but if you have, for example, a USB stick update, and this is something we need to support with RAUC, then it is an inappropriate method. So what we do in RAUC, as you have already seen before, is verification on the target: we have signed bundles.

As crypto library we use OpenSSL 1.x, so mainly 1.1, as 1.0 reaches end of life, I think, in December. Using the X.509 public key infrastructure standard allows us to use everything from a simple self-signed certificate, where you have just basic trust but cannot replace or revoke keys, to a full-blown public key infrastructure, where you also have the possibility to revoke keys in your keyring, or to replace keys when they expire after some time or if they got compromised or something like this.

Together with authentication I also want to mention some of the signing features. This is more on the host side of RAUC, as that is where bundle creation happens. What's possible with RAUC is, for example, re-signing a bundle. So if you have generated a bundle with a development signature and then you tested it over and over, and you say,
okay, this got through my process, and now I want to deploy it on the target or in the field without actually changing the bundle, then you can simply replace the development signature with a release signature. This is handled by RAUC, and then you can install the bundle with the release key on all your devices out in the field without touching the content of the bundle anymore. What you can also do is place intermediate certificates in the bundle, which you probably need to close the gap between the root CA that is in the keyring of your target and the certificate that was used to sign the bundle. The mechanism also potentially allows having multiple signers, so if you say, okay, I need a minimum of two signers for a bundle so that the update tool accepts it, this is also potentially possible. And what RAUC also has support for is PKCS#11, so you can use an HSM, for example a Nitrokey, to hold the keys for signing the update bundle.

Further, apart from what you have in the system configuration, RAUC has more customization options. You can, for example, place scripts in the bundle itself that handle some custom installation steps or perform some pre- or post-installation operations. This is what we call hooks; hooks are always in the bundle. But you can also have some predefined handling on the target: scripts to be executed pre- or post-installation for whatever you need. These are called handlers in RAUC. And if that should be required for some reason, we also have the ability to fully replace the default installation handling, so everything I talked about with slots, and just use RAUC as a signed container and then do fully custom handling inside RAUC.

So these are the very basics. Now I want to dive a bit into some details, some points that often come up, and some new features in RAUC. One of the questions that often comes up is: how does RAUC integrate, or how does updating in general integrate, with verified
boot? The good news is that it's mainly orthogonal to updating. If you have, for example, as here, a dm-verity approach, then you have a hash tree, but this is generated offline on your host. So what you basically get to install is just an image, and this boils down to a normal image installation, and the update itself is not affected by this. If, on the other side, you have something like dm-integrity, then this offline approach won't work, because there you have the device mapper layer that calculates the tags, the journal and so on. But then you can use the tar handling of RAUC, because then the kernel copies the files and generates the required tags and so on during the installation itself. These are just two examples. So you don't need to worry about verified boot in these scenarios; they integrate perfectly with an image-based or tar-based update system.

We've talked much about how to interact with the bootloader, but maybe you also want to update the bootloader. Updating the bootloader is always a critical operation because the bootloader is the part of the system that does the switching. For the bootloader you don't have a fallback, and if you break your bootloader, you break your device. This is why people often say, oh, I don't want to update my bootloader, it's too risky. It is risky, but what we can at least achieve is atomic updating of the bootloader, and this is also supported by RAUC. At the moment we have two cases where it's supported. One is the master boot record update that I will show; this is a general approach. The second one is for eMMC; this has to be supported by the ROM loader, as we will see. Another approach that could be done but is not yet implemented is atomic update of the bootloader in NAND, at least on the i.MX6 platform.

So let's have a look at how atomic bootloader updates for eMMC work. In eMMC we have two dedicated boot partitions, boot0 and boot1. These are outside of the
normal user partitions, so they are fully independent. This exists only in the eMMC standard, so an SD card doesn't have it. Then you have a register in your eMMC where you can say, okay, I want to boot either from the user partition, which we won't care about here, or from boot partition zero or from boot partition one. The target device, or rather the ROM loader of the target device, has to support using this register, so it's not valid for all cases. But if it does, you can again do what I showed in the beginning: you have your bootloader currently running from this slot, and when you update your bootloader, you write it to the currently inactive slot. When you're done and are sure that the update was successful, you switch the bit in the ext_csd register, and then you boot the next time from the boot0 slot. This way you can make your update atomic.

The same also works for a master boot record, and this is independent from the actual ROM loader because it's implemented in the master boot record. So for devices that boot from the first partition of a master boot record we have support in RAUC. We say, okay, we have a redundant set of regions defined for the first partition, and the entry in the master boot record points to one of these regions. If we now want to install a bootloader update, RAUC writes to the currently inactive shadow partition, or memory region, more or less. When we are done writing, we simply change the entry in the master boot record to point to this previously inactive region. Like the eMMC bootloader support before, this all boils down in RAUC to a simple slot type. Here you just say, okay, my slot type is boot-mbr-switch, and you say for which device, and the rest is handled by RAUC automatically.

We implemented this because we often run into cases where customers don't want to update the bootloader as it's tricky, and these two approaches give a good argument for saying,
okay, it's still risky, but you have the chance to perform the update at least atomically.

When it comes to network updates, you typically run into two problems in constrained environments. Your update is too large, so if your connection is a little slow, you have to transport a lot of data over the network, which takes much time and maybe also much money. And you also need some temporary storage on your device where you store the artifact before actually installing it. So what one actually wants for this is two things: streaming and delta updates. When we wanted to support this in RAUC, a tool called casync from the systemd universe came up that perfectly fits these use cases, and I want to show shortly how it is used in RAUC.

What casync basically does is this: it has a chunking algorithm that splits up a block device or a file system into reproducible chunks of similar size and creates an index from this. The index describes the order in which the chunks have to be taken and installed to get the original image back. For RAUC we then change the bundle to not contain the actual image but only the index file, as you can see here, and then we have the chunk store, so the individual compressed chunks, somewhere on the server. What RAUC then does for installing is run the same algorithm on your currently active slot; it calculates the chunks and stores references to these chunks. In casync this is called a seed store; these are basically symlinks. When it installs the update on the target, RAUC, or rather casync, goes through this index file using the seed store. Every chunk that it can get from the seed store, meaning that it can copy it from the currently active partition, it copies to the currently inactive slot for updating, and only those chunks that are not in the seed store need to be downloaded from the chunk store. This gives both streaming support and delta
updates, so this is very cool.

To close, just some final notes on how to integrate RAUC and what it looks like when you actually use it. It boils down to very few components. On the host side you basically have the host tool for generating the artifacts and some utilities like mksquashfs for generating the SquashFS. On the target side you need to install the service, and the most important point is the system configuration that says, okay, where is the keyring, where are the slots, and so on. Then you need the crypto components: on the target you need a keyring so that RAUC knows which key to verify against, and for signing the update on the host system you have a key pair consisting of the private key and the certificate. On the target you also need the utilities, like I said, mkfs.ubifs or utilities for writing images to NAND, depending on what you actually set up. So this is all that is basically required, and the rest is only configuration in the system config.

If you want to use it in your system, we have support for the most common Linux build systems: Yocto, where we have a meta layer, meta-rauc, where RAUC is supported; built-in support in PTXdist; and also basic support in Buildroot. For interfacing with an application there is this D-Bus interface, which allows, for example, triggering updates, but also getting status updates from the system and progress information during the update, so everything you need to show it in your custom user interface.

There is also an example project where you can see how this could work. It is called rauc-hawkbit, and you can also see it in the technical showcase later today. What it basically does: on the one side it interfaces via the RAUC D-Bus API and takes care of updating here and gets the status and everything from RAUC, and on the other side, over the network, it interfaces with hawkBit, which is a deployment server, one of the few open source implementations of deployment servers we have. And, yeah,
then it steadily polls hawkBit and asks for an update. When hawkBit says, okay, there's an update, rauc-hawkbit notifies RAUC to install the update. This is basically a Python script; there's also a C implementation, but it is not yet part of the RAUC organization. So this basically gives a rough overview of what you need when using RAUC for your target.

So this is all. Thank you very much for your attention. If I have time left, I'm not sure, you may ask some questions, or you can also come to us later at the technical showcase and discuss with us if you have more in-depth questions. So thank you first of all. Yes? Do you need the microphone?

Thanks. There is still one thing that is not very clear to me: is RAUC a daemon, or a program you have to invoke yourself, or compose with your scripts?

We are basically a team. Pengutronix is the company, and RAUC is one of the projects that we support. I am the co-maintainer; the maintainer also sits here. There are people from the company who are involved, but there are also many outside contributors involved in the project.

Actually, my question was: does RAUC run as a daemon, or do you have to call it when you want to install an update?
Then I got your question wrong, sorry. RAUC runs as a daemon on your device. Normally it's not necessarily running all the time; it can be started via D-Bus activation so that it only starts when you start an installation. Normally, when you don't install anything, there's nothing much to do for RAUC, but you can potentially run it as a service indefinitely on your target. So that's the answer to the question. Can you hand over the microphone? Thanks.

Do you handle migrations in any way? And when I'm asking that: how do you handle migrations in terms of where they run, on the new operating system, on the old operating system, stuff like that? I know you have hooks, but I assume all those run in the old environment.

So you're talking about migration of data, or?

Yeah, basically, let's consider it as data.

So the idea of image-based updating basically is to exchange the entire system with a defined state, the very state that you tested, so there's no migration in the application part. But yes, the migration of data is sometimes required, mainly if you have, for example, redundant data slots or something like this and you have to move the data from A to B. But your data mainly depends on your application. We plan to support basic migration, just copying data from A to B; that could be supported by RAUC. But if you have to migrate data to a different format or something like this, that is very application-specific, so this would be something you have to handle in your application. This is not the task of an updater; it can't be solved in a generic way.

Okay, so just to make sure I understood: the hooks in your case, the ones that you mentioned, run in the environment of the old operating system, the active operating system, right?
Yes, right, but you can install to the inactive system with them, so you can use them for migration, if that's the question.

It's probably half a question, half an answer.

Yes?

Maybe, if I understood it correctly, we had a similar problem in the past, too, where we had a system in the field that runs a Debian-based operating system and we wanted to replace it with a Yocto-based operating system. The solution was, well, booting from a USB stick and having the updater on the USB stick. Is this something supported by RAUC? We have an updater on the USB stick which updates the eMMC in the device with the image deployed on the USB stick.

So it depends on what you mean by updating. It's not for replacing the partition table or something like this. If you want to change the partitioning, that is not a task of RAUC. RAUC only installs file system images to existing partitions. If you want to migrate from a Debian system, then you either need to use the update mechanism that is already in the system, or install RAUC into your Debian system and let it handle the replacement. RAUC is basically independent from what kind of Linux distribution you use, but you always replace only the root file system, the application file system and so on; you don't do any repartitioning.

So it's not possible to repartition a flash memory using RAUC as an infrastructure?

No, this is not what it's intended for. So, anything else? Otherwise I would again say thank you to everyone for attending, and have a good day and many interesting talks over the next days of the conference.