OK, so it is about time to start. Hi, everybody, and thanks for attending. I'm Diego Rondini of Kynetics, and I will be talking about orchestrated Android-style system upgrades for embedded Linux. If you want to download the slides, they are already on the ELC website, so you can keep them loaded if you need to look something up.
In other words, we are talking about our experience with managing and rolling out software updates on embedded Linux in a way similar to what Android does. We will see why we wanted to do it that way, that is, why we wanted to bring the way Android works to embedded Linux. We will discuss how Android does over-the-air updates. We will see the software, the pieces that we found to fit well in our design, both on the device side and on the cloud side, because we wanted remote management of the updates. We will also see the changes and additions that we made to this software and how we made it all work together. And last, we will have a brief demo showing in practice how this all works.
So why did we do that? We wanted a system on Linux to update SoC devices, so mostly ARM boards. We wanted the installation to be atomic. We wanted to track several pieces of information about the devices, and we wanted to be able to manage the information that the devices can provide. We will see more on that later. We basically wanted an update system like Android's.
We used the Boundary Devices Nitrogen6 for our work, but it is just a reference. It has some implementation details that are useful to us, but our platform should work with more or less any ARM board. Maybe some small changes will need to be made. We use it because it is based on the i.MX6 platform, which is very widespread on embedded devices. One special thing is that it has U-Boot on SPI NOR flash, so even if your microSD card is not inserted, it boots and you can use the bootloader.
The other thing is that we used two separate partitions, one for boot and one for root, which is quite typical on embedded systems. I must mention that we refer to the traditional single-copy over-the-air update approach of Android, because recent devices have started to use double copy. Basically, Android smartphones that have more storage can use double copy. What we wanted was the greatest freedom to access storage: we wanted to be able to write almost anything on our device, not just update the root FS. And of course, we wanted to run in Linux because it has additional facilities with respect to, for example, U-Boot.
So let's see for a second how Android does updates with a single copy. Basically, you have two steps. The first step is the preparation of the upgrade, which runs in the regular OS, the one you usually use on your smartphone. Then the device is rebooted into a special system, the recovery OS, which basically has one binary that is able to read your update file. It verifies it, unpacks it, and then runs several binaries and scripts.
Here is how it works in more detail. The device, a smartphone for example, but it works the same if you are using embedded Android, is registered to the cloud. When an update is available, you get a notification on the device. You approve the download, or maybe you choose to download later. And finally, when everything is ready, you are asked whether you actually want to install, so you go ahead and install. The nice thing about Android is that it already has APIs to manage that. There is an API that allows you to verify your update package before you install, so before you reboot into the special-purpose recovery OS. And there is another API that tells the system to install the file.
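The two-step handoff just described (verify in the regular OS, then leave instructions for the recovery OS) can be sketched roughly as follows. This is a simplified illustration, not Android code: on Android the corresponding calls are, to my understanding, `RecoverySystem.verifyPackage` and `RecoverySystem.installPackage`, and the package is checked against a signature rather than the plain SHA-256 used here as a stand-in. The function names and the temporary paths are hypothetical; the `--update_package=` line mirrors the format of the recovery command file.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def prepare_update(package: Path, expected_sha256: str, command_file: Path) -> bool:
    """Step 1, in the regular OS: verify the package, then leave a note for recovery.

    Returns True when the caller may reboot into the recovery OS.
    The hash check is a stand-in (assumption) for a real signature check.
    """
    if sha256_of(package) != expected_sha256:
        return False  # never hand an unverified package to recovery
    command_file.parent.mkdir(parents=True, exist_ok=True)
    command_file.write_text(f"--update_package={package}\n")
    return True


def recovery_read_command(command_file: Path) -> Path:
    """Step 2, in the recovery OS: find out which package to install."""
    for line in command_file.read_text().splitlines():
        if line.startswith("--update_package="):
            return Path(line.split("=", 1)[1])
    raise ValueError("no update package requested")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "update.zip"
        pkg.write_bytes(b"demo payload")
        cmd = Path(tmp) / "recovery" / "command"
        if prepare_update(pkg, sha256_of(pkg), cmd):
            print("reboot to recovery, then install", recovery_read_command(cmd))
```

The point of the split is that everything that can fail cheaply (download, verification) fails while the regular OS is still running, and the recovery OS only has to read one small file to know what to do.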
It writes a special file that instructs the recovery OS to install that update file. In the platform that we have been using, the boot script looks at the reset codes to decide when to start the recovery mode. When recovery mode is started, a RAM disk is loaded, and the init system of Android starts the recovery binary, which, as I said before, verifies your package, your update file, extracts it, and loads the update binary. The update binary reads your updater script, which is written in a special Android-specific language called Edify. While it is running, it saves what it has performed in a log file. When it is finished, of course, it returns to the regular OS.
This is how the i.MX6 platform manages the selection of the recovery system. A special register, the one holding the reset codes, is read, and if the value is the one that you want, it boots from the second partition, which is the one used in Android for the recovery system.
What is good about this approach is that the recovery system is very small, and you can still do the preparation in the main OS. So you can have the Wi-Fi, for example, configured by the user, and you can access the APIs to check if the battery is low, on smartphones of course. You can also check the kind of connection you are using, to decide whether you actually want to download the update file or not. The other good point is that the recovery system has no need to access the network, so it can be very minimal, read-only, and isolated, because everything is prepared before applying the update.
So let's look at the two approaches that are the most widespread for applying software updates. The double copy requires you to have two copies of what you want to update. Generally, you want to update everything except the user data, although there are approaches that update less than that. There are several ways to implement the double copy.
Some of them update just the root FS, but you probably want to update the kernel as well. You need to cooperate with the bootloader here to decide which system is active. You have one copy that is inactive and another that is active, and when you do the update, you want to switch from one to the other. The single copy, on the other hand, has just one copy of your system, and, as we have seen before, it has a special system which is just for the upgrades. In this case, what you need to do with the bootloader is decide whether you want to go into update mode, that is, whether to start your recovery system.
This is an example of a good implementation of the double copy. Here you can see that we have not only the root partition doubled, but also the boot partition. It is just an example, of course, but you have your boot script, your device tree, the DTB files I mean, the kernel, and possibly RAM disks. So you have two copies even of the boot partition. I think this is a good approach, because if, like in this other example, you leave out the boot partition, you have no spare copy of it. You can easily switch from one rootfs to the other, but you don't have a clear policy for updating the boot partition.
This is a diagram that shows a simple way to get started with single copy. It is very basic, but if you want to start playing with a single-copy approach, it is easy, because you start with your regular embedded Linux OS. You have the boot partition and the rootfs partition, and you just add the special recovery system, so you have the recovery RAM disk added here. When you select in the bootloader what you want to do, you are actually just telling it: use the same device tree and the same kernel as the regular system, but then load my recovery RAM disk instead of mounting the rootfs partition.
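The bootloader decision in this simple single-copy layout can be modelled in a few lines. This is only a sketch of the logic, written in Python for readability rather than as a real boot script; the file names (`zImage`, `board.dtb`, `recovery.cpio.gz`) and the root device path are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BootPlan:
    kernel: str
    device_tree: str
    ramdisk: Optional[str]      # RAM disk to load, if any
    root_device: Optional[str]  # block device to mount as /, if any


def select_boot_plan(update_mode: bool) -> BootPlan:
    """Pick what the bootloader loads; kernel and device tree are shared."""
    kernel = "zImage"          # hypothetical file name
    device_tree = "board.dtb"  # hypothetical file name
    if update_mode:
        # Recovery: run entirely from RAM, nothing on storage is mounted.
        return BootPlan(kernel, device_tree, ramdisk="recovery.cpio.gz", root_device=None)
    # Regular boot: mount the rootfs partition as usual.
    return BootPlan(kernel, device_tree, ramdisk=None, root_device="/dev/mmcblk0p2")
```

Both plans share the kernel and device tree, which is what makes this the low-effort way to start: the only new artifact is the recovery RAM disk.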
OK, so let's see what's good about double copy, which is used in several systems. The main point is that it has a fallback in case of failure. If, for some reason, for example a power outage, the installation of the update fails, you just don't switch to the new system; you wait until the operation has completed. It is also easier to implement than the single copy, because it is symmetrical. The bad thing about double copy is that it is very expensive in terms of storage. If you have a lot of user data, you probably want to choose this one, because your system is probably small. But if the application that you are running on the device is quite big, so the root FS and the included applications that you want to update are big, you halve the size of your storage. So if, for example, you are running an embedded device with 4 or 8 gigabytes of eMMC, you have half of it.
Also, you need to be careful when you have two copies of the boot and root partitions, because you don't want to mix them. You don't want to use boot partition 1 with root FS 2 or the other way around, so you need to update the copies together.
What's good about the single-copy approach is that it takes very little space. You can create a recovery system that fits in something like 10 megabytes. You don't need much more than that, and that's all you need to update your system. The other positive thing is that you are running in RAM. You have nothing mounted, no mount points, so you can write everything, of course with the risk of breaking stuff. You can even overwrite your recovery system, so you need to pay attention. But basically, you can go as far as writing a new partition table and rewriting your whole storage. The best would be to have a separate small storage for the single-copy system, so that the recovery stays in a separate storage. This is not always possible, but it can be done.
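To put the storage trade-off in numbers, here is a back-of-the-envelope calculation. The function name and the example figures for user data are assumptions for illustration; the 10 MB recovery size is the rough figure mentioned above.

```python
def system_budget_mb(total_mb: int, user_data_mb: int, recovery_mb: int = 10) -> dict:
    """Space left for one system image under each approach, in megabytes."""
    free = total_mb - user_data_mb
    return {
        "double_copy": free // 2,           # two full copies share the space
        "single_copy": free - recovery_mb,  # one copy plus a small recovery system
    }


# Example: 4 GB of storage with 1 GB reserved for user data (assumed figures).
print(system_budget_mb(4096, 1024))
```

With these assumed figures the system image can be at most 1536 MB with double copy and 3062 MB with single copy.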
Of course, the bad thing is that there is no fallback. If you get a power outage, the only thing you can do is restart in recovery mode and try to flash again.
So how do we bring the Android style to Linux? We want a system that behaves like the Android single copy we have seen, but on Linux. Of course, we looked at existing solutions, and a big part of what we needed was already available. We wanted to use software that already existed, because this is complicated, so you want to start with known projects.
We start on the device side. A good solution for building a recovery system is SWUpdate. SWUpdate is written by Stefano Babic, and I invite you to attend his session tomorrow. He should be here somewhere. Yeah, hi, Stefano. I hope I don't make mistakes. I will go quickly here, but basically you have both a framework for installing your update file and daemon modes to connect, for example, to a remote update management platform. It defines an update file format, which is something that Android also has, since Android has its own update file format. The SWUpdate format is a CPIO archive. It has several handlers to manage, and we can probably see this better here, for example UBIFS images, or raw images for, say, block devices or entire partitions. It can change the U-Boot environment. It can write single files. There are other possibilities not listed here, but it can handle a great variety of options.
The other positive thing is that it can fetch your update files from different sources. You can pick the update file from local storage, or from remote file servers, for example an FTP or HTTP server. It can provide its own built-in web server, so you have a simple web page where you can upload the firmware or the update file for your device, like you do, for example, on routers.
And finally, you can connect, as I said before, to a remote update management platform. Another interesting thing, which is very important when you manage update files, is that it has the ability to sign the update files and check that signature. You can also encrypt the update files.
This is how an update file is made. As I said before, it is a CPIO archive. The first file of the archive is a software description file (sw-description), which basically lists the components of the update file and how we want to treat them. For example, here we have a partition image that we want to flash on this device, so we flash it to this storage. And we have a script that manages, for example, pre-installation and post-installation actions.
An interesting thing is how it manages the verification of the update files. Only the software description file, which is the descriptor of the update, is signed. This is an example: the software description contains hashes for your files. So if you can trust the software description, you can also trust the hashes that are in there, and if you check the hashes, then everything is verified. As I said before, you can also encrypt images, so if you want to protect the content of your update file, that is something that SWUpdate already provides.
When we started experimenting with SWUpdate, we used our Wordpacks.io platform because we wanted to provide a simple way to update the device. In this experiment, which is actually in production now, we created a small update system, which is a new boot image, and we put both the kernel and the RAM disk in it. We stored the recovery system in unpartitioned space, so after the partition table but before the start of the first partition. That way you still have this recovery system even if you, for example, wipe your partition table.
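The verification chain described above (only the description is signed, and the description carries a hash for every other file) can be illustrated with a small sketch. This is not SWUpdate's implementation: an HMAC stands in for the real public-key signature, and JSON stands in for the actual sw-description syntax. All function names here are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Dict


def make_description(payloads: Dict[str, bytes]) -> bytes:
    """Build a description listing each payload with its SHA-256 hash."""
    images = [
        {"filename": name, "sha256": hashlib.sha256(data).hexdigest()}
        for name, data in sorted(payloads.items())
    ]
    return json.dumps({"images": images}).encode()


def sign_description(description: bytes, key: bytes) -> bytes:
    """Stand-in signature (assumption): HMAC-SHA256 instead of a public-key scheme."""
    return hmac.new(key, description, hashlib.sha256).digest()


def verify_update(description: bytes, signature: bytes, key: bytes,
                  payloads: Dict[str, bytes]) -> bool:
    """Trust the description via its signature, then each payload via its hash."""
    # 1. Only the description is signed; reject it if the signature is wrong.
    if not hmac.compare_digest(sign_description(description, key), signature):
        return False
    # 2. The trusted description vouches for every payload through its hash.
    for image in json.loads(description)["images"]:
        data = payloads.get(image["filename"])
        if data is None or hashlib.sha256(data).hexdigest() != image["sha256"]:
            return False
    return True
```

Signing one small file instead of the whole archive keeps the signature check cheap, while changing even one byte of any payload still makes its hash check fail.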
Of course, if you also wipe the space where the recovery is, you will lose the whole system. But this is a good example of how you can still use it even if you destroy your partition table. I forgot to say that if you want to see the system in action, we have a technical showcase in a couple of hours, just outside here.
The second piece that we were missing was a good remote update management platform. We found a good solution in Eclipse IoT, namely Eclipse hawkBit, because it is an independent project and it has a very good architecture that lets you replace some pieces. So if you want to customize it your way, it is a very good platform. The fundamental features of the update server component of hawkBit are device and software repository management, artifact content delivery, and the management of software updates and rollouts. Those are the backend features. There is also a user interface available, which you can replace by writing your own, but there is one that comes by default. There are management APIs to manage users, groups, and tenants, if you want to divide your customers into multiple data sections. And of course, it has protocols and interfaces to talk to the device, so the device has a clearly defined protocol to talk to hawkBit.
I don't want to go into the details, but, for example, you can replace some components. Here, for example, we have replaced MongoDB as object storage with Amazon S3. You just write the simple piece of code that is missing and you customize it your way. Another positive thing is that it can scale and be fault tolerant. So if you want to grow as the number of devices that you are managing grows, it is really simple to scale up. OK, I'll skip this one because we have a video about that. But hawkBit is also very good at managing rollouts, that is, dividing your devices into groups.
For example, if you want just 100 devices to run beta test software and the other 10,000 devices to run the production software, you can apply the update just to those beta devices, which is what you generally want to do. If you want to divide by location, or by any grouping that you might think of, you can. There are several kinds of metadata that you can manage in hawkBit.
So what did we add that did not exist before? We created an Android-like way to manage updates on embedded Linux. Here are some of the new components. SWUpdate already existed, but we fit it into our Update Factory architecture. We wrote an Android application, which is actually an Android service, to connect to hawkBit. We have implemented a user management and tenant management server. And as I said before, we have customized artifact repository management and metadata repository management, because we moved them to Amazon AWS.
On the embedded Linux side of things, we made some changes. We implemented the single-copy mode in the way that Android does it. We set up the device-to-cloud communication. We tweaked the bootloader coordination. We added a new recovery partition, a recovery boot script, and a recovery RAM disk. And we made sure that the update installation feedback is provided to the cloud when your update is done.
OK, so here is how it all works. You start with your regular OS. You have your boot partition, and you have the rootfs, which has SWUpdate running as a daemon. SWUpdate is configured, so set up to connect to your update server, and it has the device ID and the name of the customer, or in general the tenant. When your system connects to Update Factory, it gets notification of the updates that you set in the hawkBit server. So when you decide from the web interface to apply an update, SWUpdate will receive that notification, or, to put it better:
SWUpdate polls hawkBit periodically and downloads the .swu file, which is your update file. When you have everything ready, you just reboot to recovery. The RAM disk that is there has everything needed to install your software and go back to the main regular OS.
What we changed in the configuration of the daemon is that we added the option to just download the files. By default, SWUpdate installs immediately what it has downloaded. So, for example, hawkBit tells the device that there is an update; it just goes, downloads the file, and installs it immediately, whereas we wanted to store the files in a special place. So we added this option. The patches are not upstream yet, but you can find them in our meta-updatefactory layer.
The other thing that we had to deal with is that there is no easy way, no solution that fits every device, to switch to an update mode. If we want to go to recovery, we need a solution. Of course, every board can have its own solution, but this is how we implemented it. We looked at the distro_bootcmd way of setting up the U-Boot environment, and we changed the way the boot partition is selected. We basically run a command before the usual distro_bootcmd command, and if the regular OS has set the boot mode variable to update, then we tell the system: OK, you should go to the third partition and look there. Otherwise, you just do the regular boot.
Another thing that we added in our Yocto layer is the ability to create this special recovery partition. Yocto now has WIC support, so you can just add a line to your descriptor file, which describes the partitions in your system. But we had to create a new plugin called files copy to populate the contents of the partition. We also had to add an fstab entry so we could mount the recovery partition directly from the main regular OS, so that you can save the update files there.
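The download-only behaviour and the boot mode check can be modelled together in a short sketch. This is a simulation of the logic only, not the actual SWUpdate patch or U-Boot script: the bootloader environment is represented by a plain dictionary, the variable name `bootmode` is an assumed spelling of the "boot mode variable", and the regular boot partition number is an assumption (only the third partition for recovery comes from the talk).

```python
from pathlib import Path
from typing import Callable, Dict


def handle_update(artifact: bytes, name: str, download_only: bool,
                  recovery_dir: Path, bootenv: Dict[str, str],
                  install: Callable[[bytes], None]) -> str:
    """Daemon side, in the regular OS: install now, or store for recovery."""
    if not download_only:
        install(artifact)  # default behaviour: install right after download
        return "installed"
    # Download-only: keep the file on the mounted recovery partition and
    # ask the bootloader to start the recovery system on the next boot.
    recovery_dir.mkdir(parents=True, exist_ok=True)
    (recovery_dir / name).write_bytes(artifact)
    bootenv["bootmode"] = "update"  # assumed variable name
    return "reboot-to-recovery"


def boot_partition(bootenv: Dict[str, str], regular: int = 1, recovery: int = 3) -> int:
    """Bootloader side: the check that runs before the usual boot command."""
    return recovery if bootenv.get("bootmode") == "update" else regular
```

Keeping the decision in a single environment variable means the regular OS and the bootloader only need to agree on one name and one value.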
OK, we also had to, of course, tell the boot script that it needs to load the recovery RAM disk, and we had to add some file system utilities. But most of the work comes from the meta-swupdate layer, because this single-image RAM disk is already provided by that layer. And of course, when it is all done, so you have applied the update, and we will see a demo in just a second, the SWUpdate daemon reads the ustate bootloader variable and provides feedback to hawkBit: did it work or did it not?
So what we want to add is support for other boards; a way to start recovery manually, for example when your internet connection is broken and you want to update from USB; and a way to keep the update files in a different partition, for example, Android saves the update files in the cache partition. We also want the ability to update the recovery OS from the regular OS. That way you always have at least one of the two working, which is not a fallback, but it helps in case of problems.
So the main target here was having an embedded Linux system that behaves like Android. We wanted good integration with Yocto, we wanted to provide remote management as a service, and we wanted the freedom to write almost anywhere on our embedded device. Here in the slides you will find some links; you can download them. And I want to show you a video.
OK. So this is a video of hawkBit and the device working together, with the device shown as an overlay. Let me briefly show you what the hawkBit interface looks like. In this panel, you have the list of connected devices. We don't have any device yet. Here you have the updates that you have provided and that you then want to install. And here is the action history of what is going on with your device. Sorry. OK. Here you can see that we are missing a couple of things that we will have after the update. For example, we are missing this command here, so there is no information about Update Factory.
We are at version 1.0. And now you can see that we get the notification that SWUpdate has connected to hawkBit, so a new target device has been created. You apply the update to your device. You select how you want to apply that update, and you confirm. And you see that the assignment is ready. So the next time SWUpdate connects to hawkBit, it will find that there is an update available. Here you are: it has downloaded the file. Now the update is running, and we will reboot into update mode. Let me see if I can stop it. OK. Yeah. You can see here that we have changed the bootloader, and update mode is now selected. So we are running the special recovery system. Now the recovery system will install those update files and tell us that they have been flashed successfully. So yeah, we have changed some web pages, but also the OS version, and installed the Update Factory info binary. And we are now returning to the main OS. As soon as the boot is completed, you will see that the device, this here is an embedded Linux serial console, will tell hawkBit: OK, I'm done. So you will see a green tick here. OK. It's going on. It should be ready in a couple of seconds. Here you are. And now here is the feedback provided to the server. You can see that the binary is now installed, so we have information about Update Factory, and we know which server we are connected to.
We will also have a quick demo about rollouts. Here you see how you manage, for example, installing in different countries in different groups. Here you select the update that you want to apply and the group of devices. Of course, I have already created the metadata to divide the devices into groups. Here you can see that we select an advanced group definition. We select the devices located in Germany and call that group Germany, and we have selected another group, which is the devices in Italy. And you can see that we have split the devices into two groups.
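A staged rollout of this kind, with devices grouped by a metadata attribute and groups processed one after the other, can be sketched as follows. This is a toy model, not hawkBit's API: the device records, the `country` attribute, and the function names are hypothetical, and the error threshold models the rollout feature that blocks the next group when too many devices in the current one fail.

```python
from typing import Callable, Dict, List


def group_by(devices: List[dict], attribute: str) -> Dict[str, List[dict]]:
    """Split devices into groups according to one metadata attribute."""
    groups: Dict[str, List[dict]] = {}
    for device in devices:
        groups.setdefault(device[attribute], []).append(device)
    return groups


def run_rollout(groups: List[List[dict]], update: Callable[[dict], bool],
                error_threshold: float) -> List[str]:
    """Update groups in order; stop once a group exceeds the error threshold."""
    statuses: List[str] = []
    blocked = False
    for group in groups:
        if blocked:
            statuses.append("blocked")  # a previous group failed too often
            continue
        failures = sum(1 for device in group if not update(device))
        if group and failures / len(group) > error_threshold:
            statuses.append("failed")
            blocked = True
        else:
            statuses.append("finished")
    return statuses


if __name__ == "__main__":
    devices = [{"id": "dev-1", "country": "Germany"},
               {"id": "dev-2", "country": "Italy"}]
    by_country = group_by(devices, "country")
    order = [by_country["Germany"], by_country["Italy"]]
    print(run_rollout(order, lambda device: True, error_threshold=0.2))
```

In this model the second group is only touched when the first one stayed within the threshold, so a broken update reaches at most the first group.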
So you now have your rollout, which is Europe 2, ready, and we start the rollout. Only the first group, which is made of one device right now, is started. We go back to the list of devices, and we see that the first one has started. The other one is scheduled, but it is not running yet. We go back to the rollout management, and here we can get updates on the status of the installation. Basically, in a couple of seconds, you will see that the group of devices here in Germany has finished. Here you are. And as soon as this is completed, so Germany is done, you start in Italy. This is an example that shows that you can set error thresholds. If, for example, you deploy an update to 100 devices, you can start with just 10 devices, and if, for example, 10% or 20% fail, you can just block the next group. You don't want to apply a broken update to all the devices, so you can start with a small group and check that everything is fine there.
So yeah, I am finished. I just want to thank everybody in the team. So everybody at Kynetics; Eric Nelson, Gary Bisson, and Gabriel Huau; Amit Pundir; everybody that is working on SWUpdate, Stefano, and on hawkBit at Eclipse IoT; and also my daughter for the drawing. So yeah, if you have any questions, just raise your hand. OK. Yeah, the microphone.
First question: I haven't understood why you need to implement anything for a manual update, because a local update is already inside SWUpdate. Updating from USB or from local storage is already part of SWUpdate mainline. Maybe you mean something different. OK. The second question is: I see from the log that you take the SWU file and store it locally, but nothing in SWUpdate is run on it at that point; the verification is done only after rebooting the system. This can be quite dangerous if someone changes even one byte inside the SWU.
The system reboots and tries to install with SWUpdate, and sooner or later it will find that something is wrong, that the file is corrupted or was changed. So the SWU is not authenticated or verified beforehand. And this can break your device, because after that you are still in the RAM disk, so you could load another software with SWUpdate, but you have no communication anymore with the hawkBit server. A proposal could be that we implement some kind of dry run in SWUpdate, so that you get the SWU file and run it completely through the SWUpdate chain, but without installing anything. Let's say at the end the file is just copied, but all the handlers are run, and the verification and the decryption are done, because otherwise you can never check the decryption. That way you are sure that after the reboot your software will really be installed.
OK. So I will start with the second question. Yeah, you are absolutely right. That was for the demo, for the video. But of course, as we have seen here, Android has APIs to verify package signatures, so to verify update files before installing them. And that's what we want to do in Update Factory as well. It is something that we want to implement for sure. The other question was about installing from USB. That is probably what we wanted in order to have some kind of factory restore. When I was talking about that work, we wanted a solution that was some kind of alternative to doing a factory restore. So it is something that you want to use only in very special situations. I don't know if I have answered your question. Maybe we can talk later, because I see that we have just four minutes. If there are other questions, we have a couple of minutes, but I think that the time is almost up.
OK. If I understood correctly, you're downloading the update to a cache partition so the rescue system can read it from there.
In my experience, I'd expect that the sum of the space for the recovery partition and the cache partition is usually enough to just hold a complete copy of the root file system as well, and then you could just use the A/B image scheme.
OK. I don't have the slides anymore. But yeah, actually, we want to implement both, so you have the option to use both the recovery and the cache partition. We actually put that in the design. But at the moment, we just use the recovery partition, so the recovery partition holds both the RAM disk and the storage for the update files. But of course, if you have other storage available... I think that he wants to come up here for his talk. So if you have other options available, yeah, you can use them. And if you separate the single copy, so the recovery RAM disk, from the rest of the system, that is a good thing, precisely because you have no fallback. I think we should close. So thanks, everybody. And if you have questions, just come here and ask.