Okay. Hello, everyone, and welcome to my talk on software update for Zephyr. Thank you very much for joining. Software update is an essential component of any software development cycle: today we deploy a given version of the software into the field, later we find bugs or need security fixes, and then we update the software over the air by managing the devices remotely. How software update works in Zephyr is the primary focus of this talk. So let's get started.

First, a little bit about myself. I am Parthiban, working as a software engineer at Linumiz. I mostly work on embedded Linux development, ranging from board bring-up, U-Boot and kernel development to Yocto development and consulting, for custom boards and development platforms for various customers. Recently I have also been doing Zephyr development on various customer platforms, including board bring-up and consulting. I live in Berlin, and the people at Linumiz are mostly distributed across different parts of the world.

This is today's agenda for software update in Zephyr. The talk is split into two major parts. In Linux, a software update agent such as SWUpdate or Mender performs the whole update: it downloads the image and installs it into the primary storage medium, such as NAND flash, eMMC or SD.
So in Linux the whole operation of downloading and installing the image is carried out by the software update agent. On Zephyr, or on the microcontroller platforms we are speaking about here, this operation is split between two different layers, and the talk is split the same way. The download part is handled by application software running on top of Zephyr; available solutions include UpdateHub, mcumgr and so on. The installation part, which is the main part, is carried out by MCUboot, a bootloader for the Zephyr platform. The first part of this talk covers MCUboot in detail: how it installs the downloaded software from the storage medium. The second part explains, with a couple of demonstrations on development boards, how to download the software into storage using the various mechanisms available in Zephyr today. We will also discuss some future work that is planned as a next step. So let's get started.

MCUboot is the secure bootloader for Zephyr and for Apache Mynewt, another real-time operating system. As a bootloader it is not bound to any hardware platform or to any real-time operating system, so it can be ported to any particular RTOS. It is a hardware-independent as well as operating-system-independent secure bootloader. When I say secure bootloader, I mean it does more than software installation: it verifies the image signature and handles the encryption and decryption of the image. These things are also handled by MCUboot.
It chain-loads the image after verifying it, that is, after checking the integrity of the image. There was a talk earlier on Monday by Mr. David Brown about the security aspects, which you may want to check.

MCUboot also supports multiple boot images. What multi-image support means here is, for example, an SoC that comes with a Cortex-M4 as well as a Cortex-M33, where the two cores need to run independent software. In such cases MCUboot also supports chain loading of multiple images. There was a talk on this in 2018, again by David Brown, which you can refer to.

The last and most important part for this talk is software update. The heavy lifting of installing the software, after the image has been downloaded into the primary storage, is done by MCUboot. That is why I call it the heavy lifting of installation.

To start with, how is the memory, or the partitioning of the memory, organized in the microcontroller? Usually we have a few kilobytes, or at most a few megabytes, of memory as the primary storage medium on our microcontroller platforms. This example of a partition layout is taken directly from the Freedom development kit. The device tree on the left shows the basic partitions that MCUboot needs in order to function. Here we have a flash node, which sits under the non-volatile flash controller. This flash controller represents the built-in memory of the microcontroller itself. Under it we need to create partitions based on our needs.
For example, we need a minimum of three or four partitions to have a system that can be updated over the air. The first partition is MCUboot itself. It resides at offset zero, 0x0, and depending on the size of MCUboot we can define the partition size here as well. This is just partitioning the raw flash into multiple regions.

The important part is then slot 0, otherwise called the primary slot, where the image to be chain-loaded must be located. One important thing here is that MCUboot always boots the image from the primary slot. The secondary slot, slot 1, is used for the purpose of software update. For example, by means of an external download mechanism like UpdateHub or hawkBit, the new image is downloaded into slot 1 and is then swapped or copied into slot 0.

When I say swapped or copied, these are two different operations. A copy overwrites slot 0 with the image that was downloaded into slot 1, replacing the content of slot 0. The other way is swapping. How do we swap the images? That is why we have an additional partition called scratch. One can imagine the simple example of swapping variables in C: int a = 10, int b = 20, and then swapping the values of a and b using a temporary variable c. You assign the value of a to c, replace a with b, and then assign c to b. This is how the swap works. So the overall picture is copying the image from one partition to another, but using the scratch partition as the temporary. That is the purpose of the scratch partition.

Next is how the image needs to be organized so that MCUboot can understand it and chain-load it from the primary partition.
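The difference between the two modes can be sketched in a few lines of Python. This is only a toy model under stated assumptions: each slot is a list of equally sized sectors, both slots have the same number of sectors, and the names `overwrite` and `swap_with_scratch` are hypothetical, not MCUboot functions.

```python
def overwrite(slot0, slot1):
    """Overwrite mode: slot 0 takes the new image; the old image is lost."""
    slot0[:] = slot1


def swap_with_scratch(slot0, slot1):
    """Swap mode: exchange both slots sector by sector through scratch."""
    for i in range(len(slot0)):
        scratch = slot1[i]   # c = b: park the new sector in scratch
        slot1[i] = slot0[i]  # b = a: the old sector moves to slot 1
        slot0[i] = scratch   # a = c: the new sector lands in slot 0


# Hypothetical three-sector images, used only for illustration.
old_image = ["old-0", "old-1", "old-2"]
new_image = ["new-0", "new-1", "new-2"]

s0, s1 = list(old_image), list(new_image)
overwrite(s0, s1)
print("overwrite:", s0, s1)

s0, s1 = list(old_image), list(new_image)
swap_with_scratch(s0, s1)
print("swap:     ", s0, s1)
```

After a swap the old image is still available in slot 1, which is what makes it possible to go back to it later; after an overwrite it is gone.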
The image that is flashed into slot 0 contains the image itself, which is the code section, and also a header that describes the size of the image, the signing key that was used, and so on. We are not going to go into all of this information now, because this talk is about the software update aspects of MCUboot and not about its security aspects.

The most important part for software update is the trailer section. The memory region at the end of an image partition is called the trailer. It contains metadata that MCUboot reads on every boot to decide whether it needs to swap the image from slot 1 to slot 0, whether it should just boot slot 0, or whether we were somewhere in the middle of a swap operation that was interrupted for some reason such as a power failure or a hardware failure. This metadata, the trailer, is what MCUboot uses to decide the state of the software to boot from.

Expanding the trailer, we have the swap status, which we will see in a moment, and the swap size, which contains the total size of the image that needs to be swapped from slot 1 to slot 0. By inspecting this information MCUboot decides whether it needs to resume from somewhere in the middle or start the swap from scratch. We also have the swap info, which contains multiple things such as the swap type and the swap number, which we will see in a moment. There are also a couple of other bytes of information, which we can see in the next slide, and there is a magic at the end of the image slot. So, jumping to the next slide.
Okay, so in the big picture, this is how MCUboot operates. Once we start the target, it checks whether there is an interrupted swap, which means that during a previous swap operation the target lost power, was rebooted, or was reset for some other reason while it was in the middle of the swap. It detects this by reading the metadata. If there was an interruption, it continues the swap operation from the place where it was left. If there is no interrupted swap but a swap has been requested, because a fresh new image was downloaded into slot 1, it checks whether the image is valid by checking the integrity, the signature, the encryption and so on. If a valid image is found in slot 1, it starts the swap operation.

Now I'm going to share my screen to show how this swap works altogether. This particular animation is taken from YouTube, where it was published by Mr. David Brown of Linaro. I'm just going to play it. On the left we have slot 0, then we have slot 1, and then we have the scratch partition.
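The boot-time decision described above can be written down as a small Python sketch. It is a simplification under stated assumptions: the three boolean inputs stand in for what MCUboot derives from the trailers and from image validation, and the names `BootState`, `Action` and `decide` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    RESUME_SWAP = "resume the interrupted swap"
    START_SWAP = "start a new swap from slot 1 to slot 0"
    BOOT_PRIMARY = "boot the image in slot 0"


@dataclass
class BootState:
    swap_interrupted: bool   # a previous swap did not finish
    swap_requested: bool     # the application asked for an upgrade
    slot1_image_valid: bool  # integrity and signature checks passed


def decide(state: BootState) -> Action:
    # An interrupted swap always takes priority: finish it first.
    if state.swap_interrupted:
        return Action.RESUME_SWAP
    # A requested swap only starts if the new image validates.
    if state.swap_requested and state.slot1_image_valid:
        return Action.START_SWAP
    # Otherwise boot whatever is in the primary slot.
    return Action.BOOT_PRIMARY


print(decide(BootState(False, True, True)).value)
```

In this sketch an invalid image in slot 1 simply leaves the primary slot untouched.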
The swap operation starts by erasing the scratch partition. Then it creates a trailer; this is the trailer that will now be used by the primary slot, so it creates a fresh one. As the first step it copies a piece of the image here. By a piece of the image I mean a sector of the image from the flash, so the scratch partition itself must have enough storage to hold the largest sector that can be swapped. Assume we have a 128-kilobyte sector that needs to be swapped in every single operation; then the scratch partition must contain at least 128 kilobytes of memory.

So it copies a sector from the secondary slot into scratch and marks a status in the trailer saying that it has completed one operation. Next it moves the same sector, the same amount of content, from slot 0 into slot 1, and then it marks that in the trailer as well. In this way it tracks exactly where the swap operation currently is. Then it just continues. The only other thing here is that the trailer itself is moved to the end of slot 0. The process continues until the whole image in slot 1 has been swapped into slot 0.

Assume it breaks in between, because of a power failure, and the MCU boots again. MCUboot detects exactly where it left off by reading the trailer information and then resumes from that point. It works in a way that does not break the device altogether. So I'm just going to stop this.
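The resume behaviour can be modelled in Python as follows. This is a sketch under stated assumptions, not MCUboot's implementation: `steps_done` stands in for the swap status kept in the trailer, each copy step and its status mark are treated as atomic, the scratch area holds exactly one sector, and the power failure is simulated with an exception. All names are hypothetical.

```python
class PowerFailure(Exception):
    """Simulated loss of power in the middle of a swap."""


class Flash:
    def __init__(self, slot0, slot1):
        self.slot0 = list(slot0)
        self.slot1 = list(slot1)
        self.scratch = None   # survives a reset, like real flash
        self.steps_done = 0   # stand-in for the swap status in the trailer


def run_swap(flash, fail_after=None):
    """Run, or resume, the swap. Each sector takes three recorded steps."""
    total_steps = 3 * len(flash.slot0)
    executed = 0
    while flash.steps_done < total_steps:
        if fail_after is not None and executed == fail_after:
            raise PowerFailure()
        sector, phase = divmod(flash.steps_done, 3)
        if phase == 0:
            flash.scratch = flash.slot1[sector]        # slot 1 -> scratch
        elif phase == 1:
            flash.slot1[sector] = flash.slot0[sector]  # slot 0 -> slot 1
        else:
            flash.slot0[sector] = flash.scratch        # scratch -> slot 0
        flash.steps_done += 1                          # mark the status
        executed += 1


flash = Flash(["A0", "A1", "A2"], ["B0", "B1", "B2"])
try:
    run_swap(flash, fail_after=4)   # power is lost after four steps
except PowerFailure:
    print("interrupted after", flash.steps_done, "steps")
run_swap(flash)                     # next boot: resume from the record
print(flash.slot0, flash.slot1)
```

Because the progress record lives in flash next to the data, the second call does not need to know anything about the first one; it only reads how many steps were completed.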
So it's the same thing repeated here as in the animation you just saw. You can see that the trailer is created, a piece of the image is copied from slot 1 to the scratch area, then it is marked, and then it continues. This process continues until the whole image from slot 1 is copied to slot 0. This is how the swap operation is carried out.

Looking at the swap status in detail, it keeps a record for each and every sector that has been copied from slot 1 to scratch and onwards until the sector is completely swapped. It is three-state information for the swap of each sector from slot 1 to slot 0. This state information is maintained in the swap status region of the trailer for each individual sector. By reading this memory in the metadata, MCUboot decides whether the swap operation is complete, or whether it was left partially done and needs to be resumed, or whether it needs to boot the first image. That is how MCUboot decides how to boot the new image, and that is the swap operation.

There are different types of swap, which we can see here. As I said before, it is not only the swap status that decides whether MCUboot needs to continue a swap, start a swap, or handle one that was left in between. This is decided by this table. MCUboot reads the whole metadata of the trailer: the magic, image ok, copy done and so on, and it decides between the different types based on all of that. This table is meant for starting the swap after a first download. The previous one, which we saw during the swap operation and which stores the state of each and every sector, is used for an interrupted swap, for example after a power failure, when MCUboot needs to resume from where it left off. So that swap status is used to resume
the interrupted swap operation. This one primarily decides whether MCUboot needs to start swapping from slot 1 to slot 0 or just needs to boot the image in slot 0 directly. This is the state that needs to be given to MCUboot when a new image has been downloaded to slot 1 through a medium like hawkBit and so on.

After downloading the image, we instruct MCUboot that there is a new image through a set of APIs provided in Zephyr. For example, here we have boot_request_upgrade. This call writes the flags that mark the upgrade as test or permanent. If the permanent flag is written, MCUboot swaps the image from slot 1 to slot 0 and there is no test swap. We will see in a moment, in detail, what a test swap is and what a permanent swap is.

Let me quickly jump back to the screen share. Here I have the Freedom development board connected to my Ethernet. It has nothing flashed on it, and I am going to flash MCUboot. As you can see in the right side of the window, MCUboot boots. I flashed only MCUboot and no other image, so there is no image in slot 0, and it says it is unable to find a bootable image, because there is no bootable image in slot 0 to continue with.

So this explains how the software update is installed after the image has been downloaded to slot 1: after downloading the new version of the software into slot 1, we inform MCUboot by using the API and then reset the target. MCUboot reads the metadata of both slot 0 and slot 1 and decides that it needs to swap the image into slot 0. But how do we get our new image into slot 1? That is the subject of the remaining part of the
presentation. We have a few solutions available in Zephyr for downloading the new image, the next version of the image, into the slot 1 partition, and we are going to look at them one by one.

The first one I am going to speak about is UpdateHub. What is UpdateHub? It is a cloud solution. We can manage the device for software update from the cloud by uploading our new version of the image. The device can download the new image either by polling or by manually checking the server for a new image, and then it flashes the new version of the image into the slot 1 partition. The solution is secure and also provides a complete solution for managing multiple devices, with rollout mechanisms for all the devices connected to the internet and for managing the OTA itself. There are also a couple of editions: a community edition for developers to use, and another edition that can be used for large numbers of devices deployed in the field.

Now I am going to jump to my screen again to see how UpdateHub works. I am going to flash the UpdateHub sample application, which is part of Zephyr, into slot 0, because we don't have any image in slot 0 so far. Before that, I am going to show the UpdateHub UI. This is the cloud, running on my local server for now, and it provides a platform where you can upload your new image. After you log in you can see the devices and the rollouts, which you can use to upload images and manage the devices.

Now I am going to jump back to the console to flash this image. The one thing I did wrong just now is that I flashed an image that is not signed, and MCUboot detects that the image in the primary slot is not valid, because I flashed a plain image instead of the signed image. So here is how we
can fix that: I am going to flash the signed image, and it is going to boot the signed image and connect to the UpdateHub server to make the device visible. As you can see, it now boots the image in the primary slot and communicates with the cloud, and it says there is no update available, because we did not push any update in UpdateHub. Here you can see that this new device was populated right after the application started on the device.

Now we are going to upload the new package, which I already created with a different version, in fact a completely different application. I am just going to upload this package. The Zephyr documentation describes the procedures for creating the package, uploading it and signing the new image; everything is part of the documentation already. Now I am going to create a rollout for this particular device. The image I just uploaded is version 2.0, and I am going to create the rollout for this new version of the software. I am going to start the process. It has started, and it says the operation is pending.

Now I am going to run updatehub run in the console of the device, and it is going to pull the new image. It says the new image was downloaded and flashed successfully, and you can view the changes in the cloud UI as well. Now I am going to reset the target, and you can see that the new image is swapped, as we saw in the animation and in the slide previously, and the new image is copied into slot 0. This is just a hello world application that was pushed from the cloud. So the new image is now flashed, and we can see that it is up and running.

Once this is done, the application needs to notify the update server whether the update succeeded or failed, but I haven't done that here, so that
means the server is still waiting for the status from the device itself. In this way any number of devices can be managed for software update. The remaining slides here show exactly the same thing we just saw; I have them only because I was not sure whether the screen share would work. It's the same thing: the new image is downloaded, and once it is downloaded you see the status change in the cloud UI. So that is UpdateHub and how it works. This solution works by connecting your device directly to the internet. Here I connected it over Ethernet, and it downloads the image over Ethernet or any other form of internet connection, such as Wi-Fi or a modem.

The next solution we can look at is MCU manager, or mcumgr. This solution was developed by Apache for their RTOS, Mynewt, and it has also been ported to Zephyr. Like MCUboot, it is not hardware dependent or operating system dependent, so there can be a port for any number of real-time operating systems. Here we have the port for Zephyr, and we are going to see how it works with Zephyr OTA or software update.

From an architecture point of view, it is layered so that it can address any form of physical medium. Except for the management layer itself, all the rest is modular, or pluggable, based on the use case. By that I mean that today we have Bluetooth, shell and UDP as ways of communicating between the devices and the server. We can think of this as a client-server model: it can connect clients and server using Bluetooth, or the shell over a UART connection, or UDP, and in future it can be extended to further communication media like Wi-Fi and so on. This layer is modular, so the simple management
protocol itself is modular in such a way that it can be extended to any communication medium. The management layer takes commands from the client and forwards each command to the respective command handler. For software update, the one we need to speak about is image management. Image management takes care of responding to commands such as uploading an image, verifying the image and so on. So these are commands, and they are handled by the image management layer itself. Everything exchanged between the management layer and the communication layer is encoded with the CBOR standard, which is a binary format. This formatted data is examined by the management layer and forwarded to the image management layer. That is how it manages the images on the devices.

I am going to share my screen again. For this example I am going to use the STMicroelectronics Discovery board I have with me. Here you can see that the device has nothing on it; I didn't flash anything into it. To start with, we need to flash MCUboot itself at the initial 0x0 address, then we need to flash the application itself into the primary slot, and then we need to download the image to the secondary slot using the mcumgr client.

So now I flashed MCUboot as before, and I am going to flash the SMP server, the simple management protocol server. The device acts as a server, and we will control it from my Raspberry Pi as a client that speaks to the device, whether to upload a new image, query the images and so on. Again I made the same mistake, sorry: instead of flashing the signed image I flashed the unsigned image. Now I have flashed the signed image into the primary slot, and it boots from the primary slot. There is something wrong with the sample. I am going to try again, and if it does not work maybe I
will just jump back to the presentation, where I have the same things explained in slides. Unfortunately there is something wrong with my demo, so I will jump back to the presentation to show how it functions.

Assuming the device is flashed with the simple management protocol server application, from the client side we can query the device over Bluetooth, for example. If the connection medium is a serial UART, the device can be controlled over the serial UART, and recently support for UDP was added as well, so it can be connected over UDP.

There are a few image management commands, as I said before. The image management group handles multiple operations such as listing the images, uploading a new image, verifying the new image and so on. Here a couple of commands are explained. The list shows the images that are present on the device: here we have slot 0 with the image that is currently running, and there is no image in slot 1. That is what we are going to upload from here. Once it is uploaded you can list this new image, and you can also confirm the image by calling confirm.

This is the state we saw previously when speaking about MCUboot; here it is shown as state or flags. After the new image is downloaded into slot 1 it is marked as pending, because this is a test image: we want to make sure that it is swapped into slot 0, that it boots fine, and that it does not break the device or fail with some error in the software. So we test it once and then confirm it as a next step, either from the software itself or from the MCU manager client. This can be managed either from the application or from the client directly controlling the device. Here the state is tagged as pending, which means this is a test swap, and during the next boot it tests and
swaps the image, as we saw in the UpdateHub case. It swaps the image, and if we see that the image booted fine, we can confirm the image in the primary slot, which is the image we just downloaded. By confirming it from the mcumgr client, or from within the Zephyr application itself, we mark it as a good image. If the image is marked good, it is started directly from the next boot onwards. If the image is not confirmed, MCUboot examines the metadata in the trailer while booting and decides whether to boot the image in slot 0; since it is not confirmed, it reverts, swapping the previous image from slot 1 back to slot 0. This way the device is not bricked by bad software, and it is a nice way to avoid bricked devices in the field.

Okay, there is also another way of doing software update, which is with CANopen. It was recently added in Zephyr and can be used for software update just like mcumgr or UpdateHub, but we are not going to speak about it now.

As an overall solution, what is the limitation, what is really missing here? We have a solution called MCU manager, which takes care of local radio and a couple of other cases, managing the images using CBOR. There is also the solution based on UpdateHub, for when the device is connected to the internet. But there is no unified solution currently available in Zephyr for handling both of these cases together. Take SWUpdate as an example of a unified solution: in Linux, at least, it handles image download from hawkBit, from a local server, and multiple other cases like USB download and so on. So this is like one unified solution
which is not available in Zephyr to address all the use cases, at least in terms of the internet of things. Assume a case where 10 to 15 devices are installed inside a factory and connected in a star topology to a gateway, which could be running Linux. Updating such devices, which are not connected to the internet but to a centralized gateway, is a serious job, and the solutions we have discussed so far do not address this problem. So this is the current state of the Zephyr implementations.

The way we are going to extend this is with a different architecture called zupdate. So far zupdate is not complete, and it has not yet been pushed to mainline Zephyr. We started developing it a year ago and it is still ongoing. We are addressing various limitations and problems with this architecture, with the aim of covering all possible forms of software update: a star topology connecting to a centralized gateway in an internet of things deployment, a single device connecting directly to the internet, or a single device sitting on a local radio like Bluetooth or LoRa, and so forth.

This architecture is directly influenced by SWUpdate in Linux. On one side there is the download infrastructure, with a download handler connected to the zupdate core. On the other side zupdate flashes the image to the primary flash. If it is not a binary update, it could be a configuration update from the server, or key management such as a change of key for security, or a change of some other information that does not necessarily go to the primary flash but can also go to SPI NOR flash or other storage media. This modular architecture handles multiple other solutions like hawkBit and UpdateHub, and it can
extend to other solutions like SWUpdate. I say SWUpdate here because of the star topology in an internet of things network, where the Linux gateway sits in the center and is connected to all the Zephyr nodes over a local radio. SWUpdate can download the image directly from its source and update the gateway itself, and we can also control the software update of all the client devices connected in this network, which run Zephyr, through zupdate. With an additional handler in SWUpdate this is directly possible, and not just with SWUpdate but also with other solutions like Mender and so on. The architecture is designed to be modular with respect to any further software update mechanism. Whether the device itself is connected to the internet or not, it can be managed: both the internet-connected cases and the non-connected cases such as local radios.

So far we have started implementing the core part of the zupdate architecture, and it is still under way. We started with hawkBit as an example. hawkBit is another device management, or cloud management, solution for software update, something similar to UpdateHub. We used hawkBit as the example to bind together with zupdate, and it worked well.

This is how it is organized: the cloud infrastructure runs some solution, in this case hawkBit, and we connect the device directly to the internet. In this case it is the Freedom development board connecting directly to the internet and pulling the image from the hawkBit server, but managing the software update not by directly flashing into the SPI NOR, not by handling it directly, but by having
a download manager handling the download part and the storage manager from the zupdate core itself doing the flashing. We started with this solution, and it is partially complete and working. For now the mainline version of zupdate is not yet complete, but it does have the solution for hawkBit itself directly. So maybe, before the time runs out, I can quickly try to show an example of how hawkBit works. As with the UpdateHub solution, we have a hawkBit server where you can log in, upload the image, and manage the rollout mechanisms. Sorry, I forgot to share my screen. So here I have the hawkBit server logged in; maybe I need to log in again. Here we can manage the devices for rollout management, as with UpdateHub. From the device point of view, I'm going to reflash the device with an example to start fresh. So now we have the simple hello world example, and then, as before, I flash MCUboot into the target first. Now we have MCUboot. I think I have run into some problem, and we are running out of time, so maybe I can show this case directly in the presentation itself.
Here we manage the devices and upload the image directly in hawkBit, as with UpdateHub. On the device side you can see that the image is downloaded into the secondary slot, and then it is swapped in by MCUboot during the software update. The difference here is that it is managed by the zupdate infrastructure instead of the direct update mechanisms we saw before with UpdateHub and so on. In this example, after the update the image is changed, and then we see the status of the update change to successful, as with UpdateHub as well. So the zupdate solution is currently under way but not yet complete. Not yet complete in the sense that the base infrastructure is ready, but it is
not yet covering most of the local radios, like LoRa and so on. This needs to be extended further; it is not yet done, and we are currently working on it. Then we will move this to the Zephyr mainline as well. The solution is flexible in such a way that it can be extended to any form of cloud solution, like UpdateHub or hawkBit or any other solution where we can upload the image directly, and can also handle the local radio cases and the Linux OTA agents like SWUpdate or Mender and so on. So that's the major part I wanted to speak about. That's it for now. If you have any questions, I will jump into the Q&A part and see if I can address some of the questions before moving to Slack.
So Mahindran asked: where is the key stored? In the case of MCUboot, it is embedded directly as part of MCUboot itself, but it can also reside in a key store region, a separate memory region which can be part of an external chip like a TPM or any other secure device.
The next question: is there support for three images, current, update, and recovery? Yes, there is support for recovery as well, but we haven't discussed that part in this talk, just to keep things simple and focus on the swap operation and how the update works. But yes, there is support for recovery as part of MCUboot.
The next question: what if the only transport available for uploading is BLE, and is that transport catered for? I'm not sure I understand correctly, but I don't think there will be a problem uploading over BLE in the case of MCUmgr, or with other mediums as well. I think those are the questions I can see right now. I'm not sure if I have to scroll. Okay, so: is there any zupdate documentation or code available online yet? Yes, it exists, but as I said before, it is in early development internally, and we have just prepared the hawkBit part to go into
the mainline, and it will be merged soon, in the coming days. And yes, the zupdate documentation as well as the functionality is not yet in the mainline; it should be available soon, maybe next month, because I already have a stable solution running. So thank you very much for joining. I will be ready to take questions directly on Slack from now on, since we have already run over our time limit. So thank you very much again, and have a nice day. Stay healthy. Thank you.