Hi everyone. Thanks for joining us on this video. Today we speak about Full Metal Update, Witekio's over-the-air update solution. And your hosts today for speaking about that are Sylvain Vincent and Antonin Godard. Hi guys. Hi there. Hi. Can you please introduce yourselves before we start? Sure. Sylvain Vincent, Director of Technology at Witekio. My job is to identify technologies for Witekio and make sure that we always stay ahead of the curve. My name is Antonin Godard and I'm an embedded software engineer at Witekio, and I work on a lot of different subjects, including Full Metal Update. Perfect. Thanks guys. Witekio is an embedded software and IoT expert, with 18 years in embedded and IoT software, more than 135 Witekians around the world in five offices across Europe and America. We have more than 100 customers per year, blue-chip customers that are leaders in their field. This video will be in three parts. Part one is the Full Metal Update presentation by Cedric. Part two is OS and container rollback, presented by Antonin, and part three is Edge AI, presented by Cedric. Thank you, Yanis, for your introduction. Today I'm going to talk about Full Metal Update, a product that has been developed by Witekio for the last couple of years. But first, why Full Metal Update? Because you might wonder: there are plenty of over-the-air update solutions already available on the market. I can talk about Mender, Balena, and you can use solutions that have been developed by Microsoft as well to achieve that type of goal. But Full Metal Update has a couple of very unique features that you will not find in any of those solutions. The first one is delta updates. So why delta updates? Not all IoT devices are connected over Wi-Fi or 5G, which means that your bandwidth might be very limited, or you might even need to pay for the bandwidth.
So the ability to update just the difference between two versions of the software is very important for all those use cases. Then, Full Metal Update is based on containers, because they manage to solve a lot of issues when it comes to dependencies. And since we're going to talk about Edge AI — if you are already practicing Edge AI or just machine learning in general, you know that dependencies are a huge issue, because you've got plenty of versions of TensorFlow, and from one version to another, quite often, they're not even compatible. Adding to that, isolation: if you use containers, your application is isolated from the rest of the system. From a security standpoint, that's always interesting. And using containers makes it easy for you to sandbox that workspace, so that's also very interesting from a security standpoint. Then, Full Metal Update is power-loss resistant. If you use a solution like Docker to update your system — Docker was designed for servers with power backup, and that's not applicable to embedded systems. When you're updating one of your devices, you need to make sure that even if the power is cut in the middle of a write, your device is still going to start and you're still going to be able to use it. Full Metal Update was designed for that, both for the operating system and the containers. For the operating system, the regular way to do that is to use tools like SWUpdate and to have A/B partitioning. But for Full Metal Update, we are using a different type of tool, which is even more powerful when it comes to that, because it manages the delta updates and it manages as well the atomicity of your update. Then Full Metal Update, when you use the demo, comes with features that enable continuous delivery pipelines.
So you get a build system that is going to build your application, build your own embedded Linux distribution, and deploy that automatically to a web UI that you can then use to deploy the new version of your software to your different devices. And it's fully integrated, it's already there. So day one, when you set up Full Metal Update on your computer, you can have that. And I think that's a big plus of that solution as well. Then, how does Full Metal Update work? What type of technologies are we using in the background? We did not develop that solution from scratch; it was mostly integration, by the way. The first technology that we use is Yocto. Yocto is the go-to solution when it comes to developing embedded Linux distributions. It's a build system based on recipes, which you write in a mix of shell and Python, and which you use to assemble sets of software packages into your own Linux distribution. With that, we use runc to generate the different containers where you're going to put your different applications. So it's a mix between runc and Yocto. Yocto is going to build the embedded Linux distribution, but Yocto is also going to generate the different containers, because Yocto is great for one thing: if you put in, for instance, a Qt application, what is Yocto going to do? It's going to see, OK, you need Qt, and it's going to look at all the dependencies you need to build your Qt application, and it's going to package that together inside a container. So as much as it can be used to generate the main operating system, it can be used as well to generate a very, very lightweight container with just the minimum set of dependencies to run your application. Another advantage of runc compared to Docker is that it's super light: it's just the runtime to run the application that you have in your container.
Whereas Docker comes with a lot of layers that can manage updates as well. But as we already discussed, the way it's done is not power-failure-safe. And that's why we needed something else to replace that part of Docker. What we use is OSTree. OSTree is a tool developed by Red Hat that is able to manage the delta update problem. I can compare it to Git, if you are familiar with it: it's Git for binaries. Basically, you're going to push one version of your application to a kind of Git repository, and when you push the next version, OSTree is going to automatically compute the differences between those two versions and only push those differences to the OSTree repository. Which means that when you push that update to your embedded devices, it's as well just the delta between the two versions that is going to be transferred by OSTree. So, for instance, say you're updating an embedded Linux distribution of 400 megabytes, and you just changed a file of 20 kilobytes: OSTree is going to do all the magic in the background for you and just transfer those 20 kilobytes. Then the last part of the project is hawkBit. Because you've got a cool solution to generate your embedded Linux distribution, to generate containers with your applications, and a way to deploy those applications and that embedded Linux distribution to your embedded targets. But you need a way to control how you're going to update all those devices. And hawkBit was developed just for that. It was developed by Bosch to manage all types of IoT devices that they operate for their customers. So we took that technology, which is open source as well, and we integrated it into Full Metal Update. So when you are done setting up Full Metal Update, basically, from the beginning, you get a build system that is packaged in containers — Docker containers that you can easily set up on your computer.
You get the server side with OSTree, to deploy everything easily, and hawkBit to manage all that. So, if we look at the architecture of the system: on the left, on that slide, you see the build farm. The build farm is where you're going to build your different Linux distributions and your different applications that you're going to put in containers. All of that is managed by Yocto. And when Yocto is done building your application, everything is committed to OSTree. The reason I'm saying "committed" is, again, to draw the parallel between Git and OSTree. So it's not pushed yet; if you are familiar with Git, it's committed, which means it has been pushed locally, on your computer. Then, automatically, the script which is running with Yocto is going to push that to the OSTree server. This OSTree server is running in the cloud; if you use our demo, it's going to be running on your computer, since it's a Docker Compose setup. So, automatically, every time you generate a new version of the embedded Linux distribution that you want to use on your device, or a new version of the application that you want to push to the device, OSTree is going to generate all the delta updates that you need to update the devices that are on the right side. Then Yocto is going to do something else: it's going to automatically push all the information about those new versions to hawkBit. So, automatically, every time you build a new version of an embedded Linux distribution or a container, you're going to see that popping up in the hawkBit UI. And then you can choose whenever you want to deploy those new versions. And what's great is that we can manage bulk updates as well. So you can decide, say you've got 1000 devices, but you want to update just 50 devices — those are going to be your canary devices, just to test the new version of your software — and then to update the rest.
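To make the Git parallel concrete, here is a minimal, hypothetical Python sketch — not the actual OSTree implementation — of how a content-addressed repository only ships the objects that changed between two committed versions:

```python
import hashlib

def object_id(data: bytes) -> str:
    # Content-addressed storage: an object's name is the hash of its content.
    return hashlib.sha256(data).hexdigest()

def commit(repo: dict, files: dict) -> dict:
    """Store every file object in the repo; return the version's manifest."""
    manifest = {}
    for path, data in files.items():
        oid = object_id(data)
        repo[oid] = data          # deduplicated: same content, same key
        manifest[path] = oid
    return manifest

def delta(repo: dict, old: dict, new: dict) -> dict:
    """Objects a device holding `old` must download to reach `new`."""
    have = set(old.values())
    return {oid: repo[oid] for oid in new.values() if oid not in have}

# Version 1: a toy "distribution" of three files.
repo = {}
v1 = commit(repo, {"kernel": b"k-5.10", "rootfs": b"base", "app": b"cats"})
# Version 2: only the application changed.
v2 = commit(repo, {"kernel": b"k-5.10", "rootfs": b"base", "app": b"cats+dogs"})

# The device pulls only the one changed object, not the whole image.
print(len(delta(repo, v1, v2)))  # → 1
```

A real OSTree repository checksums whole file trees and can also generate static deltas, but the principle is the same: objects that did not change between two commits are never transferred again.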
Another cool feature that you have in hawkBit: if things start to go wrong, you can have a kind of automated way for hawkBit to decide, OK, that's not working, this new version is not running very well, and trigger a rollback for all those devices. On the right side of the slide, you can see what's happening on the embedded systems. Basically, you've got a client for Full Metal Update. This client is going to poll the server. So it's not the server deciding when you're going to update; it's the client which does the polling — every 30 seconds, every 20 minutes, it's something you can actually configure. And when it detects that there is a new version, it's going to automatically use OSTree to download that new version. If it's a new version of the operating system, when it's updated, we restart the operating system. If it's a new version of an application running in a container, it's just going to update the container and restart the container. Which is another big plus of Full Metal Update: you don't need to restart the whole operating system every time you've got an update. If you use A/B partitioning, for instance, you pay that price on every update. But with Full Metal Update and OSTree, with that combo of applications, you instead just need to restart the part you just updated, that's it. Then you might wonder: what about the security level of Full Metal Update? Well, you've got everything — you've got everything to build a secure system, but nothing is set up by default. Why is that? It's very simple, in fact. Because depending on what you want to secure — if it's a water tank or if it's a safe — you're not going to apply the same level of security, and you're not going to invest the same amount in security. And as I've told you, we don't believe there is such a thing as generic security.
You can have good practices, but you don't have anything generic — something that you can deploy everywhere and say it's going to be safe for everybody. If you look at existing systems — if you know Uptane, for instance, for automotive systems: Uptane is great, that's the state of the art of what you can apply for security. But do you want that for a sensor that is going to sit in a swimming pool just to send some water temperatures? No. It's going to be way too expensive. So instead of trying to do something generic that could be used everywhere, we decided to have everything set up for you so that you can enable secure boot, encrypt your file system, deploy your certificates, for instance. You've got everything ready for that, but nothing is enabled by default. It's up to you to do it, and then you can do it to the level you need for the product that you are developing. Then there was one more additional feature that was missing from Full Metal Update: it was rollback. And it was up to us to develop it, basically. And now we have it. So Antonin, my colleague, is going to present what we mean by rollback. Because we already had a kind of rollback: in case of a power failure, OSTree was already managing that for us. If the update is not complete, OSTree is automatically just going to roll back to the last version that was deployed. But you can also have the case where you actually updated the system, and your application is not working properly, or your kernel is not even starting anymore. And that you need a different mechanism to manage. That's what was missing. So Antonin, I leave you the stage. Go ahead, you can start your presentation. So thank you, Cedric, for this introduction to Full Metal Update. Now we will focus on the operating system and container rollback. So what do we mean by rollback? A rollback is just a return to a prior state by undoing some operation.
And what that implies in our specific case is automatic error detection, which is very important in the case of edge computing and embedded systems. This rollback applies to the operating system and to the containers. One very important point is that these are two different solutions. So we will first explain how the operating system rollback works and then how the container rollback works. To understand the rollback, we need to focus on how the embedded system boots. We have basically four stages. The first one is the bootloader, which is U-Boot. Then execution is passed to the kernel. Then a script is executed — a custom script by OSTree, used to mount the file system which represents the current deployment. And then, finally, systemd is executed and all system services are started, including the Full Metal Update client. This is very important: the Full Metal Update client is actually a systemd service. So we have to make two hypotheses on this boot sequence, and we need to determine what is a success and what is a failure. If the system boots from U-Boot all the way up to the Full Metal Update client startup, we will consider that the boot is a success. Otherwise, if the boot process fails at any stage in the boot sequence — that may be the kernel, the script, or systemd — we will consider the boot process as failed. And one more hypothesis is that we want to trigger the rollback only when we have failed five times, because sometimes the boot process fails once at some point and then, on the next try, the boot runs fine. So, two hypotheses: we need to boot completely in order to consider the deployment as successful, and if the boot fails at some point, we consider it a failure; then, if the boot fails five times, we need to roll back. So now we will see how we actually enable the rollback with U-Boot.
So U-Boot has an environment when starting, and this environment contains variables that will be required by OSTree to determine what to mount, on which kernel version, and in which deployment it will boot. If we take a look, we have the boot command from U-Boot, which is just used to boot. We have the boot arguments, which are included in the kernel command line. We have the kernel image path, which is used to load the kernel from the file system. We have the initramfs path. And finally, you can optionally have a device tree path. OSTree actually defines those variables from user space. It will define the boot arguments it needs, which OSTree then parses in order to know which deployment it will use. And it will also define three variables: the kernel, the ramdisk, and the device tree. But it will also define a pair variable for each of these, named like the variable with a 2 suffix. These variables actually represent the previous deployment. So if we decide to boot on the previous deployment, we just need to use those variables — the ones shown in green on the slide — in order to boot. So let's see how we use this to enable the rollback. It works with a simple algorithm, because U-Boot allows for very simple logical operations. We will have two additional variables that are defined by the Full Metal Update client. On an update, the Full Metal Update client will define success with the value of zero and trials with the value of zero. Success, as you may have guessed, represents the success of the current deployment, so by default it is equal to zero. And trials represents the number of trials — the number of times we tried to boot. So if we take a look at how the algorithm works: U-Boot will start and parse its variables, and the OSTree variables are loaded — the variables that were just defined there. Then we take a look at the value of success.
If it equals one, that means we already booted successfully, because the client has marked success as one. So then we boot. But on an update, success equals zero, so we go to the next condition. Here we take a look at the trials value. Trials equals zero on an update, so we go into this branch: we increment trials, we save the environment, and then we boot. Now let's make the hypothesis that the boot has failed. The system reboots and we go through the script again. Success is still equal to zero and trials is incremented, so we just go into this branch again. And we do that five times, because we made the hypothesis that we roll back when the system fails five times. So when trials equals five, we go into the next branch, which replaces all the main variables used for booting — the boot arguments, the kernel, the ramdisk, and the device tree — by their counterparts with the 2 suffix. And this simply has the effect of booting on the previous deployment. We save the environment and we boot. And then we are fairly confident the system will not fail, because this is the previous deployment, which was already working. Then, when we actually reboot and boot successfully, the client determines whether the deployment has succeeded — that is, whether we succeeded within the first five boot trials. If so, the Full Metal Update client will feed back the server with a positive feedback. Otherwise, if we went into the rollback branch, that means we are on the previous deployment, and the client sends a negative feedback to the server. And that is very important, because it's the only way for the user to know if the operating system deployment has succeeded or not. So then let's go on to explaining the container rollback. Containers are started with systemd, and systemd is actually very powerful when it comes to controlling the execution of processes.
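The boot-selection algorithm just described can be summarized as a small simulation — hypothetical Python mirroring the logic of the U-Boot script, not its actual syntax:

```python
MAX_TRIALS = 5

def select_deployment(env: dict) -> str:
    """Mimic the U-Boot boot script: choose current or previous deployment."""
    if env["success"] == 1:
        # The client already marked this deployment good: boot it directly.
        return "current"
    if env["trials"] < MAX_TRIALS:
        # Still trying the new deployment: count the attempt and boot it.
        env["trials"] += 1          # persisted in the U-Boot environment
        return "current"
    # Five failed attempts: swap in the "2"-suffixed variables,
    # i.e. boot the previous deployment.
    return "previous"

# Fresh update: the client reset success and trials to 0.
env = {"success": 0, "trials": 0}
boots = [select_deployment(env) for _ in range(6)]
print(boots)  # five tries on the new deployment, then rollback
```

On a real successful boot, the Full Metal Update client writes success=1 back into the environment, so every subsequent boot takes the first branch without touching the trial counter.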
And systemd allows for notify services, which allow for fine-grained process monitoring. Basically, what systemd does is execute the main process — in our case, that's the container. Then systemd will wait for a flag from the container, a flag which is actually a sort of message saying that the initialization has succeeded. And then we can consider the startup of the container a success. So how can we roll back a container? First, we execute containers as notify services — that is, we just change the type of the service to notify. Then we wait for an answer from the container. And if the container actually fails — which systemd handles; systemd handles lots of different failure cases — we roll back the container. So we will see an example right now. This is a sequence diagram representing a success — an update scenario where the container starts successfully. We have four parts: the Full Metal Update client, the hawkBit server, the container, and systemd. First off, as Cedric presented, we deploy the update from hawkBit, and that is represented by the update query right there. Then the Full Metal Update client will feed back the server, saying that it is proceeding with the update; so the update on the hawkBit server will be pending. Then the Full Metal Update client will create a socket, and then it will create a thread. And this thread has the sole purpose of waiting for a message on the socket. Then the Full Metal Update client will update the container. Here we make the hypothesis that the container initializes successfully: the container will send a positive feedback to systemd, and systemd will execute a command to message the socket with this positive feedback.
The thread will receive this feedback and use it to feed back the status of the container to the server. Here it is positive, and that's very useful because, again, the server is the only way for the user to know if the update actually succeeded. Then the client just removes the socket and proceeds to other updates. And this scenario can repeat exactly. So now let's see what happens when the container fails to boot. Here we have the same first steps: the update query, the socket creation, and the thread waiting for a message. Then we proceed to the container update. But here we make the hypothesis that the container just fails at some point during the initialization process. And systemd, having a predefined timeout, will time out waiting for the message from the container. systemd will then decide that the container initialization has failed and message the socket with a negative feedback. The thread will receive the message, and we will first roll back the container, because we know that the initialization has failed. Then we will feed back the server with a negative feedback containing the status of the execution of the container, which helps debugging. We will also include whether the container has successfully rolled back or not. And again, this is very useful for the user on hawkBit. Then we just remove the socket and proceed to other updates. One very important point about this rollback is that you actually need to adapt your program in the container in order to send the flag to systemd. But this is actually very simple, because it's just one command sending the flag. And finally, we will see what happens during a power outage, as Cedric previously mentioned. Remember that OSTree updates are atomic: that means we will ultimately always boot on one version or the other — the newest version or the previous version.
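The container update flow above — start the container as a notify service, wait on a socket, roll back on timeout — can be sketched like this. It is a hypothetical Python simulation of the client's logic; the real system uses systemd's notify protocol and a Unix socket, for which a queue and a thread stand in here:

```python
import queue
import threading

STARTUP_TIMEOUT = 2.0  # seconds to wait for the READY flag

def start_container(init_ok: bool, sock: "queue.Queue[str]") -> None:
    """Stand-in for systemd starting the container as a notify service."""
    if init_ok:
        sock.put("READY")  # the container's sd_notify(READY=1) equivalent
    # On failure the container sends nothing, and the wait below times out.

def deploy(init_ok: bool) -> str:
    """Mimic the client: create socket, update, wait, feed back the status."""
    sock: "queue.Queue[str]" = queue.Queue()  # stands in for the Unix socket
    t = threading.Thread(target=start_container, args=(init_ok, sock))
    t.start()
    try:
        sock.get(timeout=STARTUP_TIMEOUT)
        status = "success"        # positive feedback to the hawkBit server
    except queue.Empty:
        status = "rolled back"    # restore previous container, negative feedback
    t.join()
    return status

print(deploy(True), "/", deploy(False))
```

In the real system, systemd's Type=notify service and its startup timeout do the waiting; the client's extra work is only the rollback and the hawkBit feedback, and the container itself just has to emit the READY flag (for example via sd_notify or the systemd-notify command) once it has initialized.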
So if the power goes down during an update, the update just restarts and proceeds again. And one very interesting fact about OSTree is that if the data has already been downloaded, we don't actually need to re-download it, because it's already there — it was downloaded before the power outage. Now let's see a demo of how Full Metal Update handles a power outage. For this first demo, we will demonstrate what happens when a power outage occurs during an update. As you can see, this is the hawkBit user interface, and I have already configured the target, which is the STM32MP1. I have two versions of the same container, the first version being the cats version and the second being the cats-plus-dogs version. Currently, my target is running the cats version, and I'm going to deploy the cats-plus-dogs version and cut the power by pressing the reset button on the STM32. In parallel, I will analyze what happens on the Full Metal Update client by watching the systemd log on the target. So I'm going to deploy this version. As you can see, it's being added here in hawkBit. And now you can see that the client will start the update, and I'm going to press the reset button right when it starts. So as you can see, it started the update and I cut the power. The board is rebooting and it's going to follow its usual boot process: it's going to start systemd, which is going to start the system services, including the Full Metal Update client. We will wait until the board successfully reboots, and then we will look at the log of the Full Metal Update client. As you can see, the client restarts and proceeds with the update again. It's going to download the data and proceed to update the container. And in hawkBit, you can see that the update is successful.
Now, in this case, the data was re-downloaded, because I cut the power before the data had been downloaded. If I had cut the power just after the data had been downloaded — that is, right there — this data wouldn't have been re-downloaded: OSTree would just have updated the container without having to re-download everything. So what we can conclude is that whenever a power outage occurs, OSTree will manage to boot the container in either the previous deployment or the new deployment. Now let's talk about Edge AI, and Cedric will talk about it. Thank you, Antonin, for that demonstration of the rollback feature. So the last part of this presentation is about Edge AI. But why Edge AI? These are the three talking points you find everywhere. Latency: autonomous driving — of course, if you send everything to the cloud and you have to steer the wheels of a car, you might have some issues. Privacy: when you are recording videos, like people in the street, if you send all the data to the cloud, you might have some issues with privacy. And cost: of course, again, if you send a video stream to the cloud all the time, you pay for the data, so it might get quite expensive. By the way, those were the same talking points used to champion edge computing in general. But that said, Edge AI has a couple of challenges that are really not that easy to solve, and we thought that Full Metal Update can actually help you solve a couple of those challenges. In the cloud, overall, what are you going to do when you do AI? The training of your models happens in the cloud. You're going to need a data set for the training — that's one of the main challenges when you are doing deep learning. You're going to need to observe the training. And then you're going to need a way to transfer the new networks to your edge devices. Because inference is something you can do at the edge; the training is still for the cloud.
Maybe at some point that's going to change, but right now the training is still happening in the cloud. And you might see where I'm going with that point about transferring the networks: yes, Full Metal Update can do that. At the edge, you've got two main challenges: inference speed and model size. If you are running on a microprocessor, it might not be an issue, but if you want to run your model on a microcontroller, it's something that you might need to optimize. One big point as well, when you are running machine learning models on edge devices, is dependency hell. That's one of the biggest challenges you might run into. Even with TensorFlow, from one version to another, they are not compatible: 1.4 to 2.0, not going to work; even from 2.0 to 2.1, you might have some issues. But that's where containers are very interesting — again, Full Metal Update. If we look at an example on the edge: basically, you've got an embedded system, you want the inference on the edge — you want to be able to classify, for instance — and the training in the cloud. Let me take the example of a coffee machine which is able to recognize customers. If it's a new customer, I'll be able to retrain the network in the cloud to add that customer, and then to adapt the settings of the machine for that customer. You are in front of the coffee machine, and every morning you take a black coffee, no sugar. Wouldn't it be great if the coffee machine knew that, and automatically, every time you are in front of that coffee machine, you get that specific type of coffee with the right settings? So that's the idea behind edge computing and Edge AI: the ability to have that type of algorithm running locally on the coffee machine, but still the ability to retrain that model in the cloud in case you've got new customers using that coffee machine.
Another use case, which is a bit closer to what happens in real life: let's say you've got a model that you are using to classify those animals. This model has been trained with a data set, and now you want to add a new output, for houses. The first option to do that: you just change the data set — to the cats and dogs you add pictures of houses. But that means you need to retrain all the neurons in that network. In this example it's 13 neurons, so it's not much, but in real life it's more like millions of neurons. So you just don't want to do that. In real life, what you usually use is transfer learning. Instead of retraining everything, you just go to the end of the network — in our case, the last six neurons — you remove them completely, you put in six brand-new neurons, and you retrain just that part of the network. And that is going to allow it to recognize houses in addition to dogs and cats. That's what we call transfer learning. And it's interesting with Full Metal Update, because Full Metal Update is able, with OSTree of course, to update just the differences between those two models. So if you just updated those six neurons, and let's say that's 50 kilobytes, it's just going to transfer those 50 kilobytes. Even if the overall size of your model was, I don't know, 20 megabytes, it's just going to update that small part at the end of the network. So it's very interesting on IoT networks with limited bandwidth, where you might have to pay for the bandwidth: you don't have that problem anymore — it's not such a challenge anymore, because you are just updating the differences. If we go back to the pipeline that was developed for Full Metal Update: what we did on that slide is add an AI server. Because, of course, you need somewhere to train your AI models, and you're not going to do that on the device.
Even for transfer learning, you need something quite powerful. Once you have your new model, with the houses added, you're going to use a container to execute it. And that's the beauty of Full Metal Update: with Yocto, you can build a container with any type of machine learning framework. So you can have a container with TensorFlow 1.4 inside, and next to it another container with TensorFlow 2.3. And that's great, because then there are no issues with dependencies: they are executed in their own little worlds, and they don't know — they might not even know — about each other. So you will never run into issues where you cannot make these two versions of TensorFlow work together. It's a bit like a virtual environment, if you are a data scientist. Then you're going to push that container with the new version of the model through OSTree — the same process as the one I described at the beginning, with OSTree computing for you the differences between the two versions of that container. And with hawkBit, you can then trigger the update of the AI model on your embedded systems. So by using Full Metal Update, you solve three of the challenges that come with Edge AI. One of them was the pipeline to transfer new networks: that comes with Full Metal Update — we already provide you everything; you just need to plug your AI server into the build process, and it's going to be automatically integrated into the Full Metal Update pipeline, creating the containers and pushing everything to your embedded systems. The model size: because you are not transferring the full model anymore — you're just transferring the delta between two versions of the model, thanks to OSTree — that's yet another problem which is not there anymore. Finally, dependency hell: we are using containers, so you can put any type of dependencies in those containers.
And then you can just deploy them easily to your embedded systems. You can use TVM, Arm NN, TensorFlow — whatever you want. And if you are looking for TVM and Arm NN, for instance, we are providing a meta layer which is called meta machine learning, and you can use it out of the box with Full Metal Update; we already have examples, so just test it. Time for a demo: a nice demo with a classifier running on the STM32MP1. The first part of the demo is going to be just with that classifier: we plug in the camera and point it at pictures of dogs and cats. Then we update the model using Full Metal Update, and we add horses: we use transfer learning to change the model and update the outputs, then deploy it with Full Metal Update. For this second demo, we will see what happens when we update an inference model with Full Metal Update. As you can see, I still have my STM32MP1 configured on hawkBit, and I have a third version of my container, including horse detection. I will deploy this version of the distribution, and we will see what happens on the target. Here you will see that the update begins, and here you can see that the update started. One very interesting point to note is the size of the update: as you can see, it's only 13 kilobytes, which is very, very interesting and a key point of OSTree — it enables very lightweight updates, and you can actually update an inference model, in this case, with very little data. Now we will see what happens on the application. Here you can see a picture of a horse being filmed by the application, and it's not able to detect the horse, because it only has cat and dog detection enabled. Now we proceed to the update. And now you can see that the container has restarted, and in a couple of seconds, a horse is detected.
So you can see that this update was quick and very lightweight, and it's really an interesting feature of Full Metal Update. Thanks, Cedric. Thanks, Antonin, for this presentation and the demo. Thank you for this complete presentation of Full Metal Update. I hope you guys enjoyed this presentation. If you have any questions or you need more information, feel free to contact us. Our email is on the first slide that you can see in the video. You can follow us on LinkedIn, you can follow us on Twitter as well, and we have the website www.witekio.com. Thank you, everyone. Wish you a good day, and see you soon.