...there's a link to the slides on the eLinux website, from the events page there. Skip that.
So that's a little bit about me. I'm a freelance professional trainer and author. This is a really good book. I highly recommend you go out and buy a copy because, let's face it, I need the money. And you can also contact me in these various places. I'll be very happy to link to any one of you on LinkedIn or on Google+.
So, this talk is about software update in the world of the Internet of Things. I don't know if you've noticed, but Matt Porter gave almost exactly the same presentation yesterday, which was a bit of a surprise to me. So I've changed the emphasis of this slightly. This is now less of an overview of the various open source projects, because Matt did a very good job of that, and a little bit more of a tutorial on what goes on underneath the hood and what makes a good update mechanism, I hope.
Let's start with a scare story. Why is this an important thing? This happened a couple of weeks ago now. A virus or an attack called Mirai created a huge 600 gigabits per second DDoS on a couple of websites. The reason this is scary is that the attack vector was very simple. Mirai just looks for open telnet ports, tries a well-known set of usernames and passwords, gets a shell, and then does evil stuff. The prime target turned out to be this particular webcam, which just happened to be shipped with an open telnet port, and most people never changed the username or password, of course. So, this is the kind of thing we need to protect against.
So, I've identified two problems we're trying to solve here. The first is the one on the previous slide. We have bugs in software. All software has bugs, that's inevitable. But now that we're connecting everything to the internet, those bugs become a lot more available to anybody who wants to take advantage of them. That's the prime problem that I think we need to focus on. But there is also a secondary problem, or opportunity, depending on which way you look at it. It's quite handy to be able to update devices in the field to add new features, maybe even add new revenue streams, et cetera. So, conclusion then: for anything, or at least many things, that ship connected to the internet, we really need to have some mechanism for updating them.
So, you're out there now and you're all trying to choose a software update mechanism. What are you going to be looking for? Well, obviously, you want whatever mechanism you use to be secure. We're trying to increase security, not reduce it. So, that's an important concern. Robust: we don't want an update to break the device. That's not really achieving the aim either. And atomic. By atomic in this context, I mean that you either apply the update or you don't apply the update. It shouldn't be possible to have a system which is halfway updated, and then it reboots and fails to work. So, atomic is a big, important thing here. Fail safe: even if things do go wrong, you want to be able to recover and get back to at least a minimal working system. And the update itself needs to preserve the persistent state of the device.
Can you hear me still? Not if you can't. Thank you.
So, what are we going to update? This is a diagram showing the main components of an embedded system: bootloader, kernel, root file system, applications, ranged against, on one axis, the frequency with which you're likely to want to update these things, and on the other axis, how easy they are to update.
The main conclusion from this diagram is that the one component that's really hard to update, and is therefore not usually updated, is the bootloader. The other components are increasingly easy to update. Certainly the root file system and certainly the applications need to be updated. It's quite possible the kernel needs an update as well. So, just to reiterate, I'm talking about updating everything, but not the bootloader. That is a constant. Why not the bootloader? Well, the problem is that the bootloader is usually a single point of failure. If you get an interruption whilst trying to update the bootloader, the chances are it won't boot again. It's very difficult to have a redundant bootloader.
The next question is, what unit are you going to update? File, package, container, image. So, just going through those. Updating a file at a time seems like a trivial thing to do, but it turns out that it's very hard to make file-based update mechanisms atomic. In other words, if you have a group of files that need to be applied as part of an update, it's difficult to make sure that the entire group is either updated or not. So, file update doesn't actually work.
Package updates. We've been running Linux distros for decades now. We update a Linux distro by using apt-get or yum or whatever you may have. But it turns out that package updates are also not atomic. So, in reality, they are not an option for embedded devices. I'll come to that on the next slide.
So, that leaves us with either container or image. Containers are a neat idea. The idea there is to borrow from the work on container systems such as Docker on cloud services. Why not apply the same technology to embedded devices? Make your application into a container, and then we can just update the entire container in one go, and using the magic of Docker, and other similar systems, we can do that fairly atomically. I'm not.
Actually, I do have one example of a containerized update system, which I'll come to at the end. But in most cases it comes down to the bottom option here, which is to do whole image updates. So, do an update of the root file system as a block, and that's mostly what I'm going to be talking about.
Let's come back to this package theme, because this keeps on coming back. People keep saying to me, "I've got 10,000 Debian servers in my data center, and I can update all of them with a single shell script." Well, that may well be true, but a server running in a data center is not the same as a device sitting on the top of a mountain or at the bottom of the sea. It's a different use case. Servers are in a nice secure environment that's kept at a constant temperature, and the power and the network are protected and are unlikely to fail. And if they do fail, the servers are always accessible. Somebody can walk around and plug USB sticks into them. Embedded devices are different. They have intermittent power. They can be powered off at any time. They have intermittent network connections. If you're using mobile data, the link can easily go down at any time. They have low bandwidth. And in the case when the update does fail, it's not a simple question of just walking around a nice cosy room to do the reboot; you've got to go to the top of the mountain or to the bottom of the sea to update the device. So this is why I say apt-get update doesn't work in the embedded world.
So, looking then at image updates, there are two common approaches, and they're called various things. The first one, at the top, is symmetric, also referred to as the A/B mechanism, where you have two redundant copies. Well, you have a copy and a redundant copy of the root file system. And you have some kind of integration with the bootloader, which I've represented as a flag. Depending on the setting of that flag, the bootloader will boot either copy A or copy B, or copy one or copy two, as it actually says on the slide.
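
As a minimal sketch of the flag logic just described: the flag picks which copy is booted, and the other copy is the one that can safely be overwritten. The partition names and the "A"/"B" flag values are assumptions for illustration, not taken from any particular bootloader.

```python
# Hypothetical partition layout; real device names depend on the board.
SLOTS = {"A": "/dev/mmcblk0p2", "B": "/dev/mmcblk0p3"}


def slot_to_boot(boot_flag: str) -> str:
    """Return the partition the bootloader would boot for this flag value."""
    if boot_flag not in SLOTS:
        raise ValueError(f"unknown boot flag: {boot_flag!r}")
    return SLOTS[boot_flag]


def slot_to_update(boot_flag: str) -> str:
    """Return the partition that is not live, i.e. the one an update may overwrite."""
    slot_to_boot(boot_flag)  # reject invalid flag values first
    other = "B" if boot_flag == "A" else "A"
    return SLOTS[other]


if __name__ == "__main__":
    flag = "A"
    print("booting from", slot_to_boot(flag))
    print("updates go to", slot_to_update(flag))
```

The point of keeping the two functions separate is that the live copy is never a candidate for writing, whatever the flag says.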
So to update this, say, for example, the live copy is OS copy one at the top. You then update this other partition here. Then you reboot, having toggled the boot flag, so next time you boot into this one. And then next time you need to do an update, you do it the other way around and you flip back again. The main drawback of this, historically at least, is that it means you have to have two copies of the root file system partition, which can be quite large. So in the case where you have limited flash storage, it eats up your flash storage budget. With the falling prices of flash storage and the move to things like eMMC, that's less of an issue, because flash storage is so much cheaper now.
However, one mitigation of the amount of space you need in the symmetric case is the asymmetric case. In this we have just one copy of the main OS, but we have a much smaller recovery OS, so that when we want to do the update, we boot into the recovery OS and then we use that to update the main system partition. This is the way Android used to work, and then in the recent Nougat release they've, at least as an option, switched to symmetric update, which they are hailing as a big new feature which nobody's ever heard of before, except we've been doing it for decades.
One aspect of doing a whole image update of the root file system is that the root file system has to be stateless, because every time you do an image update, you are restoring the state of the file system to what it was when it was built. So if you have a network config and SSH keys and other things in your root file system, they're going to get overwritten unless you move that persistent state out to a different place. This is something I talked about yesterday, and if you're still interested, there's the link to the presentation I did.
So the next thing I'm going to talk about is the update agent. We need something that's actually going to apply the update. Sorry, let's go through this story in sequence.
We need to receive the update from somewhere. It can be from local storage, a USB thumb drive for example, or it can be pulled from a remote server somewhere. Then we need to apply that update, in other words write it to the appropriate partition. Then we need to twiddle the boot flag and force a reboot, so the bootloader will then boot into the newly downloaded system.
So I'm going to look at two open source image updaters, starting off with SWUpdate. This is a classic, if you like. I'm not sure I should use the word classic about something that's only existed for 18 months. Nevertheless, let's use it. A classic image-based update client. You can go and get the code from here. Documentation is here. This is maintained by Stefano Babic, who is sitting in the middle of the audience right there. This was the first, or at least as far as I know the first, open source update client to become available. It's been around for 18 months, maybe 24 months, something of that sort. So it's the oldest, it's the old man on the block, basically.
It supports both symmetric and asymmetric modes. Of course, it supports U-Boot, since Stefano works for DENX. It does all the things you would expect. It supports raw NAND flash using MTD. It supports UBI volumes. And it supports regular file system partitions using normal partition formats. It has some integration with the Yocto Project. There is a meta-swupdate layer, which will help you build systems for SWUpdate. And amongst other things, it will build you the recovery OS image as an initial RAM disk. It uses curl, I believe, to pull in updates from remote systems. And it has integration with a thing called hawkBit. I will come, not actually to hawkBit, but to the other end of the sequence. If you are doing remote updates, you kind of need something to control the servers that control the updates. hawkBit is one mechanism for doing that. But I'll come to the whole remote update thing in a few slides' time.
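
Going back to the receive, apply, twiddle-the-flag, reboot sequence that any update agent goes through, here is a minimal sketch of it. It is not how SWUpdate or any other real client is implemented. Ordinary files stand in for the two partitions and for the bootloader flag, the read-back checksum is an added assumption, and the reboot is left out.

```python
import hashlib
from pathlib import Path


def sha256(path: Path) -> str:
    """Hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def apply_update(image: Path, slots: dict, flag_file: Path) -> str:
    """Write image to the inactive slot, verify it, then flip the boot flag.

    slots maps "A"/"B" to files standing in for partitions (hypothetical).
    Returns the slot that will be booted next.
    """
    active = flag_file.read_text().strip()
    target = "B" if active == "A" else "A"

    # Apply: only the inactive copy is touched, so the live system stays intact.
    slots[target].write_bytes(image.read_bytes())

    # Verify by reading back before committing to the new copy.
    if sha256(slots[target]) != sha256(image):
        raise IOError("verification failed; boot flag left unchanged")

    # Flip the flag last, via write-then-rename, so it is never half-written.
    tmp = flag_file.with_suffix(".tmp")
    tmp.write_text(target)
    tmp.replace(flag_file)
    return target  # a real agent would now force a reboot
```

The ordering is the design choice that matters: if power fails at any point before the final rename, the flag still points at the old, untouched copy.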
So that's all I'm going to say about SWUpdate. If you want a little bit more detail, Matt Porter certainly covered this in more detail in his presentation. And there is plenty of stuff at these links here.
Next I'm going to mention RAUC, the Robust Auto-Update Controller. This project started a little bit later. The aims are essentially the same. It's another image-based update client. It's LGPL-licensed rather than GPL. And you can go and get the code and the documentation from here. RAUC is written by Jan Lübbe, who works for Pengutronix. It also supports symmetric and asymmetric update mechanisms. Since it's a Pengutronix thing, it has good support for Barebox, not so good support for U-Boot, and it also supports GRUB as a bootloader. It too has support for all the usual partition formats. It has Yocto Project integration. Plus it has PTXdist integration, for the obvious reason. As well as local updates, it handles remote updates via a streaming protocol, well, using curl. And it has verification and so on and so forth. So again, this is very much an overview. Go and read the links on the previous slide for the full details.
So there are two well-known and operational solutions for the basic update client component. And this works fine for the local update case, the classic man with a USB thumb drive who goes around updating each machine individually. And since these update clients typically also support remote streaming, they allow you to download an image from a remote server. But typically that download is user initiated and user attended. So for these reasons, what we have so far doesn't scale to large deployments.
So the next thing I want to look at is the current hot topic, which is OTA, over-the-air updates. I define an OTA update mechanism as one in which, in addition to the update client running on the device, we also have some logic behind that which can manage the updates and push an update out to a population of devices.
The update itself can be done totally automatically, so there is no user intervention. That's handy in the top-of-the-mountain or bottom-of-the-sea scenario. Or it can be semi-automatic, as with your favorite mobile device, where the update is pushed automatically to the device but you have to do something to actually initiate the installation, the final step of the process. Either way around, the mechanism is the same.
So now, in addition to a population of devices, each running a copy of the update agent, we have a whole bunch of stuff at the top of the diagram to manage the deployment to those devices. We'll need some kind of update server. There needs to be some kind of communication between the server and the device, so that ultimately the server can push updates to the population of devices when necessary. In addition to that, you need some procedures in place for creating those updates and pushing them to the update servers. Typically that requires some kind of build system integration, maybe with the Yocto Project or something similar.
And there is a component not actually shown on the slide, not essential but very useful. There is typically some kind of management console, so that you can send instructions to the update server to control updates to the devices, do them in phases or campaigns or whatever. And also, since you have this all in place, this is a good place to do device monitoring. So typically there will be some kind of back channel so the device can send information back to the server. Then on your management console, on your update console, you can see the devices that have received updates, what versions they have, maybe other status information. OK, so the whole thing now becomes a much more complex management system. This actually complicates things considerably.
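
To make the server side a little more concrete, here is a toy sketch of the two things a management console does in this picture: choose the next batch of devices for a phased campaign, and summarise which versions the devices have reported over the back channel. The `Device` record, the batch-by-ID policy and the version strings are all assumptions for illustration; none of the real servers mentioned in this talk is being described.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Device:
    device_id: str
    version: str  # last version reported over the back channel


def next_phase(devices, target_version, batch_size):
    """Pick up to batch_size devices that are not yet on target_version."""
    pending = [d for d in devices if d.version != target_version]
    # Sorting by ID just makes the choice deterministic for this sketch.
    return sorted(pending, key=lambda d: d.device_id)[:batch_size]


def version_summary(devices):
    """Count devices per reported version, as a console might display."""
    return Counter(d.version for d in devices)


if __name__ == "__main__":
    fleet = [Device(f"dev{i}", "1.0") for i in range(5)]
    for d in next_phase(fleet, "1.1", batch_size=2):
        d.version = "1.1"  # pretend the update succeeded and was reported
    print(version_summary(fleet))
```

Rolling out in small batches like this means a bad image is caught after it has reached a handful of devices rather than the whole population.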
Since we're now pushing updates to devices automatically, and potentially applying those updates automatically, we need to make sure of, well, authentication. Is the update that I'm receiving legit? Is it from the manufacturer, or is it from some evil hacker from some evil organization, unnamed? So we need authentication. Security is kind of handy as well. This is different, in that this is basically encrypting the stream and adding checks into the stream so that, essentially, am I receiving what you're sending, or is somebody in the middle changing stuff? Is this a replay attack? And so on. We really need some kind of automatic rollback system, so that if an update fails, we will go back to a known working system. We need the system to scale, so we may have tens of thousands, hundreds of thousands or millions of devices that we want to update. And as I said previously, it's quite handy while we're doing all this also to put in the back channel, so we can get status information from the devices that we have updated. So some or all of these things are what we would expect in a complete OTA update solution.
Just a couple of thoughts about rollback. In other words, how do you recover from a partial update? Typically there are two levels of achieving this. The first is to use some kind of protocol between the operating system and the bootloader. One such mechanism is the boot count limit, which is in U-Boot. The idea is that you increment the boot count when the bootloader boots. Then, once the operating system is up and running, you run a command to reset the boot count. If, on the other hand, it doesn't get reset, for example because the system just crashes and reboots, then when the bootloader boots up it finds the boot count hasn't been reset, and typically it will then go into recovery mode. In the A/B image example it will boot into the last good image. In the asymmetric case it will boot into the recovery OS. So that's the first level of defence.
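
The boot count protocol just described can be sketched as a small simulation. A dictionary stands in for the bootloader's persistent environment, and the limit of 3 is an assumed value. The names are illustrative and are not U-Boot's actual interface.

```python
BOOT_LIMIT = 3  # assumed threshold; a real system would make this configurable


def bootloader_boot(env: dict) -> str:
    """Bootloader side: count this boot attempt, then decide what to boot."""
    env["bootcount"] = env.get("bootcount", 0) + 1
    if env["bootcount"] > BOOT_LIMIT:
        return "recovery"  # last good image (A/B) or recovery OS (asymmetric)
    return "normal"


def mark_boot_ok(env: dict) -> None:
    """OS side: called once the system is up and running to reset the count."""
    env["bootcount"] = 0


if __name__ == "__main__":
    env = {}
    # A healthy system resets the counter on every boot.
    print(bootloader_boot(env))
    mark_boot_ok(env)
    # A broken update crashes before the reset, so the count keeps climbing.
    for _ in range(BOOT_LIMIT + 1):
        print(bootloader_boot(env))
```

Note that this only helps if the broken system actually reboots; a system that hangs never gets back to the bootloader to have its count checked.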
But we do have bugs which would prevent that from working. If the system hangs before you even get to a point where it could reboot, if we just hang in a buggy device driver for example, then we're never going to get back into the bootloader. So in this case you need some hardware support, which is obviously going to be device specific, but really we want a hardware watchdog, so that if we don't get to the point of starting the watchdog daemon, the watchdog times out and we do a hard reset of the CPU. You can tell the reason for the reset by reading a register. The bootloader can read that and then once again go into recovery mode. So these two things together should give you a pretty hard, almost bulletproof guarantee that the system is going to boot, and if it doesn't boot, it's going to recover.
I'm going to mention two systems which implement an end-to-end OTA update mechanism. First of all, Mender. I should probably give a bit of disclosure here: I've actually been working with these guys over the last few months on their system, so I'm slightly more familiar with what they do than with the other guys. Nevertheless, I'll try not to give any bias here.
So Mender is an end-to-end update server and client. It uses full image updates; it uses a symmetric A/B update mechanism, actually. The client component is open source and is available here. The server component, at least currently, is not open source. OK, it didn't used to be. OK, and the license is Apache 2. OK, so can you mentally scrub out this little bit here in your image of this slide, and I'll update it online when I get a chance. So that's good. The code is available here. Documentation is available here. Some of that documentation is written by me, so if you don't like it, it's my fault.
So what does it do? The update client is a classic A/B update mechanism. It has integration with U-Boot. It supports regular partitions on eMMC and MMC cards, for example. It has rollback. It has integration with the Yocto Project, with a meta layer called meta-mender. And the remote features include all the things you need to implement a full end-to-end OTA update.
The other guys, who you will see have a booth upstairs, are resin.io. These guys are essentially trying to solve the same problem, but in a slightly different way. The main concept behind resin is that it's a container-based system. In other words, you put your applications into Docker containers, and then it has mechanisms to atomically update those applications on your population of target devices. In a similar way, the client software is open source, but I was speaking to them just now and, at least as of half an hour ago, their server code is not released. But they say that it is their intention to release at least some components of the server code when it's ready.
So resin is a bit of a hybrid, in that it does actually have a symmetric A/B rootfs update mechanism for updating the core part of the system. Maybe I should have had a diagram here, but I don't, so I'll have to draw it in the air. Essentially, with this kind of mechanism you have a base OS, which is the host to Docker
and then on top of the host OS you use Docker to load the various application images. So the base OS you can update. There is a mechanism called resinhup, I can't remember what the "hup" stands for, never mind, and that will update that. But that's not really what resin is about. What I'm trying to say is that it's about updating the Docker containers of your applications. So it's quite neat. It gives you a little bit of overhead on the target, because you have to run Docker and all the dependencies Docker has, which expands the size of the core OS somewhat. But hey, flash memory is cheap these days. It has a little feature in its support for Yocto: you can actually build Docker images into the base OS, so you can pre-populate a bunch of Docker containers, if that's your idea. Remote features: deployment server. It has a handy little thing integrated with Git. If you use their build engine, you can essentially just do a git commit and then a git push to the appropriate place, and the build server will then pick up that git push, build it into a Docker container and deploy that Docker container for you. Funky.
OK, and Brillo. I'm not going to go into much detail here, because I'm sure you already know what Android is. Brillo is just a cut-down version of Android. It's essentially Android without the Java bits and without the UI bits, and it is one of the operating systems that Google will have for IoT devices. So Brillo therefore does the same thing as Android does. It has both symmetric and asymmetric image update options. And as with Android, although the OTA update agent is part of the Google repos, the server components are not, and you will have to essentially write your own update server in order to communicate with these.
And, yeah, OK, that's essentially it. So that was a quick run through the basic technologies involved with update. First of all talking about client-only update agents, and specifically SWUpdate, which I've misspelt here, and RAUC, and then I went on to look at
end-to-end solutions using over-the-air updaters, and I looked specifically at Mender and resin.io. Oh, and actually Brillo; I should put Brillo on here as well. These are by no means the complete list of what I could put under those subheadings. As I say, Matt Porter actually gave a more complete list of things, so go and read his presentation if you haven't seen it already. And once again, follow the links online and you'll find a bunch of this stuff.
So we have, I think, 5 or 10 minutes for some questions. Before you ask questions, let me just point out that the slides are uploaded on SlideShare. They're also on the eLinux events page, and I have also uploaded them to the Linux Foundation web thing, but it takes a little while for that to percolate through. But they are at least available here, that's definite. So, who has some questions? Yes?
So the question is, how do you update the kernel, basically. None of the solutions I've talked about have a specific case for updating the kernel. Usually the situation is that the kernel image is part of the root file system, or at least it is perfectly possible to put the kernel into the root file system and then use U-Boot, or whatever bootloader, to read that kernel file from the root file system. If that doesn't work for you, it's just another image, in the same way as the root file system image, but you'd have to do an extra bit of work to make that work.
So typically, in the symmetric case, the update client is running on the live operating system, so there's no reason why you can't write the backup copy whilst the main OS is active. And indeed that's one of the features of Nougat, that you can do seamless updates, as they like to call it. If, on the other hand, you're using the asymmetric case, then plainly you have to boot into recovery mode in order to do the update. So in the case of your light, you have to do this at night, when the light's turned off, for this to work. Well, buy ordinary lights then, don't buy smart lights. Yes, sir? So I did actually
attend your very interesting talk on sensitive systems sitting just below the waves. Yeah, exactly, so you have a computer almost at the bottom of the ocean. So, hmm, I guess the answer is "kind of". I mean, using the A/B mechanism, your overhead is basically twice your existing root file system. So if your existing root file system is 100 megabytes, then you need an extra 100 megabytes of flash memory for the redundant copy.
It's nowhere near that much, unless... so, I mean, the essential... can somebody pull the door to, maybe? So the essential overhead is the update agent on the device itself, and the update agent, even in the case of Mender, is a few tens of megs or something... 7 megabytes. So the overhead is about 7 megabytes added to the root image. But there is a bit of a caveat here, and this isn't specific to Mender, but it is an interesting case to bring up. Most of these update mechanisms use fairly recent versions of the kernel and other utilities and other infrastructure. Specifically, in the case of Mender, it requires systemd, for example. You are probably not running systemd, so that would be the overhead to achieve that.
So, yeah, exactly. As Stefano rightly points out, the other way to do it is to use the asymmetric case, and the specific advantage of the asymmetric case is that you only have a small recovery OS, which is typically just a RAM disk, 10 megabytes or something. OK, so that's an interesting... yeah, well, to be fair, these all require a reboot, so that's not per se an issue. Yeah, so you could use the bootloader; essentially the bootloader becomes the recovery OS. I'm not very keen on that solution myself. Exactly.
So just to wrap that up: this is a use case, and people do use it. I would not recommend using U-Boot as the recovery OS. It is a restricted environment, and also, remember from one of my earlier slides, the one component we can't update is U-Boot, or the bootloader. So it is a good idea to keep the functionality of the bootloader minimal, so that it
works without problems, and to put your potentially buggy software outside of the bootloader. So that would be my view. But on the other hand, you have a very specific case. You don't have a huge population of devices to roll out to. So, you know, anything is possible. This is software; you can do anything in software.
OK, any more questions? Otherwise I'll do a wrap-up. It's gone quiet. Yep, OK. So thank you all very much, and enjoy the rest of the afternoon.