All right, good afternoon. So to start off, how many of you have been to at least one session on over-the-air updates or updaters during the conference? Okay, almost all of you. Yeah, so there's quite a few of them, and the challenge for me, obviously, is to try not to bore you by talking about something that's already been spoken about. It gets harder and harder, as I'm probably the last session about updaters, so I'll try to cover, more in depth, things that haven't been covered already.

So just a quick introduction. That's my name: Eystein Stenberg. Very close to Einstein, but not quite. I've been working for seven years in systems and security management software. I have a background in computer science and cryptography, so security related. You have my email there; if you have any comments or feedback on the talk, either let me know afterwards or email me. It's greatly appreciated, as I do talks across several conferences, and it's always good to learn what people like and don't like.

So Mender, what I work on now, is an over-the-air updater for Linux. We integrate with the Yocto Project to make it easy to work with over-the-air updates if you base your build on the Yocto Project tools. It's fully open source, Apache License version 2. The way it works, it's using a dual A/B rootfs. We also have a remote deployment management server, which is also open source. And yeah, we're still actively developing it. You can test the server as well if you try it online. That's the quick overview.
So this may not be news to you, since you're here, but the proposition is that you must have a way to update your connected devices. A couple of examples with respect to bugs. It might be a bit small for you to see here, but basically these are the Linux kernel versions on the vertical axis, and the red bars indicate critical security vulnerabilities that are present in those kernel versions. The orange bars indicate high-severity issues with the kernel. You can see that at least one of the red bars there stretches pretty far, so if you're on one of these kernel versions, you should have a way to fix it. And obviously there are probably things we don't know about yet. So that's typically the motivator for over-the-air updates: to fix these issues that we don't yet know about.

There are also a couple of interesting examples of these vulnerabilities being exploited. One was the Fiat Chrysler hack last year, where they managed to compromise a car and control it remotely, in part due to a vulnerability in software. And of course you also want the ability to deploy new features.

Recently there have been several open source tools for doing over-the-air updates, or software updates, but still most companies write it from scratch. I guess it's because it's fairly recent that there are more generic tools to do it, and we'll cover a bit more on this later. Obviously the reason to reuse one of the existing tools is to avoid the development cost, so you can focus more on what your core product should do rather than messing with copying files and making sure this is very robust.

One of the counterarguments to using an existing tool that I've frequently seen is that doing an over-the-air update is so easy. You can just download the file and then you just run it, right?
It's just a script, copy over some files, and it will work. But if you look a bit under the hood, you will find a lot of interesting challenges, and we'll look a little bit at that as well. And then there's the maintenance aspect, of course. If you manage to make it reliable and it works, then how long are you going to maintain this updater? If your product is out there for five years, you still need to maintain this updater. Do you have one product, or five products, ten products? You need to tweak the updater based on the products. So there are all these costs that come later as you develop your own tool.

So of course I have my favorite tool here. Unlike the other presentations, I will not talk about tools or survey tools, because you shouldn't trust that I'm not biased. This is how I structured the session instead. We've talked to probably close to a hundred embedded developers at this point, but we did a survey where we consistently asked some questions, which I'll cover, across 30 interviews. So it might give you some interesting insights, if you have your own updater, into what other people are doing. Then we'll look a bit at the embedded environment and the criteria that people say they have for an embedded updater, and then a bit more on how you can address those problems or requirements.

So the first thing we asked was: do you have a way to update your software? Well, half said no, and the other half said yes, with a homegrown solution. So it was really hard to find somebody that actually used a more generic tool. I think this has changed. This was done maybe between six and nine months ago now, so I think it's changed a bit. I think more people are using a tool that's purpose-built for doing updates.
I don't know if you recognize this. So how many have written their own updater? Okay, it's maybe one third, then. And how many use a tool that's made for updating, one of the updater tools? Two people. So I hope that's growing.

Then there's the eternal debate about how to do the updates. The two big camps are image-based and package-based. You can always reason about this, obviously, but of the people we asked, about half preferred image-based, and the reasons they gave were that it's atomic, meaning that the update is either applied in full or not at all, and consistent. That's an interesting one: it makes it much easier for you to ensure that the test device is the same as, or very similar to, the production device, because if you flash a device completely, then there's a very good chance that if it works in test it will also work in a production environment.

For package-based, which you can see in red here, the arguments were typically that it's faster to install and easy to develop. If you have a build system already that outputs RPMs or IPKs, and you develop your homegrown tool for this, then the initial development cost is not that high. And then there's always the bandwidth concern. It depends on your bandwidth, but typically in embedded we aren't lucky enough to have a lot of bandwidth. There are also differences here if you use 3G networks or similar expensive networks.

Then we looked a bit at how long it took to make a homegrown updater and how frequently it was used. Typically people spent maybe three to six months. It varied quite a bit, but on average it was three to six months to make it, and then you have to maintain it. As for how frequently you deploy updates, it was about six times a year, but we also saw that people did it more frequently now than they used to, and I think one of the reasons is that more devices are getting connected now.
So it's possible to do scalable remote updates now, or easier at least. What we found is that there is still some room to make this better. But the good thing is that if you're not doing updates yet, you're not far behind average. And as you can see from the sessions we have at the conference, I think there are at least five talks about over-the-air updates, so at least there's a lot more interest in it now. We can see that it's going in the right direction, and like I mentioned, I think the connectivity aspect is the biggest driver. Also known as IoT, in buzzword form, which is the same as connected embedded devices. Any questions, or does that make sense?

Okay. So what does the update environment look like for embedded? There are some things that make it a bit harder. We all know this guy; maybe it's ourselves in the beginning, thinking, well, this can't be that hard.

The first thing is that it's quite expensive to reach these devices if something bad should happen. If they get bricked, it might be quite expensive to send somebody out to actually do a physical fix of the devices. They have a long lifetime as well, five to ten years maybe on average.

Then there's unreliable power, or the battery might run out if you run on battery. You might start the update and then run out of battery, or somebody unplugs the device for some reason. And then what happens the next time you boot the device? Does it still work, or is it able to recover?

The last thing is the network. For example, with 3G connections, the device may move into a tunnel, if it's a car for example, after you've started the download of the update. Then you lose the network, then you get out of the tunnel, and then you start the download from scratch again. And then you never finish, because it's this infinite loop of lost network.
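To make the tunnel scenario concrete, here is a minimal sketch in Python of a download that resumes from the bytes already on disk instead of starting from scratch. The `fetch` callable and the function name are hypothetical stand-ins for whatever transport a device uses; they are not taken from any particular updater. Over HTTP, the same idea is normally expressed with a `Range` request header.

```python
import os
from typing import Callable

# Hypothetical transport: returns up to `max_bytes` of the update starting at
# `offset`. It may raise ConnectionError at any time (the car enters a tunnel).
FetchFn = Callable[[int, int], bytes]


def resume_download(fetch: FetchFn, path: str, total_size: int,
                    chunk_size: int = 64 * 1024, max_attempts: int = 20) -> bool:
    """Append to `path` until it holds `total_size` bytes, resuming after drops."""
    for _ in range(max_attempts):
        # Continue from whatever is already stored, not from zero.
        try:
            offset = os.path.getsize(path)
        except FileNotFoundError:
            offset = 0
        try:
            with open(path, "ab") as out:
                while offset < total_size:
                    chunk = fetch(offset, min(chunk_size, total_size - offset))
                    if not chunk:
                        break  # server sent nothing; try again later
                    out.write(chunk)
                    offset += len(chunk)
        except ConnectionError:
            continue  # network lost; the next attempt picks up at the saved offset
        if offset >= total_size:
            return True
    return False
```

Because progress is kept on disk, every reconnect moves the download forward, so the loop of lost network terminates as long as some bytes arrive between drops. A real client would also verify a checksum of the finished file.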
So you need to handle that in some way as well, and you have lower bandwidth than you have on Ethernet networks, obviously.

Also, the security aspect is a bit interesting. I've seen some set-top boxes where they made their own updater, and the way that worked was that the box would wirelessly accept an update from any source. So if you were close enough to it, you could put any kind of software on it. That's a challenge, so you have to think about this more with these wireless networks. This is not just hypothetical. This has happened a lot of times, and there are several examples of it that are publicly known. And yeah, we don't know what we don't know, obviously.

One interesting thing: when you think about hardware for an embedded device, and I'm sure many of you have worked in this industry for a long time, you have this argument: okay, sure, it's slow and expensive now, but in two years or five years it will be fast, and we can use the same technologies and approaches that we use for cloud or servers. However, I don't think that this is true for bandwidth. I think it's true for some aspects of the embedded device, like the storage, the CPU, memory and these kinds of things, but the bandwidth in general will not follow. So I expect that bandwidth will always be very constrained on embedded devices, and the reason is what I tried to indicate here. What you might think is that embedded devices will soon adopt 3G, 4G, 5G and Wi-Fi, like the more expensive devices use. So for bandwidth,
I used a smartphone as the comparison. For CPU and memory you could use a server, for example, which is, I think, a more valid argument. But comparing a smartphone that costs, let's say, seven hundred dollars and an embedded device at thirty dollars, they have very different use cases with respect to bandwidth. With a smartphone you want to stream YouTube or, yeah, I don't know what you like to do on your smartphone, but you have a much higher bandwidth requirement in order for it to be useful to the end user. So high data speed is very important, and the users don't mind paying 700 dollars for it, apparently. So that's fine.

For the embedded device, on the other hand, there's a very high focus on cost because of the scale. You try to reduce the cost of the hardware, and you also try to reduce the cost of the data transfer, and obviously 3G is not that cheap. The device also has a smaller size, and it doesn't have to have that high a data speed. For example, in agriculture you have these devices that sit in the field and maybe measure the moisture in the ground in order to optimize the fertilization. You don't need the same kind of bandwidth as you use to stream YouTube videos, but you need high connectivity. I live in the US right now, and the 3G coverage there is not impressive if you're used to European standards, so you can lose connectivity quite easily. But there are quite a lot of interesting other types of networks that have much longer range but smaller bandwidth, which I think will be adopted more in embedded than these 3G networks. So that's one aspect of the bandwidth.

So these were sort of the criteria we found by talking to all these people about how they do their embedded updates.
So the first one is that it's robust and secure, meaning that it doesn't fail even though you have this embedded environment where you can lose power and so on, and it needs to be secure.

The second thing is that it integrates with existing environments. So what does this mean? You typically have some kind of build environment and some kind of devices out there already; typically you're not starting from scratch. Depending on what kind of updater you're looking at, or what approach you're taking, it can be quite intrusive on your existing development workflows. So that's one thing that can be a showstopper for people.

Easy to get started. I don't know what your experience is, but for the people we talked to, the way it worked was that the updater project got triggered in the first place because, just before launch, after a six-month development cycle for a new product, the engineers knew that, okay, there are going to be bugs. So they kick-started a quick project at the very end of the development cycle to integrate an updater, just something quick and dirty to get started. If you have a generic tool to do the updates and you're trying to help people, that's the environment they will be in. So it needs to be very, very easy to get going with the updater; otherwise people will just use what they're familiar with and get this done as fast as possible.

Other requirements: obviously bandwidth consumption, which we covered a bit, and then an interesting one is also downtime during the update. Depending on exactly how you do your update, this can be affected. And also, what kind of device do you have? Can you afford to have downtime, and how long a downtime can you afford?
Yeah, so we covered this a bit. An interesting property that I think has been covered in several talks is atomic installation. This is the definition I use for this: an update is either completed fully or not at all, so it's like a transaction, and no software component can see a partially installed update. Except the updater itself, of course, because it would be hard to make otherwise. And consistency we talked about: you should have good confidence that test and production devices are the same, which is very important for QA, of course.

Then one thing to also think about is how you sanity check after the update. The one criterion you can apply there generically is that you must make sure that it's possible to do another update. So whatever happens, at least you can deploy another update to fix it. But this is also quite custom and depends a bit on your specific device. Some people try to ping a server that they know their applications need to use. Other people want to ensure that a specific application is running, or it has some built-in health checks. So in these cases it's not really something that you can solve generically; it has to be extensible in some way so that you can have these checks carried out.

And then authenticity, of course. The most common way of doing that is through signatures on the updates themselves. But cryptographically there are other solutions to it as well, like HMAC, which are a bit more complicated to manage but more efficient.

So, integration. We talked a little bit about how that relates to development tools and the build and test environment. Do you have to rework all of those to enable updates? And then the hardware: what storage type are you using, how big is it, and what kind of network do you have? The operating system is also one component; not all OSes might be supported. And then, of course, you also might have devices in the field. So can you integrate with them as well?
Even though they already have a design?

And then there's what I call the standalone mode. I think it's fairly common to distinguish between standalone and managed. Managed deployment means that you have some kind of server that can control the update process. There's a lot of focus on that, obviously, because that's the way to make it scale: if you have a thousand devices, you cannot run around with a USB stick to all of them every two weeks. But there are still devices that need standalone deployment, because they don't have a network, for example, or because that's historically how it's been done, in a transition period. So that's also something to think about: whether that can be achieved with the updater.

And extensibility. So custom update actions, typically pre- and post-install scripts, and also the sanity checks that we mentioned earlier. An interesting one is also whether you can have custom installers, because especially for homegrown solutions there are quite a few interesting ways to install an update. It could be a tarball, or it could be some package, or some custom way. So the question is whether you can have these installer modules inside an update that allow you to install in different ways and not just one way.

On to getting started. That's pretty straightforward. Typically the way it's measured is how long it takes from scratch to actually having a working update. Like I mentioned, also due to the time pressure, if it's too hard, it takes too long, or there's too much you have to do yourself, then there's a big risk that people will not adopt it and will just build some simple homegrown solution instead.

And then the quality aspect is obviously very important for an updater; it's a very critical component. So how do you know that?
Are there test reports for the updater, or are there other people using it successfully? That's typically how people look at it. And documentation, of course, so it's clear how it works and you don't get any surprises.

For bandwidth and downtime, obviously we want as little as possible, and there are widely varying requirements. Typically you have some kind of maintenance window that you have to respect when you start and stop the update. And the bandwidth clearly depends on the network. So, Sigfox: how many have heard about Sigfox? Okay, three, four people. That's a very low data rate network. It's a new infrastructure; there's one company that's building a new infrastructure for networks, just like the 3G networks, but with very low data rates, and I think the range is very long. So it's sort of optimized for the connected embedded use cases. If you use that, then your scenario would look quite different from other networks.

So this is an interesting one. I've done similar talks quite a few times, and I try to map out the generic steps you need in order to do the client-side updater in a generic way. I think I started with four or six of these boxes, but then every time I did the talk there's this one guy who says, you need this also. So that's getting a bit lengthy now, and I might need to add some even after this talk, but in general this is how the main parts work, at least.

First you need to detect the update. This could also be local, or you could call it triggering the update. If you have a USB stick, that would still apply: somebody needs to make sure the process starts. Or if you have a managed update, then the server might tell the client device that it has to start updating.

Then you need to do some kind of compatibility check, or you should. Does the software run on the hardware?
So you want to know that before you actually start installing it. You need to download it, or copy it from the USB stick. Then you do checksum and authenticity checks, for example signatures. In some cases the updates are encrypted. There are some reasons for encrypting them: if there are vulnerabilities, maybe you want to encrypt the update so that attackers cannot see the vulnerabilities until you actually install it.

Yeah, so why encrypt it again if you have a secure channel up here? In cryptography you always try to focus on end-to-end, so that whatever happens, it's an extra safety measure. It's always end-to-end encryption or end-to-end authenticity that's the focus. That way you're sure that nothing went wrong in the middle here. But in general, yeah, you're right. I also tried to indicate with the lighter color here that this is a bit environment specific. So it's just an additional security measure, but definitely not required.

Yeah, that's true. Yeah. Yeah, that's true.
Yeah, so this might not be a secure channel, like this gentleman points out. Ideally, I think... yeah, that makes sense, and that also applies to the authenticity in the middle right there. If you know the channel from the source to the destination is secure, then maybe it's not that important to do signatures, but if you just get a random USB stick, then you definitely want to check it.

Then you do the pre-install actions. Typically these are scripts that maybe migrate some configuration files; maybe the developers have changed the formats of some of these configuration files, which you need to handle in order to have the new version of the application working. Then you get to the honeypot, where you install the update, which is the reason we're doing everything here. That obviously depends on what kind of method you're using. Then you have the post-install actions, which you're familiar with, and the sanity checks, which might also be custom. And the last one is what you do in case it fails. Ideally you should have a way to roll back, but at the least you need a way to detect that something failed, even if it's just a person who manually inspects it afterwards. You want to know about it.

So these are sort of the generic criteria. From the criteria one to five that we went through earlier, these are the things you can implement no matter how you install your update, whether you use a full image or a package. So sanity checking and authenticity, for example: you can sign any kind of blob, basically. And you can do standalone deployments no matter if you have a package or an image.

Yeah, so I'll go through the installation strategies quickly.
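The generic client-side steps walked through above (compatibility check, integrity check, pre-install, install, post-install, sanity check, rollback on failure) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code of any real updater: `Update`, `apply_update` and all the hook parameters are hypothetical names, and a plain SHA-256 checksum stands in for the authenticity check, where a signature or HMAC verification would sit in a real client.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Update:
    payload: bytes            # image, package, tarball: any blob
    sha256: str               # expected checksum shipped with the update
    device_types: List[str]   # hardware the update claims to support


def apply_update(update: Update, device_type: str,
                 install: Callable[[bytes], None],
                 sanity_check: Callable[[], bool],
                 rollback: Callable[[], None],
                 pre_install: Callable[[], None] = lambda: None,
                 post_install: Callable[[], None] = lambda: None) -> str:
    """Run the generic update steps; all hooks are device-specific assumptions."""
    # Compatibility check: refuse before touching anything on the device.
    if device_type not in update.device_types:
        return "rejected: incompatible hardware"
    # Integrity check (a signature check would go here for authenticity).
    if hashlib.sha256(update.payload).hexdigest() != update.sha256:
        return "rejected: checksum mismatch"
    try:
        pre_install()              # e.g. migrate configuration files
        install(update.payload)    # method-specific: image, package, ...
        post_install()
        if not sanity_check():     # custom health check after the update
            raise RuntimeError("sanity check failed")
    except Exception as err:
        rollback()                 # at minimum, the failure is detected
        return f"rolled back: {err}"
    return "success"
```

The point of the structure is that only `install` and `rollback` depend on whether the payload is a full image or a package; the checks around them stay the same.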
I think these have been covered a bit in previous talks: how you can go about installing the updates.

One is what I call runtime installation. Then you would have some kind of package manager, maybe, or OSTree, or a tar.gz file that you just install in user space. That way it's harder to achieve robustness, because you don't have atomicity: while the package manager is installing the update, applications can read partially installed data. But it integrates well. Like I mentioned before, many people already have package managers, or at least they're able to package up some tarball of the files they want to use. Low bandwidth, obviously; the one megabyte is just a vague indicator. And then short downtime was one of the other requirements that we found. Typically you just install a package and maybe restart a service.

The second approach is what I call boot to maintenance mode. It's also called the recovery OS. It's also a bit hard to achieve robustness in this case. Either the bootloader, or some minimal system that runs alongside the main rootfs, updates the rootfs or the user space. Obviously, in this case you can start an update and lose power, and then when you reboot you could run into a partial update. But it integrates fairly well too; there are not that many changes to the existing system. You need the whole image, and you will also have a very long downtime, because you first have to reboot into the recovery OS, then you have to install, so write the entire image, and then you have to reboot again into the rootfs. So you have a whole image install and two reboots in this case.

The third strategy is quite common, I think. It's called symmetric, or dual A/B rootfs. That way the image is written to the other rootfs; you have two of them. This one is fully atomic and consistent. It integrates fairly well. There are some things to worry about with respect to the partition layout.
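As a rough illustration of why the dual A/B scheme is atomic, here is a toy Python model. Everything in it is an assumption made for the example: the class, the `pending` flag that stands in for whatever the bootloader uses to decide which rootfs to try, and the simulated power loss. It does not describe how any specific updater or bootloader implements this.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ABDevice:
    # Two root filesystems; both start out holding the same image.
    slots: Dict[str, str] = field(default_factory=lambda: {"A": "v1", "B": "v1"})
    active: str = "A"    # rootfs the device currently boots from
    pending: str = ""    # rootfs to try on the next boot, if any (hypothetical flag)

    def inactive(self) -> str:
        return "B" if self.active == "A" else "A"

    def write_update(self, image: str, interrupted: bool = False) -> None:
        """Write the image to the inactive rootfs while the active one keeps running."""
        target = self.inactive()
        if interrupted:
            # Power lost mid-write: the inactive slot is garbage, but nothing
            # points at it, so the running system is unaffected.
            self.slots[target] = "<partial>"
            return
        self.slots[target] = image
        self.pending = target  # the switch is flipped only after a full write

    def reboot(self, boots_ok: bool = True) -> str:
        """Return the image running after the reboot."""
        if self.pending:
            candidate, self.pending = self.pending, ""
            if boots_ok:
                self.active = candidate  # commit the new rootfs
            # otherwise fall back to the old rootfs, which was never modified
        return self.slots[self.active]
```

In this model no reader ever sees a half-written root filesystem: the device runs either the complete old image or the complete new one.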
You will use double the rootfs storage. You also need to have some bootloader support to switch the partition to boot from, so there's some integration work. It also uses high bandwidth, since you need a whole image. The little star there, I guess you in the front row can see it: you can mitigate that with delta updates, as you're probably familiar with. As for the downtime, the interesting thing here is that you don't have to take down your system as you install the update, because the updater in A can install the update while the user space applications in A are still running. Then the only downtime you will have is when you boot from A to B, so there's one reboot of downtime. So it's lower than for the recovery or maintenance mode updates.

The last one, which is a bit different semantically, is that you can use a remote system, so proxy-based updates. Typically you do this for very resource-constrained devices that cannot run any extra updater agents. One example is in home automation, in the core IoT space, where you have a gateway that runs some intelligent updater, maybe running on Linux, and then you have some temperature sensors or some lights that you can update over Bluetooth or Zigbee or another protocol. Then you might use this strategy. It's not really an apples-to-apples comparison with the others, but obviously the gateway can control that device, at least to some extent, even when the update fails.

Yeah, so this is the summary of the pros and cons, in green and red. It really depends on what your focus is: if it's on bandwidth, or that it needs to be atomic, for example.

And then remote deployment. I think we should start moving towards this now and try to stop using USB sticks, especially now that we get so many devices. I guess by now you might have seen all the Gartner reports. You might not need to trust them either, but the point is that we'll have more
devices to deal with, and we need to move to a more centrally managed way of handling these updates. You typically do that with a server and have the ability to group devices. Perhaps you have a very risk-averse customer that you want to be the last one to get the updates, and you have a very bleeding-edge customer that loves to get bugs. I mean, not really, but at least to try the newest features. So you might want to group by customer and then roll out to the less risk-averse customer first. There's also this term, campaign management, which means that you can do this in phases: maybe 1% of the devices, then 5 and 10, before you roll out to the entire fleet. And then there's the reporting aspect that you also need, to know whether the update was successful or failed. Maybe you need to retry it. So these are all requirements for the server side.

So to ensure that your devices can be updated, you need to figure out one of these strategies, and I'm sure you've heard enough about them during this conference. I also think you should at least look at the open source tools that are out there before starting to build your own. And then, of course, I'm a little bit biased since I work on Mender. You can try out the standalone deployment in roughly 10 minutes. We have some pre-built images, so you can see if it fits your needs or not. And our end goal is to sleep better, after all, so we want to get rid of this one, even though we don't know about him as we release devices to the field. So that's it. Any questions or comments?
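The phased campaign idea from the talk (1%, then 5, then 10, then the whole fleet, with reporting per device) could look something like the sketch below on the server side. The function names, the reading of the percentages as cumulative shares, and the rule that halts the campaign when too many devices in a phase fail are all assumptions for illustration; the talk does not specify them.

```python
import math
from typing import Callable, Dict, List, Sequence


def plan_phases(device_ids: Sequence[str],
                cumulative_percent: Sequence[int] = (1, 5, 10, 100)) -> List[List[str]]:
    """Split devices into batches so each phase reaches the given cumulative share.

    Put the least risk-averse customers first in `device_ids` to update them first.
    """
    phases: List[List[str]] = []
    done = 0
    total = len(device_ids)
    for pct in cumulative_percent:
        target = min(total, math.ceil(total * pct / 100))
        phases.append(list(device_ids[done:target]))
        done = max(done, target)
    return phases


def run_campaign(phases: List[List[str]], deploy: Callable[[str], bool],
                 max_failure_rate: float = 0.1) -> Dict[str, List[str]]:
    """Deploy phase by phase and stop early if a phase fails too often."""
    report: Dict[str, List[str]] = {"success": [], "failure": []}
    for batch in phases:
        failed = 0
        for device in batch:
            ok = deploy(device)  # hypothetical call that updates one device
            report["success" if ok else "failure"].append(device)
            failed += 0 if ok else 1
        if batch and failed / len(batch) > max_failure_rate:
            break  # do not roll a bad update out to the rest of the fleet
    return report
```

The report doubles as the retry list: devices under `failure` can be fed into a later campaign once the problem is understood.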
Yes. You talk about... Yeah. So the question is whether Mender has a way to migrate existing devices that do not have this partition layout to have it. The answer is we don't have that. I think it's quite difficult to do. If you have some thoughts about how we can do it, that would be interesting. But yeah, that migration... hmm. One way that might be a bit easier is that you can have this new partition layout on new devices, and if you use package-based updates today, you can still use those on the older devices until they're decommissioned. Our server, and I think most other servers, have an API, so you can basically write a client for them. So from the server side you can manage both package-based and image-based deployments. Of course your existing devices won't have as robust an update process as the new ones, but that's one way to migrate more softly. Any other questions?

Yep. Most problems... Alphabet... Yeah. Right. Yeah, it's an interesting problem, especially if you're trying to avoid any kind of critical points where you cannot lose power.

Any other questions or comments? Or are we all ready for the drink reception? Okay. Thank you.