All right, well, thanks for coming back. I see a lot of familiar faces from my talk earlier. Again, my name is Drew Moseley, I'm with Mender.io, and we are an open-source project for deploying over-the-air updates. This talk is primarily about the key considerations in designing an update strategy and an update system. Based on some of the questions and discussions we had after my previous talk, there will hopefully be a lot of questions here too. Normally this is a 45-minute talk and it starts with a lot of general material, but I'm thinking the folks here probably don't need all the details, so I'll try to move quickly through the initial slides and get into the meat: actual updating strategies and the design decisions around them. If I'm going too fast on the first couple of slides, feel free to speak up or raise your hand and get my attention, and I'll be happy to slow down.

Before we started designing our product, we did a lot of interviews with embedded developers; we were specifically targeting the embedded market with this updater. At the time these slides were put together it was about 30 interviews, and I know it's more than that now, though I don't know how much further we've gotten. I want to talk a little bit about what's unique about updates in the embedded market. Based on the earlier discussions about how many of you are doing embedded connected devices, there's probably nothing too surprising in there, and then we will dig into some of the actual strategies and design decisions you need to make when you're doing an update. Here is all my contact information.
I'll also be around at the end of the presentation, so if we don't get a chance to answer your questions here, feel free to reach out and I'll be happy to chat with anybody over email or Twitter or whatever.

This is one of those slides we always include, especially when we're dealing with less technical, more marketing-focused folks. I don't think there are any big surprises here: the net is that there will be bugs in your code after you deploy it to the field, and you've got to be able to fix them. The two primary design decisions that went into our product are that the updater has to be robust and it has to be secure, and I'll get into what that means in the next couple of slides. Just keep in mind that that was our number one design criterion: if a decision reduced robustness or security, we wouldn't do it in our product, and for embedded we feel that's the right way to go. Obviously things like secure communication and cryptographic signatures are required for any real system today.

In a lot of cases, the folks we talked to did their own homegrown system. Some people here mentioned doing that, and there are plenty of projects out there that you can use instead. Many of the homegrown systems we've seen get put in very late in the design cycle: corners get cut, and robustness and security are done as an afterthought. So homegrown may or may not be the right thing for you. Of course, if you're attending a conference like this, you're probably thinking about those things ahead of time, so it's probably less of an issue.
I think the numbers here that came out of our survey jibe pretty closely with the conversation earlier. In this audience, I'm curious: how many people who have systems in the field have a means to deploy updates to them automatically? Okay, and how many have updates that are not automatic? Okay, so a few more not automatic than automatic. That's another criterion we'll get into: it has to be automatic and simple.

The primary mechanism for updaters, especially in the embedded space, is what's called the dual A/B root file system: you have two root file systems, and your bootloader chooses between them at boot time. Another option is package-based, which is your typical desktop-distribution apt-get or yum style of update. And then there are other, less formalized approaches; we've heard of people doing weird things with tarballs, all sorts of schemes that are hard to track in the long run.

This slide, like I said, is more for those who are developing their own updater: typically it's a three-to-six-month effort. A lot of customers we've talked to think they're just going to slap together an updater in one to two weeks, and then they find there are a lot of details they hadn't considered. Another thing you might want to consider is how frequently you are updating: are you rolling out continual updates, or only when there's a CVE you have to address?
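The dual A/B selection just described lives in the bootloader, but the decision itself is simple enough to sketch in shell. This is an illustration only: `ACTIVE` and `UPGRADE_AVAILABLE` are stand-ins for bootloader environment variables (U-Boot or similar), not any real bootloader's interface.

```shell
# Illustrative dual A/B selection logic. In a real system these are
# bootloader environment variables, not shell variables.
ACTIVE="${ACTIVE:-a}"                        # partition last known good
UPGRADE_AVAILABLE="${UPGRADE_AVAILABLE:-0}"  # set after writing the other side

select_rootfs() {
    if [ "$UPGRADE_AVAILABLE" = "1" ]; then
        # Try the freshly written partition once; if the update is never
        # confirmed, the next boot falls back to ACTIVE.
        if [ "$ACTIVE" = "a" ]; then echo "b"; else echo "a"; fi
    else
        echo "$ACTIVE"
    fi
}

echo "booting rootfs_$(select_rootfs)"
```

With `UPGRADE_AVAILABLE` unset or zero this always boots the known-good side; setting it to 1 gives the new image exactly one chance before falling back.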
You know, we've talked to some customers that gave up on updating the base OS and decided they would just update their content files, because they assumed the base OS would be sound. Bottom line: most connected devices these days aren't where they need to be as far as deploying updates. However, given the early stage of the industry, most of them are not that far behind the average. We definitely see an increase in interest in over-the-air updates; at most of the conferences we go to, we see more and more discussion around it. So that's all promising news, and obviously the connectivity and the sheer number of connected embedded devices are large drivers.

For those who aren't in the embedded space, here are a couple of things that may drive the decisions around an updater specifically targeting embedded, depending on your use case. The devices may be remote and hard to get to. They might have unreliable power if they're running on battery, and unreliable network connectivity; a lot of these devices don't even have Wi-Fi, they might have 3G or something like that, which is a fairly expensive means to download over-the-air updates. So there are a lot of things that need to be considered in this environment, as opposed to a data-center or enterprise environment, and of course there are lots of things that can go wrong.

So, as I mentioned, robustness and security are priority number one. Beyond that, you see the other characteristics we looked at: how easy is it to integrate with your existing workflow, and how easy is it to get started?
And then bandwidth consumption and downtime: those are important for the end users, but as far as actually getting started with over-the-air updates, the first three items on this list are really more for the system designers.

When it comes to robustness, what we mean is that you can never brick the device. If I download an update, it is installed all the way or it's not installed at all. Nothing except the updater client ever sees a partially installed update: if there's an issue with power, an issue with network connectivity, or some kind of corruption in the image, that's all handled by the client, and the actual system will never run in that state. Updates are always atomic; they have to be. That's the primary reason we chose the dual A/B root file system: it allows atomic updates. If you're doing a package-based update, all bets are off; you don't really know what the state of any given device is without a lot of Herculean effort. The nice thing from the development perspective is that this allows very consistent deployments of images: you know exactly what's on the device, and it should be exactly what you tested in the lab. You have a single binary that runs through your QA, and that's the same binary that gets installed on your devices in the field.

And then there are additional sanity checks after the update. By default, if the system is unable to connect to the server on reboot, our system assumes a failure, and in that case it will automatically roll back to the previous installation. There are also mechanisms to do additional checks, because your particular design or use case may have additional things it wants to check.
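That post-reboot commit-or-rollback decision can be sketched as follows. This is a simplified illustration, not our client's actual code: `probe_server` is a stub standing in for the real connectivity check, and the check-script directory is an invented convention for the use-case-specific checks.

```shell
# Simplified post-reboot decision. probe_server stands in for "can the
# client reach the update server?"; a real client would make an
# authenticated request here.
probe_server() { return "${PROBE_RESULT:-0}"; }

decide() {
    # Generic check: did we come up far enough to reach the server?
    probe_server || { echo "rollback"; return 1; }

    # Use-case-specific checks plugged in by the integrator
    # (database sanity, data migrations, ...), one executable per check.
    for check in "${CHECK_DIR:-/etc/update-checks.d}"/*; do
        [ -x "$check" ] || continue
        "$check" || { echo "rollback"; return 1; }
    done
    echo "commit"
}

CHECK_DIR=/nonexistent decide   # server reachable, no checks: prints "commit"
```

If "rollback" is the outcome, the client leaves the bootloader's upgrade flag untouched so the next reboot lands back on the known-good partition.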
For example, I might want to check database sanity, or that a migration of some data files was done properly. So, ideally, there's a mechanism to plug in checks after the boot that say whether or not this is a sane update. Obviously, the updater itself handles the generic characteristics: can I connect back to the server, does the system look like it's up and running? But the specific use-case checks are generally up to the system designer who is integrating the updater.

The final thing, of course, is ensuring the authenticity of the update. That's where the cryptographic checks come in: you use TLS to verify that you're talking to the right server, and then you have an additional check based on cryptographic verification of the images before you install them, just to make sure that you aren't installing the wrong thing. There was an incident a couple of months ago with some embedded door locks that were used by Airbnb and a number of other customers. They had an issue where the wrong firmware update was installed: it was built for, I believe, ARMv5 and got installed on ARMv6, or something like that, and the locks were completely bricked at that point. So you've got to be able to ensure authenticity, and you've got to be able to ensure that the artifact is targeted at the specific device; there are a number of checks you want to do. And you always want to ensure that another update can be done: you always want to have a known-good rollback. That's the picture in the most generic terms.

In terms of integrating with existing environments, everybody's starting with a lot of history: you have a number of existing tools.
You have hardware, that kind of thing. So ideally, if there is a third-party updater you're looking to go with, there are means for it to integrate into the environment you're using. In our case we are primarily Yocto-based, but there's nothing inherent in our design that requires Yocto; that's just the low-hanging fruit we've chosen, and at some point we hope to branch out from that.

One thing to consider is whether you have devices in the field, with nothing installed, that you want to install the updater on; we've got a slide about mechanisms to do that. It's a lot trickier, because typically these updaters require updated partition structures and the like, and that's virtually impossible to do in a robust fashion when the devices are remote and you don't have hands-on physical access to them.

Another nice feature to look for: is there a standalone mode, or is there significant back-end management infrastructure associated with the updater? We provide both a client and a back-end management server, but it's possible to use a lot of these updaters just as a client running on the device, with some custom scripting around it. If your needs are fairly modest, you don't necessarily want to set up a management web back end for a large fleet; you might just have a few small devices, so you can throw together some shell scripts and use the client in standalone mode, invoking it directly on your target device.

And then, how extensible is it? I mentioned the ability to plug in sanity checks for your particular use case. And how easy is it to get started: is the documentation good, do you feel that it's well tested? Again, these are more of the non-functional requirements.
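A standalone-mode wrapper of the kind just described might look like the sketch below. The client invocation is a stub (`client_install`), because the actual command line varies by updater; the openssl call illustrates the artifact authenticity check mentioned earlier, and `vendor_pub.pem`, `artifact.img`, and the `.sig` suffix are made-up names.

```shell
#!/bin/sh
# Sketch of a standalone-mode update wrapper. client_install stands in
# for the real updater client's install command; file names are made up.
client_install() {
    echo "installing $1 to the inactive partition"
}

deploy() {
    artifact="$1"
    # Verify the artifact against the vendor's public key before touching
    # storage (the TLS transport check is assumed to have happened already).
    if ! openssl dgst -sha256 -verify vendor_pub.pem \
            -signature "$artifact.sig" "$artifact" > /dev/null 2>&1; then
        echo "signature check failed: refusing to install" >&2
        return 1
    fi
    client_install "$artifact" && echo "staged; reboot to activate"
}
```

Usage would be `deploy artifact.img` after fetching the image and its detached signature, with the vendor's public key baked into the device image.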
So we'll move forward. As far as your end users go, bandwidth and downtime are obviously a big concern: if the update eats up a lot of 3G data, that can get expensive, and if it takes 20 minutes to reboot into the new image, your end users won't want that. So those are a couple of things to keep in mind when you're considering how to integrate an update system into your environment.

So let's dig into some of the specific strategies. Installer strategy number one is the one I mentioned a minute ago, where you have a system already in the field and you want to somehow make it available for updates while it's deployed. This is just an updater client that runs on the target and does some level of updates. Robustness is difficult here: if you have to manipulate the partition structures, that's a bad idea, so it's very difficult to get a robust setup in this environment. Similarly with atomicity: it's not impossible, but it does require some extra steps to make sure your installations stay consistent. On the plus side, it integrates very well into an existing system; it's generally just a client executable of some kind that installs into your running system. Typically it's low bandwidth use, although I guess that depends on exactly how much data you're installing with it, and usually the downtime is fairly short: the system can stay up and running while the updates happen in the background, and once they're complete, the system can reboot if needed.

Another strategy that can get similar results is booting into a maintenance mode, where you have a small root file system that you boot into to handle any updates. The biggest issue here is the long downtime: your system is up, you have to reboot into maintenance mode, which does your update, and then it reboots back into system mode. Additionally, there could be extra reboots if there are failures in the sanity checks after booting the update: then you would have to reboot and reinstall the original, so a rollback potentially adds quite a bit of time. Bandwidth is fairly high because you are downloading an entire image; there are mechanisms to minimize that using delta updates and the like, but it can still slow things way down.

And then there's the strategy I mentioned that we're using, the dual A/B root file system, and I know I've spoken with at least a couple of folks here who use it too. The big downside of having two root file systems is that it cuts your storage down quite significantly: you obviously have to create two root partitions in your storage space. The way flash and disk sizes are going, that's less of an issue, but there are certainly still markets where it becomes problematic. The advantages: it is robust, fully atomic, fully consistent. The assumption is that all the logic to select A or B is handled in the bootloader, so that you are able to update new kernels and root file systems. That has the implication that you're not updating the bootloader in the field, but that's a pretty standard architecture, and generally speaking bootloaders are small enough that it shouldn't be too big an inconvenience. It integrates fairly well, though there's usually some additional partitioning that needs to be done, that kind of thing. Again, it can have high bandwidth use, but the big win is a very short downtime: the new image can be downloaded and written to storage without interrupting the end user, and when it's time to boot into the new image, you simply reboot. The user has to deal with a single reboot, or two if we need to roll back, because rollback at that point is simply switching back over to the previous image that was known good before.

I'll mention this last one just briefly; I'm not sure it's terribly appropriate in the embedded Linux space. The idea is that you've got an external system that acts as a gateway to deploy images. This is typically used for smaller systems running an embedded operating system: sensors, actuators, that kind of thing. There may be some use for this where embedded Linux is running on the target devices, but that's not something we've seen a whole lot of. And obviously the ability to manage deployments is another thing you've got to decide on for your use case: do you need a web infrastructure to maintain your device fleet, or is some kind of scripting on the device simple enough in your case?

With that, I think we've got a few minutes for questions, and I will open the floor to anybody who has them. One second for the microphone. Yes?

Sure, so the question is how you deal with user data, specifically in the case of a rollback, right? The idea being: I install a new image, maybe it does a database migration or something to a newer format, and then I decide I have to roll back. That's going to have to be done by the plug-in architecture; it obviously can't be done in any generic fashion, because the updater itself won't necessarily know. But if you've got a plug-in that comes in and does some kind of post-installation migration to a new format, then there would also have to be a callback that says: okay, we're rolling back.
You need to undo whatever has already been done. In general, speaking for our updater: in our default partition structure we have the A and B root file systems, but we also have a persistent data partition that is not touched on any given update. So there is definitely a place to store that data, but the manipulation of that data obviously has to be handled very carefully. Okay, very good. Any other questions? In the back.

Right, so the question is: what about things like Btrfs or NixOS, which use various techniques to get the atomic and robust guarantees of the dual partition without actually requiring it? I can say we have definitely looked at that. It's not something we've implemented today; there's obviously more work involved, and the dual A/B approach is fairly well understood, so from our particular project's perspective we picked it as the low-hanging fruit. But we are certainly considering those things, and I'd love to pick your brain about it later; you've probably looked into it more than I have. I've only touched it at the surface level, enough to understand that it's feasible, but that's about as far as I personally have gotten. Yes?

Okay, so the question is: if an update fails multiple times and you attempt to write a bad update numerous times, will that affect the lifetime of the flash? It most certainly will, and that's a design decision. In our case, if an update fails, the client is not going to try to reinstall it at that point: the update has failed, and it requires operator intervention to figure out what to do next. If it was just some glitch in writing the flash, an operator can say, yeah, go ahead and do it again, and that's fine, but an operator needs to intervene to do that.
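That "fail once, then wait for an operator" policy can be sketched as a small state-file check. Everything here is invented for illustration: the state-file path, the stubbed `do_install`, and the artifact naming are not any real client's convention.

```shell
# Sketch of the no-automatic-retry policy: remember which artifact failed,
# and refuse to install it again until an operator clears the record.
do_install() { return "${INSTALL_RESULT:-0}"; }   # stub for the real install
STATE_FILE="${STATE_FILE:-/var/lib/updater/failed}"

try_install() {
    artifact="$1"
    if [ -f "$STATE_FILE" ] && [ "$(cat "$STATE_FILE")" = "$artifact" ]; then
        echo "skipping $artifact: previous attempt failed, operator action required"
        return 1
    fi
    if do_install "$artifact"; then
        rm -f "$STATE_FILE"
        echo "installed $artifact"
    else
        printf '%s' "$artifact" > "$STATE_FILE"   # spares the flash a retry loop
        echo "install of $artifact failed"
        return 1
    fi
}
```

An operator "clearing" the failure is then just deleting the state file, after which the same artifact may be attempted again.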
Yeah, we don't ever do any automatic reinstallation after a failure. Yes, sir? I'm sorry, can I mitigate it with what? Yes, precisely: the question was about the asterisk next to "high bandwidth use" on this slide, and whether that can be mitigated with things like delta updates. Absolutely it can, and speaking for my project specifically, that's a feature we are working on. It's one of the first questions everybody asks, so we definitely want that feature out quickly. I know Lennart has given a talk on casync; that was one of the options we're considering, and I know there are quite a few options out there. It's kind of a tricky thing to get right, because if you require the A and B root file systems to be read-only, in some ways that makes it simpler: I can do the delta calculations on the server, based on the image that we know was installed. Whereas if the A and B root file systems are read-write, then I have to have some mechanism where the calculation is done on the device, since the data may have changed and we don't know what it is.

Correct, that would only help with the bandwidth use. Something like Btrfs, or some of those other techniques with some kind of overlay structure in the file system, could potentially help mitigate the extra need for multiple partitions.

Yeah, so the question is: where are the checks done? There are checks done before we boot, which would be the cryptographic checksums and things like that, and there are checks done after we boot. Our generic setup is: does the system come up far enough for our client executable to get running and communicate back to the server? That's the minimum check we can do on reboot, and then, in our case, there's a plug-in architecture where you can plug in additional executables that verify sanity in other ways that are specific to you.

All right, time is up. Thank you very much.