This is the outline of the talk. I will first provide an overview of the Quark bootloader. Then I will focus on the firmware management functionality, and then I will discuss a security extension that is coming in the next release. Finally, if we have time, I will discuss some internals, specifically how we manage persistent data, metadata, in the bootloader, and then some concluding remarks.

So the Quark bootloader is the reference bootloader for the Intel Quark microcontroller family, which is basically the Intel Quark microcontroller D2000 and the Intel Quark SE microcontroller C1000. The bootloader is developed as part of the software stack that we have for this family of microcontrollers, and the software stack and the bootloader are available on GitHub, released under a 3-clause BSD license.

So what are the main features of the bootloader? Obviously, we have some bootstrap features, initializing the system. We also support functionality that allows restoring the context from deep sleep, because this is not done automatically. And we have some first-time initialization processes, like the trim code computation. In the next release, we are also adding some security hardening features. But the main topic of this talk is actually the firmware management functionality, which I will discuss shortly.

Let me first provide an overview of the Quark MCUs, so that you have an idea of what kind of hardware we are talking about. Both SoCs, the D2000 and the C1000, feature an x86 core, codenamed Lakemont, that runs at 32 MHz. The C1000 is more powerful than the D2000: it has another core, an ARC core, which is part of the sensor subsystem of the SoC, it has more RAM and more flash, and it also features a USB controller. From the flash layout point of view, however, and from the bootloader point of view, they are quite similar, because we have to put our bootloader in the OTP region, which is 8 KB on both SoCs.
And yes, so the bootloader has firmware management functionality. Specifically, it supports firmware upgrades, obviously, but also some FM functionality like key management, which is coming in the next release actually, plus system information retrieval and application erase. So you can query the device in order to know the applications that are installed, the partitioning, and so on. This firmware management functionality is available over two transports, both UART and USB; obviously, USB only on the C1000.

The biggest constraint that we had while developing the bootloader was, as I said already, the 8K limitation, because we had to fit it in the OTP, the one-time programmable flash. Despite that, we decided to adopt a modular approach, and that was done mainly with the goal of achieving extensibility and the possibility of adding new transports, and potentially also OTA, in the future.

So let's see how the modular design of the firmware management protocol stack looks. All the firmware management functionality is DFU-based. This means that we use DFU for sending images, commands, and requests to the device. Obviously, DFU is a USB specification, a USB protocol. So what we did is adapt it for UART, and we introduced what we call the Quark DFU Adaptation, the QDA protocol; we will see it shortly. On top of this DFU-based communication layer, we have defined an image format, the Quark Firmware Upgrade (QFU) image format, and the QFM protocol, the Quark Firmware Management protocol, which basically allows us to have functionality other than upgrades running on top of DFU. These are, again, the ones I mentioned before, and the most important one is probably key provisioning done over DFU.

So DFU, as you probably know, stands for Device Firmware Upgrade. It is a USB standard, and one interesting feature is that it does not define any specific image format.
What it does provide is basically a way to transfer data to the device, with what is called DFU download, and to extract data from the device, with DFU upload. Both of these transfers are block-based.

We decided to use DFU because it is an open and well-documented standard, which is also used by many other projects, and it is really designed for resource-constrained devices. Specifically, as I said, it has a block-wise transfer that allows us to write one block of the image at a time: given our limited resources, and especially our limited RAM, we cannot store the whole image in RAM or in a spare portion of the flash; we must write it one block at a time. Another feature of DFU is that the transmission is controlled by the device: the device sets the timing of the communication, so you never have the risk of the host flooding the device by sending too many messages or anything like that. Another advantage is that there are already host tools available for it; probably the most famous one is dfu-util, which supports both Windows and Linux and is open source as well. And finally, we like the fact that DFU does not mandate a specific image format, but leaves the vendor the choice to define its own, because we wanted to add our own metadata and our own authentication mechanism to our image format.

So as I said, DFU is a USB standard, but we wanted to support UART as well. What we did is add this Quark DFU Adaptation protocol, which is just an adaptation layer that makes DFU, the state machine of DFU and the communication protocol of DFU, available over UART as well. Actually, the QDA protocol makes DFU available over any message-oriented transport. Since UART is actually stream-oriented, we added another layer, the XMODEM protocol, which is an old file transfer protocol that we use to transfer QDA packets; we basically use it to turn the channel into a message-oriented one.
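To illustrate turning the stream-oriented UART channel into a message-oriented one, here is a minimal sketch of classic XMODEM framing carrying an opaque payload. The field layout follows the original XMODEM specification (SOH, block number, its one's complement, a 128-byte payload, an 8-bit arithmetic checksum); the exact variant the Quark bootloader uses (for example, CRC-16 mode) may differ, so treat this as illustrative only.

```python
# Classic XMODEM framing sketch: each frame carries one fixed-size
# payload, so a message-oriented protocol (like QDA) can ride on top
# of a raw byte stream such as UART.
SOH = 0x01           # start-of-header byte for 128-byte frames
PAYLOAD_SIZE = 128   # classic XMODEM payload size

def xmodem_frame(block_num: int, payload: bytes) -> bytes:
    """Wrap `payload` in a classic XMODEM frame."""
    assert len(payload) <= PAYLOAD_SIZE
    padded = payload.ljust(PAYLOAD_SIZE, b'\x1a')   # pad with SUB (0x1A)
    checksum = sum(padded) & 0xFF                   # 8-bit arithmetic checksum
    return (bytes([SOH, block_num & 0xFF, (~block_num) & 0xFF])
            + padded + bytes([checksum]))
```

A receiver would resynchronize on SOH, check the block number against its complement, and verify the checksum before handing the payload up to the QDA layer.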
And we chose it because of its simplicity, which allows us to have a reduced footprint. So the QDA protocol basically provides all the DFU request/response messages, so DFU download, DFU upload, and all the other control commands. It also mimics some of the generic USB functionality, among which the alternate settings feature, which we use, as we will see. I'd like to stress that QDA as such is not limited to XMODEM over UART; it can be applied over any message-oriented protocol. Indeed, during the early development phase, I basically used it over UDP so that I could test the code directly on my laptop.

Since we were porting DFU to a different transport, we also needed a different host tool for the user. What we did, instead of reinventing the wheel, was take dfu-util and fork it: we basically removed the USB layer, which is libusb, and replaced it with a QDA-over-UART layer. The modified dfu-util, which we call qm-dfu-util, is available on GitHub as well, under a GPLv2 license.

So thanks to USB DFU and QDA, we basically have a common DFU-based communication layer, on top of which we transfer both upgrade images, which is the main intent of DFU, and what we call the other firmware management requests, using the QFM protocol that I mentioned already.

The image format is very simple; it is actually a block-wise format, meant to be transferred using DFU, which divides images into blocks. In the end, this format is just a header that we add to the binary: we prepend this header to the binary, and the header is transferred in the first DFU block, so it is processed before the image.
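The header-prepending idea can be sketched in a few lines. This is a hypothetical illustration, not the real QFU layout: the actual field layout of the Quark QFU header differs, and the field names here (VID, PID, partition ID, version, block size) and the 2 KB block size are assumptions made for the example.

```python
import struct

BLOCK_SIZE = 2048  # assumed DFU transfer block size, for illustration

def make_qfu_image(binary: bytes, vid: int, pid: int,
                   partition: int, version: int) -> bytes:
    """Prepend a (hypothetical) fixed-size header to a raw binary.

    The header travels in the first DFU block, so the device can
    validate the target VID/PID/partition before writing any flash.
    """
    header = struct.pack('<HHHHI', vid, pid, partition, version, BLOCK_SIZE)
    return header + binary
```

Because the header rides in the first block, the device can reject a mismatched image before a single byte of the application binary is written.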
Yes, this is the information that we have in the header. The header is actually divided into two sub-headers: a base one that contains common information for processing the image, like the vendor ID and product ID of the target device, the application version number, and some metadata. This base header can be followed by an extended header, and we use the extended header to add authentication, as we will see later.

Another interesting decision we made is that we do not put any explicit memory address in the header; instead, we use the concept of partitions. We assume that the flash is divided into partitions, and every image targets a partition. The current partition scheme is quite simple: on the D2000, we have just one partition, designed to host an x86 application, and on the Quark SE C1000, we have two partitions, one for the x86 core and one for the ARC. Other partition schemes are, however, possible, including multiple partitions per core, in order to have functionality like a fallback image in case of a future OTA extension, and so on. Yes, this is the flash partitioning that we have in our MCUs.

So as I said, we didn't want to have just firmware upgrades; we also wanted other functionality available, especially because we wanted to do key provisioning, and so we defined this QFM protocol, the Quark Firmware Management protocol, which is basically a request/response protocol. Requests are sent using DFU download transactions, and responses are either piggybacked on the resulting status of the download transaction or, for complex responses like the system information response, extracted using a DFU upload transaction. An example will probably make it clear: when we want to do a key update, we transfer the update-key request, with the key itself, in a DFU download transfer, as if it were an image, right? But it is not an upgrade image; it is a new key.
And then, from the status of the transaction, we know whether the key update has been successful or not. For system information retrieval, things are more complicated: again, we transfer the request in a download transfer, but then we collect the response using an upload transfer. So we basically extract the response from the device; that's the idea.

Earlier I mentioned that we make use of alternate settings, and the use we put them to is this: we wanted to provide a way for the device to know whether the host was going to transfer QFM packets or images. So we decided that alternate setting zero is used for transferring QFM packets, while alternate settings one and greater are used for firmware upgrades; specifically, we have one alternate setting for every partition. The advantage of doing that is that in dfu-util you can list the alternate settings that you have, and you get a nice overview of the partitions in your system using vanilla dfu-util, so nothing proprietary.

Obviously, we also had to implement some host tools, so not just embedded code here. We decided to use Python, to be platform-independent, and we have two scripts, basically. One is for creating the image: this image-creation script converts a raw binary into a QFU image that can be downloaded to the device using the DFU tools. Then we have another script that implements the QFM functionality and allows, for instance, information retrieval or key provisioning. Internally, this second script calls dfu-util in order to transfer the requests and collect the responses.

So this is an example of how QFU images are created. As I said, you have to specify the binary that you just compiled, then the partition on which you want the binary to be installed, and then you can specify other metadata, like the application version.
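The alternate-setting convention described above can be summarized as a tiny dispatch function. This is a sketch of the convention, not the bootloader's actual code; the function and return values are illustrative.

```python
# Alternate-setting dispatch sketch: setting 0 carries QFM management
# packets, settings 1..N map one-to-one onto flash partitions.
def dispatch_download(alt_setting: int, num_partitions: int) -> str:
    """Decide how to interpret an incoming DFU download transfer."""
    if alt_setting == 0:
        return "qfm"  # QFM request/response packet, not an image
    if 1 <= alt_setting <= num_partitions:
        # QFU upgrade image destined for that partition
        return f"upgrade:partition{alt_setting}"
    raise ValueError("unknown alternate setting")
```

On a C1000 with two partitions, settings 0, 1, and 2 would all be valid, and dfu-util's listing of the alternate settings doubles as a partition overview.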
Then you enter firmware management mode and flash the image using dfu-util or qm-dfu-util, depending on whether you want to use USB or UART. The important thing is that you have to specify the alternate setting corresponding to the partition. These are a few examples of how qm-manage, the QFM script, can be used, and this is an example of the kind of information you can retrieve from our bootloader using DFU: for instance, the version of the bootloader, the SoC type, the number of cores you have, whether an application is installed or not, and so on.

All of what I've shown so far is already available. What I'm going to show now is the secure firmware upgrade extension that will be available in the next release, which is expected in a few weeks. What this new release will provide is basically authenticated firmware upgrades. Unfortunately, given the limited resources of our SoCs, we couldn't use a public-key scheme, so we use a simple symmetric-key scheme, which is HMAC. This means that the image is verified using the same key that is used to sign it, and this also means that the key must be located on the device. We decided not to hardcode the key in the device, but to provide users with key management functionality, so that they can provision the key at runtime; the key can also be updated in case it is somehow leaked.

How we provide the secure firmware upgrade extension is actually quite simple: we extended the header, adding the HMAC extended header, which contains all the information needed to authenticate the image. An interesting aspect is that we do not compute the HMAC over the entire image. What we do is authenticate the header, and inside the header we put an array of hashes, one for each block that composes the image. So during the upgrade, the first packet that we transfer is the header.
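The block-wise authentication scheme just described can be sketched as follows. This is a minimal illustration under assumptions: SHA-256 for the per-block hashes, HMAC-SHA256 for the header tag, and a 2 KB block size; the actual algorithms and sizes in the Quark bootloader may differ.

```python
import hashlib
import hmac

BLOCK = 2048  # assumed DFU block size, for illustration

def sign_image(image: bytes, key: bytes):
    """Build the per-block hash array and the HMAC tag over it.

    Only the hash array is HMAC'd; each image block is then checkable
    against its (already authenticated) hash as it arrives.
    """
    blocks = [image[i:i + BLOCK] for i in range(0, len(image), BLOCK)]
    hashes = b''.join(hashlib.sha256(b).digest() for b in blocks)
    tag = hmac.new(key, hashes, hashlib.sha256).digest()
    return hashes, tag, blocks

def verify_block(index: int, block: bytes, hashes: bytes) -> bool:
    """Check one received block against the authenticated hash array."""
    expected = hashes[index * 32:(index + 1) * 32]   # 32 = SHA-256 digest size
    return hmac.compare_digest(hashlib.sha256(block).digest(), expected)
```

The payoff is that the device never needs to buffer the whole image: it verifies the HMAC over the small hash array once, then checks each 2 KB block independently as it is written to flash.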
And we authenticate the header, computing the HMAC of it and comparing it with the one that is in the header itself. Once the header is authenticated, we know that the hashes are authentic. Then we start receiving blocks of the image, and for every block we compute the hash of the block and compare it with the one in the header, which was authenticated; so the block itself is authenticated.

A problem we had to solve was how to ensure partition consistency, that is, how to handle failures that can leave a partition in an inconsistent state, like a reset during an update. In order to do that, we associated a consistency flag with each partition. This metadata is stored in the bootloader data, which is some data that is handled by the bootloader in a persistent way. So what we do is this: when the upgrade starts, after we have received the header and authenticated it, and when we receive the first block basically, we mark the partition as inconsistent, and then we start writing it. Once the upgrade has terminated, we mark the partition back as consistent. If something happens during the upgrade, like a reset, the partition remains marked as inconsistent. At every boot, we check for inconsistent partitions, erase them, and mark them back as consistent. We have to erase them because we have no way to recover them, but it is better to have an empty partition, so no application booting, than an application that boots but is corrupted and can create safety issues.

As I said before, we also provide a key management feature. So the key is not hardcoded in the device: when you program the bootloader, the device is unprovisioned, and then you have to provision the keys. We define a special key-update request as an extension of the QFM protocol. This request is authenticated as well, using the firmware key, which is the key that we use for authenticating the image, but also another key, which we call the revocation key.
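The consistency-flag protocol above can be captured in a small sketch. The class and function names are illustrative, not the bootloader's API; the point is the ordering: flag before the first write, clear only after the last, erase-on-boot for anything left flagged.

```python
# Partition-consistency sketch: the flag is flipped to "inconsistent"
# before any flash write and back to "consistent" only after the whole
# image is in place, so an interrupted upgrade is always detectable.
class Partition:
    def __init__(self):
        self.consistent = True
        self.data = b''

def apply_upgrade(part: Partition, blocks, fail_at=None):
    part.consistent = False           # mark BEFORE the first write
    for i, block in enumerate(blocks):
        if fail_at == i:
            return                    # simulated reset mid-upgrade
        part.data += block
    part.consistent = True            # mark only AFTER the last block

def boot_sanitize(part: Partition):
    """Boot-time check: erase any partition left inconsistent."""
    if not part.consistent:
        part.data = b''               # empty partition beats a corrupt app
        part.consistent = True
```

The safety argument is exactly the one in the talk: after a mid-upgrade reset the flag is still clear, so the next boot erases the half-written partition rather than booting a corrupted application.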
This double signing adds some security. It is important to note that the key-update request is not encrypted. This is obviously not fine for OTA, but since we support only wired, point-to-point transports, like UART and USB, the risk of a man-in-the-middle attack is negligible, I'd say. So it is fine for our current configuration, but if we move to OTA, then this is a decision we must revisit.

As I said, we have two keys, the firmware key and the revocation key. The importance of having the revocation key, and of double-signing the key update, is this: if the firmware key is leaked, an attacker can change the firmware, but it cannot update the key, so it cannot take complete control of the device. At the same time, if the revocation key is leaked but the firmware key is not, again, an attacker cannot update the key, because it also needs the other key. That is the main reason for having the two keys. And the revocation key can be updated as well, using the same mechanism as the firmware key, so again there is a double signature there.

These are the packets that we use, and they are basically the same packets; the only thing that changes is the type field that identifies which key must be updated. We transfer the key and compute the HMAC of the packet using both keys: we compute the HMAC of the entire packet using the firmware key, we get an HMAC, and on this HMAC we compute another HMAC using the revocation key. The final HMAC is compared with the one in the packet.

First-time provisioning is a bit special here, because you don't have the keys set yet. So what we do is define some magic keys that are assumed to be preloaded, which are basically all zeros. During the first provisioning, you must first provision the revocation key, which is signed twice with the magic key.
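The nested-HMAC construction described above is short enough to sketch directly. SHA-256 as the underlying hash is an assumption for the example; the structure (firmware key over the packet, revocation key over the resulting tag) follows the talk.

```python
import hashlib
import hmac

def double_sign(packet: bytes, fw_key: bytes, rev_key: bytes) -> bytes:
    """HMAC the packet with the firmware key, then HMAC that tag
    with the revocation key; the outer tag goes in the packet."""
    inner = hmac.new(fw_key, packet, hashlib.sha256).digest()
    return hmac.new(rev_key, inner, hashlib.sha256).digest()

def verify_key_update(packet: bytes, tag: bytes,
                      fw_key: bytes, rev_key: bytes) -> bool:
    """Accept the key-update request only if both keys check out."""
    return hmac.compare_digest(double_sign(packet, fw_key, rev_key), tag)
```

Note the caveat raised in the Q&A at the end of the talk: composing the two HMACs is equivalent to signing with a single derived key, so the construction protects against the leak of one key, not against an attacker who learns the composed value.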
Then you provision the firmware key, which is signed with the magic key, in place of the firmware key that is not there yet, and with the revocation key that you just set. We also try to enforce key provisioning by not enabling upgrades until the keys are set.

So, bootloader data. I mentioned before that we have to store some persistent data, and the bootloader has to manage this persistent data in order to work properly. Well, the best example is the authentication keys, which must be stored somewhere, but I also mentioned already the partition consistency flags. This bootloader data must be resilient to update failures and possible attacks, and we try to achieve this resilience with duplication and verification at each boot. Basically, the bootloader data exists in two identical copies, each copy has a CRC to verify its integrity, and the two copies are stored in different flash pages. This is because when you update even a single byte of a page, you actually have to erase the entire page and then rewrite it, so you cannot put the two copies in the same page. Every time the BL data is updated, we update the two copies in the same order: main first, then backup. Yeah, this is where we store the BL data, along with its CRC.

As I said, at every boot we perform some verification of the BL data to ensure that it is consistent. There are a few cases we need to consider. The first one is the lack of initialization: BL data is not meant to be flashed together with the bootloader; instead, the bootloader creates it during the first boot. The bootloader detects the lack of initialization by verifying that the flash region where we store the data is completely blank, so all 0xFF in our case. Another situation that can happen is that one of the two copies is corrupted, and this can happen, for instance, because while we were updating that copy we had a power loss or a reset.
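A minimal sketch of the dual-copy scheme, assuming CRC-32 as the integrity check (the actual CRC variant in the bootloader is not specified in the talk) and representing the two flash pages as a two-element list:

```python
import zlib

def store(data: bytes):
    """Write BL data as two identical records, each with its own CRC,
    in separate 'pages' (list slots). Update order: main, then backup."""
    rec = data + zlib.crc32(data).to_bytes(4, 'little')
    return [rec, rec]

def load(copies):
    """Boot-time verification: prefer the main copy; fall back to the
    backup and repair the main copy if its CRC does not match."""
    for i, rec in enumerate(copies):
        data, crc = rec[:-4], int.from_bytes(rec[-4:], 'little')
        if zlib.crc32(data) == crc:
            if i == 1:
                copies[0] = rec    # main was corrupt: copy backup over it
            return data
    # Both copies bad: cannot happen from a single interrupted update,
    # so treat it as a hardware fault or attack (the talk halts here).
    raise RuntimeError("both copies corrupt: hardware fault or attack")
```

Because the two records live in different pages and are rewritten in a fixed order, a reset during an update can corrupt at most one of them, which is exactly the invariant `load` relies on.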
This is detected because the CRC will not match, and so we go and check the other copy, which should be valid, and we copy it over the corrupted one; the other copy will contain the latest valid bootloader data. Then there is the situation in which both copies are corrupted. This should never happen, since they are stored in different pages and so cannot be erased at the same time. So if that happens, we consider it a hardware fault, or some kind of hardware security attack. For us, this is an unrecoverable situation, and we enter an infinite loop. We do not reinitialize the BL data, because doing so would put the device back in the unprovisioned state, so the keys could be changed and security would be compromised. Yes, this is the detail of the verification flow: as you can see, at the end, after we have verified and, if needed, sanitized the BL data, we sanitize the partitions.

And this is the content of the BL data. I'm not going to go into many details here, but as you can see, we store the firmware key, and we have these partition descriptors and target descriptors, which are basically our way to define our partition table. We decided not to hardcode the partition table in the code, but to have it in the BL data, because in the future we may want some kind of runtime feature that allows us to change the partition table, and this design will allow it.

So we have this concept of partitions and targets. A partition is a portion of flash designed to host an application, and each partition is associated with a target, which is a computing unit capable of running the hosted application. At the moment we support just one partition per target. So, as already mentioned, on the D2000 we have only one target, so we have only one partition, and on the C1000 we have two targets, the ARC and the x86, so we have two partitions.
But the design actually allows us to have multiple partitions per target, so that we can implement, for instance, a fallback partition in case of OTA updates, so that if the update fails we can still run an application; or we can have a partition meant to host a different application, and somehow the user can switch between the two. So the possibilities are many here. Another thing this design allows is having what we call external targets, or partitions. On our boards we can have, for instance, a BLE module: a peripheral that is external to the SoC but has firmware of its own. We can define it as a new target, the new target will have a new partition, and the new partition will have a new alternate setting. When that alternate setting is selected, we process the QFU image in a different way, and instead of programming the flash, we program the BLE module. This is not done yet, but the design allows it.

Yeah, this is the final flash layout that we have. As I mentioned before, we put all the code of the bootloader in the OTP. But this is true only for firmware management over UART: when we use USB, in the case of the C1000, there is no way to fit the USB stack in 8K, at least we didn't find one. So what we did is add a second-stage bootloader that is booted only when firmware management is requested.

So, conclusions. I said that there is some code that we think could be reusable, because it is not platform-dependent. This is the DFU state machine, which is also independent of the lower communication layer, because we use the same DFU state machine for both UART and USB. Then there is all the DFU-over-UART adaptation, so the QDA protocol and the XMODEM protocol; they are actually platform-independent, and we have used basically the same code on both the embedded device and our modified version of dfu-util. So it's exactly the same code, C code.
And then, well, there are the Python scripts for generating the images. Unfortunately, the code for parsing the image and flashing it is SoC-specific.

This is the last slide. Some lessons that we have learned: first of all, the modular approach pays back in embedded as well. We are glad that we chose not to prematurely optimize for footprint, because this allowed us to adapt to changing requirements. And, as I said, we could reuse some of the code in the host tools as well, so not just on the embedded side, and we could validate the same DFU state machine on two different transports, with two different kinds of possible communication errors. Then again, we tried to reuse as much open source code as possible: instead of reinventing the wheel, we took dfu-util and forked it. The advantage of doing that was also that we provide the same user experience for both DFU over USB and DFU over UART. Another thing we found great was using link-time optimization, which basically offsets most of the overhead that the modular approach introduces; combining the two was a great thing for us, and we could save 15 to 20% of flash footprint. The only drawback of LTO is that it complicates debugging quite a lot.

So that's all. Thank you for your attention. Any questions?

Yep, I'm not totally getting it, sorry. Yeah. You compose the revocation key with the firmware key, and that results in a valid key, just a different one; that double signing is exactly equivalent to a single key that could alter the key, so it is not increasing your cryptographic strength. Well, you need both keys to update the key. No, you are using an equivalent key. I'm sorry? Because of the way the algorithm works, if you sign with the two separate keys in two steps, it is the equivalent of taking a third key and ending up with the same result. Yes, thank you. Oh, yeah, I see your point now. Yeah, yeah, that's true.
Yeah, the point is that the third key is not stored anywhere; it is the combination. Yeah, well, the idea was defending against a leaked key, so you would need both keys to be leaked. But yeah, if you can... Or brute-force the key. Yeah, yeah, but then you have to find it somehow, so it's a different kind of attack; it's not a leak situation. No, but I see your point here. It's a good point.

Yes. What's the scenario that you were thinking of? We just imagined that at some point a vendor may decide to just delete the firmware that it has on its devices; it's just something that we actually got asked for. So you go into this infinite loop? Yeah. It might be tampering. Yeah. Yeah, we got asked: not to brick the device, but to temporarily render it inoperable, maybe. Maybe to protect some sort of proprietary investment, something like a key. Yeah, that also makes sense.

Another question? Do you see wider adoption of this architecture inside Intel? Sorry, can you repeat the question? Do you see wider adoption of this architecture inside Intel? I don't know. We will see; yeah, that's the intent. Another question? Okay. Thank you very much.