Hi everyone, I'm Jan Van den Herrewegen, and today I will present our paper, "Fill your Boots: Enhanced Embedded Bootloader Exploits via Fault Injection and Binary Analysis", done together with David Oswald, Flavio Garcia and Qais Temeiza from the University of Birmingham.

First of all, I'm going to introduce the realm of embedded bootloaders and firmware, motivate our research, give a bit of an introduction to fault injection techniques, and introduce the three targets that we've chosen for this research. Then, in the following three sections, I'm going to explain how we exploited each target. Finally, I'm going to explain a little more about some bootloader design directives, which we call anti-patterns, and summarise our research.

So, first of all, what is an embedded bootloader? The bootloader is the first program to execute on startup on an embedded chip. As you can see in this diagram, the bootloader typically initialises some peripherals and then checks some system registers or external pins to see whether it should go further into the bootloader or just load the application software. At some point there's also a CRP, or code readout protection, check, which typically tells whether the chip is read-protected or not; we'll explain more about this later.

The bootloader is quite a critical piece of software since it exposes read and write functionality, so if you can bypass its security checks, you have full access to the chip's memory. And finally, which is quite important for our research, it is addressable, or readable, from a normal user application. In a datasheet that would look like this, where you typically have a certain area of memory reserved for the boot ROM. The reason we were interested in this is that we can actually read out this bootloader firmware, analyse it, and thus make our attacks more effective based on this binary.
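The boot flow described above can be sketched as a small decision function. This is a generic illustration only, not the logic of any particular chip: the `ChipState` fields and the returned labels are hypothetical names, and real bootloaders differ in what they do when protection is enabled.

```python
from dataclasses import dataclass


@dataclass
class ChipState:
    # Hypothetical inputs: a pin or system register requesting the
    # bootloader, and the outcome of the CRP check.
    boot_requested: bool
    read_protected: bool


def boot_decision(state: ChipState) -> str:
    """Return the code path a simplified bootloader would take."""
    # Peripheral initialisation would happen here on real hardware.
    if not state.boot_requested:
        return "application"
    if state.read_protected:
        # Assumption: a protected chip still enters the bootloader,
        # but with reduced functionality.
        return "bootloader (restricted)"
    return "bootloader (full read/write)"
```

The point of the sketch is that a single branch on `read_protected` separates the restricted path from full memory access, which is why that check is the natural target for an attack.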
The readout protection, which I mentioned earlier, is a mechanism for protecting the memory on the chip. This can be done either in hardware, for example through fuses that disable a debug interface, or, as is also typical, in a special flash page, which contains a value indicating whether the read protection is enabled or disabled. It can also be done at different granularities: there can be different protection bits for read, write and erase, per sector or per block, however the chip manufacturer decides to do this.

This is an example from the LPC bootloader binary; the LPC is an ARM microcontroller. You can see it reads the readout protection value, the CRP value, checks it against a static value, CRP1, and then, if it's off, it jumps further into the bootloader.

Now, why is this so important? Why would we want to extract firmware? First of all, as previous researchers have pointed out, proprietary crypto in an embedded device is a particularly bad idea. In order to find it and scrutinise it, we obviously need to have the firmware. So this is valuable research: it is necessary to find vulnerabilities in chips. We can also extract secret data, such as immobiliser keys or crypto keys, which might be more than just an individual device's key, but actually a manufacturer key, let's say. And then there are a few more reasons: for example forensics, the repurposing of end-of-life devices, replacement firmware, you name it.

The targets we picked vary in difficulty, setup and exploitation method. First of all, the LPC1343 is an ARM-based device, which we attack configured in CRP1, which I'll explain a little bit later. This is a software-only exploitation, and the difficulty here lies in developing a ROP, or return-oriented programming, exploit on such a restricted embedded device, and also in exploiting its memories. Then we have two targets that we attack in hardware.
The STM8 locks the bootloader on startup, based on the so-called option bytes. Since it's locked on startup, there's a very small critical code base, and what makes it difficult to attack is the actual glitch parameters. For voltage glitching, for instance, that's the width, the offset and the glitch voltage. And then, finally, we have the Renesas 78K0, which is an 8-bit chip. It restricts write access and has no read functionality, but always exposes certain commands, such as a checksum or a verify over a longer array of bytes, 256 bytes for example. Since a larger code base is accessible here, the timing is the difficult aspect, and not the other glitch parameters.

Fault injection techniques typically have to weigh up the cost against the invasiveness. First of all, we have voltage glitching, which is what we opted for. It's very accessible and very cheap: we used a device we developed, the GIAnT, which is open-source hardware and costs about $150 to assemble. Then there's optical fault injection; for example, UV light can reset certain fuses or erase certain bits when they are exposed, but this requires extensive preparation of the target. Then there's laser fault injection, where the setup is extremely expensive but can be very accurate. And finally, there's a mid-range, where electromagnetic pulses can affect the chip's functionality, with great tools available such as the ChipSHOUTER, which I would definitely recommend.

Now, the first target, the LPC1343. It has multiple CRP levels, which are only enabled by certain values. CRP1 gives restricted write access and no read access to the chip. CRP2 basically limits the functionality to a chip erase. In CRP3, the chip is fully locked, so there's no programming functionality. And then finally, it has another level, called NO_ISP, which only disables the bootloader, but still leaves the debug interface, the SWD interface, enabled.
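The CRP levels just listed can be modelled as a lookup from a 32-bit word to a set of permitted operations. This is a simplified sketch: the magic words are quoted from memory of NXP's LPC13xx user manual and should be checked against it, and the capability names (`isp_read`, `isp_write_restricted`, and so on) are hypothetical labels that only encode what the talk states.

```python
# Magic words as recalled from the NXP LPC13xx user manual (assumption:
# verify against the manual before relying on them).
CRP_MAGIC = {
    0x4E697370: "NO_ISP",
    0x12345678: "CRP1",
    0x87654321: "CRP2",
    0x43218765: "CRP3",
}

# Hypothetical capability labels, following the summary in the talk.
ALLOWED = {
    "NONE": {"isp_read", "isp_write", "chip_erase", "swd"},
    "NO_ISP": {"swd"},                  # bootloader disabled, SWD still on
    "CRP1": {"isp_write_restricted"},   # restricted write, no read
    "CRP2": {"chip_erase"},             # only a full chip erase
    "CRP3": set(),                      # fully locked
}


def decode_crp(word: int) -> str:
    """Map the stored CRP word to a protection level.

    Any word that is not one of the magic values means no protection.
    """
    return CRP_MAGIC.get(word & 0xFFFFFFFF, "NONE")
```

Note that in this model every non-magic word decodes to `NONE`, so a single corrupted comparison or a single flipped bit in the stored word is enough to end up unprotected.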
This chip has already been attacked with voltage fault injection by Gerlinsky et al. a few years ago, but we would like to show how the complexity of the bootloader leads to a software-only exploitation. For that, we have to have a brief look at the stack. At the bottom of RAM, the bootloader stores the CRP value. So this is the [unclear] of these bytes, basically. Then the bootloader's data resides in RAM. The stack area is here, which is writable, and which is what we will exploit.

Since the stack area is writable, we can overwrite return addresses on the stack. If we call the writeToRAM command, we can actually write to the stack and overwrite a return address with an address inside the readMemory command handler, past the CRP check. So we write these values onto the stack, which contain the address to be read out, and the return address first. The program counter is popped and lands in the readMemory command, which then pops certain values off the stack, which are the addresses to read. And then, finally, through several more gadgets, we get back to the command handler, and we can repeat this process.

Then we found one more vulnerability, which is that individual sectors on this chip can be erased and rewritten. That also leads to a major vulnerability, where we can just overwrite a certain sector, which we know will be executed, with a dumper program. So from this one sector we can read out the rest of the chip's memory.

Now I'll go on to the next target, the STM8. The STM8's security is configured by two option bytes, as they're called. The first one is the readout protection, or ROP, byte, which, depending on the bootloader version (we've looked at two), is either turned on or turned off by programming this byte to 0xAA. And then there are also the bootloader option bytes, which determine whether the bootloader will be enabled at all. In a diagram, this looks like this.
On reset, the bootloader initialises a few peripherals and disables all interrupts, then it checks whether the chip is empty, or whether the bootloader option bytes are set. If either is the case, it checks the ROP byte, the readout protection. If that's active, it just goes on to the user application. And if it's not active, it finally goes through to the bootloader.

We dumped the bootloader, and in the bootloader binary this looks like this. First of all, it calls the check-empty subroutine, which checks whether the first byte in flash is 0x82 or 0xAC. If that is the case, the chip is not empty; otherwise it is empty. From there, it can jump into the check-CRP basic block, which checks whether the readout protection byte is set or cleared.

Looking at this binary, we know we will require two glitches on a fully secured chip. The first one is needed to reach this basic block: either here, where we convince the chip that it's empty, or here, where we convince it that the bootloader option byte is set. In our experiments, this basic block turned out to be the easiest to glitch. So the first glitch is inserted here, which gets us to this basic block. And then we just have to skip this jump and go to the serial bootloader, which exposes all the functionality.

From that, we knew that there were two critical sections. On a profiling device, we can actually program these sections into a user application, where we completely take away the timing aspect of a glitch. We pull a GPIO pin high, then we have our critical section, which we want to glitch, so there's only about one or two microseconds in which the glitch can fall. And then finally, we pull another GPIO pin high to indicate success. This figure gives an idea of which glitch voltages work with which glitch widths.
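The profiling step described above amounts to a grid search over glitch voltage and width with the timing held fixed by the GPIO trigger. A minimal sketch, assuming a hypothetical `try_glitch(voltage, width_ns)` callback that fires one glitch and reports whether the success pin went high; `fake_target` and its numbers are invented purely so the example runs without hardware.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Tuple

Params = Tuple[float, int]


def sweep(try_glitch: Callable[[float, int], bool],
          voltages: Iterable[float],
          widths_ns: Iterable[int],
          attempts: int = 10) -> Dict[Params, float]:
    """Return the observed success rate for every (voltage, width) pair."""
    results = {}
    for voltage, width in product(voltages, widths_ns):
        hits = sum(1 for _ in range(attempts) if try_glitch(voltage, width))
        results[(voltage, width)] = hits / attempts
    return results


def working_parameters(results: Dict[Params, float],
                       threshold: float = 0.5) -> List[Params]:
    """Keep only the parameter pairs that succeed often enough."""
    return sorted(p for p, rate in results.items() if rate >= threshold)


def fake_target(voltage: float, width_ns: int) -> bool:
    # Invented stand-in for real hardware: succeeds in a small window only.
    return 1.0 <= voltage <= 1.4 and 40 <= width_ns <= 60


rates = sweep(fake_target, [0.5, 1.2, 2.0], [20, 50, 100], attempts=3)
print(working_parameters(rates))
```

On real hardware the callback would be noisy, which is why the sketch records a success rate per pair rather than a single pass or fail.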
There's only a relatively small subset of voltages and widths which work with each other, but now we have more of an idea of these glitch parameters. The second step is to attack the real bootloader. First of all, we can do this on a profiling device again, where we enable either the readout protection or the bootloader option bytes, so in either case we only need one glitch. And then finally, we do the full double-glitch attack on the real target, where we have to focus on the timing, since we already know the voltages and widths from the first step.

What can also help with this is a power consumption graph of the boot process, which we obtain by connecting a shunt resistor to ground. Then we can see the power consumption. The bootloader starts about here. Then there's a section of about 15 or 20 microseconds where it executes, and we know where the first glitch and the second glitch need to fall. And then finally, we can see that the user [application] is at its high. So this can actually make the glitch window a lot smaller, if you have an idea of the power consumption of the chip.

Then finally, I'll explain how we enhanced voltage glitching with static analysis on the Renesas 78K0 chip. Bozzato et al. came up with a very clever attack on this chip. The chip only locks write access, but leaves a checksum and a verify command open, though only on 256 consecutive bytes. Technically, you couldn't gain much information from having a checksum done over 256 bytes. However, they found that by voltage glitching they could get this down to four bytes, and they could also leak individual bytes by glitching during the checksum calculation. So we decided to look into the bootloader binary and actually try to predict glitch offsets based on which addresses we're generating the checksum or verify from.
We noticed that each command in the bootloader had this sort of sanity-check subroutine which, given two addresses, checked which block number each was in and whether the block numbers corresponded, whether the lower address was lower than a maximum allowed address, and whether the first address is higher than the second. Since the same function is executed for all these bootloader commands, the point at which it fails or succeeds and returns will affect the glitch offsets.

So the idea is to statically predict these offsets by putting the arguments, the two addresses given to the checksum and verify commands, into equivalence classes. The idea is that, for a given function, for example this one, certain sets of arguments will always take the same path through the function, based on the constraints generated. This is akin to symbolic execution, where we vary the inputs to a function, starting from the interrupt handler up to where the bootloader responds. We vary these input arguments, build up the constraints, and then, based on these constraints, generate our equivalence classes, each containing all arguments which have the same constraints and thus the same execution path through the bootloader binary.

Then, what does this look like when actually glitching? Based on the first successful offset, for example for this equivalence class, we were able to predict how much further in time, or how much bigger, the offset should be for other equivalence classes, based on the length of the execution path: how many more clock ticks, and then, based on the frequency of the device, where the offsets of all the other equivalence classes should fall. So that's how we leveraged static analysis for voltage glitching, while completely ignoring the other glitch parameters such as voltage and width. We set the voltage to zero and the width to 100 nanoseconds, so just constant.
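The equivalence-class idea and the offset prediction can be illustrated with a toy model. Everything concrete here is an assumption made for illustration: the block size, the maximum address, the order of the checks and the per-check cycle counts are invented, and the real analysis works on the bootloader binary rather than on a hand-written function.

```python
from collections import defaultdict

BLOCK_SIZE = 0x400   # hypothetical block size
MAX_ADDR = 0x8000    # hypothetical highest allowed start address
CHECK_CYCLES = {"block": 12, "max": 8, "order": 6}  # invented cycle costs


def sanity_path(start: int, end: int) -> tuple:
    """Return the checks executed before the toy sanity routine returns."""
    path = ["block"]
    if start // BLOCK_SIZE != end // BLOCK_SIZE:
        return tuple(path)          # early exit: different blocks
    path.append("max")
    if start >= MAX_ADDR:
        return tuple(path)          # early exit: address too high
    path.append("order")            # last check; both outcomes cost the same here
    return tuple(path)


def equivalence_classes(arg_pairs):
    """Group argument pairs by the execution path they take."""
    classes = defaultdict(list)
    for start, end in arg_pairs:
        classes[sanity_path(start, end)].append((start, end))
    return dict(classes)


def path_cycles(path: tuple) -> int:
    return sum(CHECK_CYCLES[check] for check in path)


def predict_offset_ns(known_offset_ns: float, known_path: tuple,
                      target_path: tuple, clock_hz: float) -> float:
    """Shift a known-good glitch offset by the difference in path length."""
    extra_cycles = path_cycles(target_path) - path_cycles(known_path)
    return known_offset_ns + extra_cycles * 1e9 / clock_hz
```

With one working offset measured for a single class, `predict_offset_ns` gives a starting point for every other class, so only a narrow window around the prediction has to be searched on the device.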
Finally, I will summarise some bootloader design directives which, if followed, might be able to mitigate issues like these. There are a few anti-patterns we've noticed, which are basically to be avoided in any bootloader design. The first one, partial RAM write access, is what led to the exploit on the LPC microcontroller, where we could overwrite the return address on the stack. Similarly for the next one, where we could erase a certain flash sector and overwrite it with dumper code, which then dumps the whole chip. Certain chips default to unprotected, which makes them a lot easier to glitch when there are, for example, 15 values which disable the readout protection and only one that enables it. Non-redundant CRP checks make it easier to glitch: if there's only one check, then you only need one glitch. A large number of protection levels may confuse developers as to what's actually protected and what's not: is write access complete write access to the chip, or only to RAM, etc.? Complex bootloader logic can lead to software-only vulnerabilities, such as we've seen in the LPC. And then finally, non-atomic erase: if you can erase one sector and then somehow overwrite it, then the rest of the firmware is also vulnerable.

So I thank you for listening, and if there are any questions, please do ask me at the presentation at CHES. Thank you very much.