Welcome to my talk on leakage resilience on microcontrollers. My name is Florian Unterstein, and this has been joint work between Fraunhofer AISEC and the Technical University of Munich. I'm sure many of you remember the "IoT Goes Nuclear" work by Ronen et al. In their talk, they show a video of a drone flying around and infecting smart light bulbs with a worm that then spreads from bulb to bulb. So if you have any IoT devices in your home, this is basically the worst case that can happen: someone just comes flying in and turns all of your devices into a botnet. For this attack, they first recovered the master firmware encryption key with a side-channel analysis. In our opinion, this is one of the best examples showing that firmware updates need to be protected against physical attacks if the devices are deployed in an environment where the attacker has physical access. Unfortunately, a lot of devices are already out there with no means of protection. So our motivation for this work was the question whether we can retrofit side-channel protection to such devices.

We start this work with two observations. First, side-channel protection seems absolutely necessary, as shown in the example, but countermeasures like masking can be very expensive to implement and cost a lot of latency. Second, microcontrollers often come with crypto accelerators that are used for performance reasons, but unfortunately they almost never provide side-channel protection. So our idea was that maybe we can use methods from leakage-resilient cryptography to harden such microcontrollers and still make use of those crypto accelerators. In the next few slides, I will give an overview of leakage-resilient cryptography, and then I will go into more detail about the constructions that we actually use. Leakage resilience is an algorithmic countermeasure against side-channel attacks.
The basic idea is to bound the leakage per execution, such that an attacker cannot accumulate information about the secret. Contrary to masking and other, let's say, more traditional schemes, it does not require high-quality random numbers or specialized hardware. This makes it seem ideal for IoT devices, where we have no control over the hardware and where high-quality random numbers are often not available. I want to start from a top-down perspective. Remember that our initial problem was firmware encryption. To achieve this, we need a full leakage-resilient authenticated encryption scheme. For this, we make use of the AEAD scheme that was proposed by Degabriele et al., whose requirements were later relaxed by Krämer and Struck. What you can see here is a block diagram of the scheme. It basically consists of four different blocks: one leakage-resilient pseudo-random function here, then a leakage-resilient pseudo-random generator, a hash function, in our case SHA-256, and another leakage-resilient pseudo-random function here. Basically, the extension by Krämer and Struck showed that those two blocks can be instantiated with the same pseudo-random function; before, the security requirements for these blocks were slightly different. As inputs, as usual for authenticated encryption, we have the message, a nonce, an encryption key, associated data, and a MAC key. What is interesting about this scheme is that the keys only ever go into the pseudo-random functions. So what happens is: the nonce goes into the pseudo-random function and gets encrypted under the encryption key into an ephemeral key that is only valid for this one encryption. That ephemeral key goes into the pseudo-random generator, which generates the key stream, which is then XORed with the message to get the ciphertext. So the long-term secret only goes into the PRF.
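The encryption path just described can be sketched in a few lines of Python. This is only an illustrative sketch: the real scheme instantiates the PRF and PRG with the leakage-resilient constructions discussed later, whereas here SHA-256 serves as a stand-in primitive, and the function names are mine, not from the paper.

```python
import hashlib

BLOCK = 16  # AES-128 block size in bytes


def prf(key: bytes, x: bytes) -> bytes:
    # Stand-in for the leakage-resilient PRF: derives a 16-byte value
    # from the long-term key and the nonce.
    return hashlib.sha256(b"PRF" + key + x).digest()[:BLOCK]


def prg_keystream(seed: bytes, n: int) -> bytes:
    # Stand-in for the leakage-resilient PRG: each step emits one
    # keystream block and updates the state, so every intermediate
    # key is only used a constant number of times.
    out = b""
    state = seed
    while len(out) < n:
        out += hashlib.sha256(b"OUT" + state).digest()[:BLOCK]
        state = hashlib.sha256(b"UPD" + state).digest()[:BLOCK]
    return out[:n]


def encrypt(k_enc: bytes, nonce: bytes, message: bytes) -> bytes:
    # Nonce -> PRF under the long-term key -> ephemeral key -> PRG
    # keystream -> XOR with the message. Only the PRF ever touches
    # the long-term secret k_enc.
    ephemeral = prf(k_enc, nonce)
    keystream = prg_keystream(ephemeral, len(message))
    return bytes(m ^ s for m, s in zip(message, keystream))
```

Because encryption is just an XOR with a nonce-derived keystream, decryption is the same operation with the same nonce; the structural point of the scheme is that the long-term key never reaches the PRG or the message processing.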
If you look at the MAC part, first all the inputs, the message, the nonce, and the associated data, get hashed by the SHA-256 function, and then the result gets processed by the PRF. This forms the tag. So both for the encryption and the MAC part of the scheme, the secret long-term key is only processed by the PRF. This is why we focus this talk and our work on the PRF implementation. In the paper, we also discuss the security of the PRG in more detail, but for now I think it's easy to see that since its key is only valid for one message, it suffices if the PRG resists simple power analysis, and not differential power analysis like the PRF, which uses the long-term secrets.

Next, let's look at the PRF. The PRF we use was proposed by Medwed et al., and it can be implemented using a standard block cipher like AES. The side-channel resistance is based on two principles. The first one is limited data complexity. That means there is only a limited number of different operations under one key that an attacker can observe. The attacker can still repeat those operations, but he has no influence over the inputs, so he can only observe a limited set of inputs. This is a configuration parameter, as we will see later, and it allows for a trade-off between security and performance. The second fundamental principle is to aggregate algorithmic noise from parallel S-boxes with equal inputs. Ideally, we want a fully parallel implementation, with 16 S-boxes in the case of AES, which generates the most noise and makes attacks harder. But on microcontrollers, we cannot influence the hardware, so we have to take whatever we get from the manufacturer, and typical implementations have either 16 S-boxes in parallel or four. Like I said, the data complexity is a configuration parameter. On this slide, you can see two different configurations, for data complexities 2 and 16. Let's first look at the left side of the slide, where there is the configuration for data complexity 2.
This is the most secure but also the least efficient configuration for the PRF. If you look at the data flow graph, you can see it starts from the top: there is the input in binary form and the long-term key K, and then it executes through 128 stages and outputs the result at the bottom. In each stage, we process one bit of the input. So in the first stage, we look at the first bit of the input, which in this case is a 1, and then we go into this branch, which means we encrypt plaintext P1. If we had a 0 as input, we would go the other way and encrypt plaintext P0. These plaintexts are fixed, they are the same in every stage, and they are of a special form such that all bytes are equal; I will explain this on the next slide. So we process the first bit, then we go deeper into the tree. The output of the AES is the key for the next stage. So we change the key and then we process the next bit. Here it's a 0, so we go to the left and encrypt P0 with the new key. Then the output, the ciphertext, is the key for the next stage. We process the next bit, and so on, until we have processed all 128 bits of the input. You can see that each of the keys that appear during this tree walk can only encrypt either P0 or P1. This is why this is called data complexity 2: there are only two distinct inputs. The attacker can still go back, repeat the observation, and walk the same path again, or maybe he can even influence the input and observe both operations for one key, but it's always only going to be those two. Now let's look at the right side, which shows an example for data complexity 16. This means we don't process just one bit of the input per stage, but four bits. So there are 16 different plaintexts that we can encrypt, and you can see it branches into 16 different branches. Here, the first four bits form an A in hex.
So plaintext A gets encrypted, the key is changed, the next four bits get fetched, plaintext 4 gets encrypted, and so on. Now we finish after only 32 stages compared to 128. From a performance standpoint, you want to process as many bits per stage as you can, so that you finish early. But the more bits you process, the more different observations you give to the attacker, and we will see that this makes attacks easier.

The other principle that the side-channel security is based on are parallel S-boxes that generate algorithmic noise. Algorithmic noise is just noise that cannot be averaged out because it is caused by the algorithm itself, unlike, for example, electrical noise. This is a basic diagram of the leakage that is generated by the S-boxes inside the AES. We have all the S-boxes in parallel. The inputs are always one byte of the plaintext XORed with one byte of the key. Then, during the execution, some leakage occurs that is measured by the attacker. This is some kind of function, some leakage function L, of the S-box processing the secret key byte and the public plaintext byte. Typically, the attacker knows the plaintext bytes and can manipulate them. So he chooses random bytes, and then he can attack the key bytes one by one: first focus on this byte, recover K0, go to the next byte, recover K1, and so on. But now we have only a fixed set of plaintexts, and we choose every one of them such that all the plaintext bytes are the same: P0 is the same as P1, P2, and so on up to P14 and P15. What is achieved is that the attacker now has no way to differentiate between the different S-boxes. All he knows is the plaintext, this one byte value, which is the same for all 16 S-boxes. This makes divide-and-conquer attacks pretty hard if there is no other way to distinguish between the S-boxes and which key byte is which. So even in the ideal case, where all key bytes are recovered, the order is still unknown.
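To make the tree walk concrete, here is a short Python sketch of the PRF. It is only an illustration: SHA-256 truncated to 16 bytes stands in for one hardware AES-128 call, the fixed plaintexts are simply the value i repeated across all 16 bytes (the exact constants in the real scheme may differ; the only property used here is that all bytes are equal), and the parameter names are mine.

```python
import hashlib

BLOCK = 16  # AES-128 block size in bytes


def block_cipher(key: bytes, plaintext: bytes) -> bytes:
    # Stand-in for one AES-128 encryption; on the microcontroller
    # this would be a call to the hardware accelerator.
    return hashlib.sha256(key + plaintext).digest()[:BLOCK]


def lr_prf(key: bytes, x: bytes, bits_per_stage: int = 1) -> bytes:
    # Data complexity = 2**bits_per_stage distinct plaintexts per key.
    dc = 1 << bits_per_stage
    # Fixed plaintexts with all 16 bytes equal, so the attacker
    # cannot tell the parallel S-boxes apart.
    plaintexts = [bytes([i]) * BLOCK for i in range(dc)]
    bits = int.from_bytes(x, "big")
    stages = (8 * BLOCK) // bits_per_stage
    k = key
    for stage in range(stages):
        # Fetch the next chunk of input bits, most significant first,
        # and use it to select which fixed plaintext to encrypt.
        shift = 8 * BLOCK - (stage + 1) * bits_per_stage
        chunk = (bits >> shift) & (dc - 1)
        # The ciphertext becomes the key for the next stage.
        k = block_cipher(k, plaintexts[chunk])
    return k
```

With `bits_per_stage = 1` this is the data complexity 2 walk with 128 stages; with `bits_per_stage = 4` it is the data complexity 16 walk with 32 stages.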
Our contribution is to take this concept and bring it to microcontrollers. As we have seen, the only hardware requirement for this LR-PRF is an AES with parallel S-boxes, and like I said in the beginning, this is a feature that we often have on microcontrollers to enhance performance. That means we can port this entire construction to microcontrollers if we just use the existing accelerators for the AES encryptions inside the PRF, and then we do all the rest, the protocol handling, the key updates, the generation of the inputs with the equal bytes, in software. And if we can secure this building block, that is, if we can implement a secure PRF, we can also implement the full LR-AEAD scheme, because the rest of the AEAD scheme, like the hash function, processes no secrets. So this is not security critical; we can do it in software, or if the microcontroller has a hash accelerator too, we can use that.

Let's look at our construction and identify the attack vectors that we need to evaluate in our side-channel analysis. First, there is, of course, the AES. We have no influence over the number of S-boxes because the hardware is given by the manufacturer, but what we can do is change the data complexity that we allow. So the most interesting question here is how efficient we can make our implementation: how high can we go with the data complexity before it gets insecure? Or can we even find one configuration that is secure? The second attack vector is the key transfer over the bus. Since this is not a security controller but a regular microcontroller, the bus transfer is, of course, not encrypted. This means that the key goes over the bus in the clear, and therefore we need to evaluate this in a side-channel attack as well. For our evaluation, we implemented the authenticated encryption, and in particular the PRF, on two devices, an STM32 and an EFM32. Both are ARM Cortex-M processors.
The main difference for us is that the STM32 has an AES coprocessor with 16 parallel S-boxes, whereas the EFM32 has an AES coprocessor with only four parallel S-boxes. So we would expect that the leakage resilience is more effective on the STM32. Since for low data complexities the attack success rate is very dependent on the actual value of the key, we have to attack multiple keys and give statistics over the security level. You can probably imagine that if the key leaks, for example, its Hamming weight, then a key with all zeros or all ones is easier to attack than a random key with an equal number of zeros and ones. For the evaluation setup, we used something that we thought is appropriate for this use case. We don't use 100,000 euros or US dollars worth of equipment, decapsulate the chip, and do some highly localized, precise measurements. Instead, we take the chip as it is, use an EM probe with a 2.5 millimeter diameter coil, and manually position the probe where we find a good signal by visual inspection on a scope. Then we run some correlation-based leakage tests to detect points of interest, extract them, and run multivariate template attacks on the key bytes. We do this both for the key transfer and for the PRF, that means the AES encryption, with different configurations for the data complexity.

Let's start by investigating the key transfer. What you can see here are histograms of the security level after we attacked 1,000 keys. On the x-axis, you can see the security level, and on the y-axis, the number of keys that fall into that range. What you can see, for both devices, the STM and the EFM, is that the median security level is very high, with 120 and 113 bits, respectively. Also, the spread isn't that large: there are a few keys that go down to almost 100 bits of security, but the absolute minimum is still above 96 bits. So we can check off the first part of our evaluation, the key transfer.
We still achieve high security levels, and we can securely transfer the key to our accelerator. The next part is the actual evaluation of the AES core. Let's recap what we expect. As we've learned, the attacker can only observe a limited number of inputs, depending on the data complexity. Therefore, what we expect is that after observing those inputs often enough to remove the noise, at some point the security level should stagnate and the attack should not improve anymore. So we first want to look at the number of traces in relation to the data complexity and the security levels. Here you can see the median security level of 300 random keys, where we plot the median security level against the number of traces, for all the data complexities that we evaluated. The most important observation on this slide is that all of the different data complexities stagnate at some security level and do not increase or decrease anymore. What we can also observe is what we expected: higher data complexities lead to lower security levels, and this holds on both devices. If we compare the two devices, the STM with its 16 S-boxes and the EFM with its four S-boxes, you can also see that the 16 S-boxes that generate more algorithmic noise lead to higher security levels, which you can see here on the left, compared to the EFM with four S-boxes on the right. This is also in accordance with the theory that we looked at earlier. Since we know that the security level depends on the actual key value, it's not enough to look only at the median security levels. Therefore, on the next slide, we will focus on this rightmost vertical, where we use the maximum number of traces for both devices, and look at the distributions that occur for that number of traces. So here you can see, for the maximum number of traces, the security level again for all the data complexities.
This time, the data complexities are on the x-axis and the security level is again on the y-axis, plotted as a box plot for all the different data complexities. In comparison to the figure on the previous slide, we now also see the variance and the outliers, which is important, because if you choose a key randomly, you don't know where you end up in this range. Again, we see that higher data complexities in general lead to lower security levels, and we also see that, especially for higher data complexities, the variance in the security levels gets quite high. Most importantly, we observe that for both devices there are configurations in terms of data complexity that lead to high security levels of over 100 bits. On the STM, up to data complexity 16, we cannot observe a single key that leads to a security level of less than 112 bits. Similarly, on the EFM, up to data complexity 8, all observed keys lead to security levels of over 100 bits. We can also see again that the EFM leads to lower security levels compared to the STM, which we attribute mostly to the lower number of parallel S-boxes.

To summarize this talk: we started off with the problem of securing firmware updates against side-channel attacks. To solve this, we took concepts from leakage resilience and applied them to microcontrollers. This allows us to take existing hardware accelerators that are not protected against side-channel attacks and use them in a way that makes the entire construction resilient against such attacks. We showed that for both controllers that we analyzed, we find implementations that still result in high security levels of above 100 bits. This enables us to implement, for example, secure firmware updates even on commodity devices that don't provide any hardware measures against side-channel attacks. Our solution has a small memory overhead and performance impact, and you can find a full analysis of that in our paper.
Also, since we only require an AES accelerator with a certain amount of parallelism in the S-boxes, our approach is applicable to a wide range of microcontrollers. We don't require any specialized hardware or true random number generators. We make the full source code available, both for the LR-PRF and the LR-AEAD, and for both devices; you can find it under this link. These are the references that we used in this talk, and finally our contact information. Thank you for listening.