Hi, my name is Olivier Bronchain, and in this video I will present the work "Side-Channel Countermeasures' Dissection and the Limits of Closed Source Security Evaluations". This is joint work with my supervisor, François-Xavier Standaert. Let's start with side-channel secure design. The first thing a designer may think is: OK, side-channel attacks are a physical problem, so let's solve it with a physical solution. So the first thing you can do is add some noise. You can also reduce the signal emitted by the device. These may be good solutions, but they might not be enough to provide high protection, mainly because adding noise or reducing the signal does not give a parameter that increases the difficulty of an attack exponentially. To do so, we explore what we call noise amplification, which is based on mathematical analysis. This mathematical analysis typically requires additional hypotheses: when you do masking, you have to assume that the leakage of each share is independent. When you want to evaluate such a design, there are multiple approaches. The first one is what we call an open approach, where the evaluator gets all the knowledge and control of the target. It means that he knows the implementation, the countermeasures, the randomness that is used, and so on; he knows everything. And because he knows everything, he is able to verify the physical assumptions. This approach is mostly privileged in academic research, but not only there. As a counterpart, we also have the closed approach. There, the evaluator gets only restricted knowledge and control of the target. Maybe he does not know the randomness, maybe he does not know the countermeasures, and maybe he does not even know what algorithm is running. Because of all that, it is harder for him to verify the physical assumptions.
This is in contradiction with Kerckhoffs' principle, mainly because security then does not only come from the key, but also from the fact that the evaluator does not know the implementation. And this approach is encouraged in some certification practices, where an implementation will lose points if all its details are available in the public domain. So what about real-world attacks? There are a few of them; let us list them. The first one is the attack on the bitstream encryption keys of Xilinx FPGAs, where the adversary was able to recover the encryption key. As another example, you have attacks where the adversary, by recovering one single key on a device, was able to forge updates for many IoT devices. And then, not in the field of side-channels but more in embedded security, you have the attack presented last year at CHES where the adversary is able to open a Tesla car just by cloning a key fob. All these attacks have in common that, once you have done a possibly huge reverse-engineering effort, the attack is quite straightforward. And this is mainly because all these examples are not representative of certified products. To move towards more practical security, I think that, as an academic, I lacked a practically relevant example of a combination of countermeasures. And I was really happy when ANSSI made a first step in that direction by publishing an open-source protected AES. They published an AES implementation that performs encryption and decryption in software on 32-bit MCUs. This implementation was written by a team of experts with knowledge from both industry and academia. It contains mixed countermeasures that are combined, which is exactly the focus of this talk. It also comes with a preliminary leakage assessment. Their implementation is not aimed to be certified; it is there for education. And for education, we will use it to answer three questions. The first one is: are mixed countermeasures combining well?
Second: what security can we get on these popular 32-bit MCUs, since they are often used in embedded systems? And third: what is the impact of knowing all the sources when you want to do a worst-case security evaluation? So in this work, we do a worst-case security evaluation of this implementation, and it goes in two phases. The first one is what we call profiling, or learning, of the target behavior. There, the adversary, or the evaluator, gets the source code, including the randomness source code; he knows what algorithm he is facing and what countermeasures are implemented. He then collects measurements in a controlled setting, where he knows the randomness. Thanks to that, he is able to say: OK, this measurement corresponds to this intermediate value, that measurement to that value, and that other measurement to that other value. Then he can perform the attack. He is no longer in the controlled setting, so he does not know the randomness anymore, and he can only extract information from the leakage. He extracts this information and then processes it to perform the attack, for example a key-recovery attack. Next, we will detail the countermeasures within this implementation, and how we can run countermeasure dissection. At a high level, this implementation contains two countermeasures. The first one is affine masking. It involves a multiplicative mask that is the same for all 16 bytes, for optimization reasons. It also requires an additive mask, which is different for all 16 bytes, and an alternative S-box precomputation. The second countermeasure is shuffled execution, where there is one permutation over all 16 S-boxes and another permutation over the four MixColumns. Both permutations are precomputed. If I put that in a diagram, there are three steps: first you have the inputs, then the precomputation, and finally the encryption.
As inputs, you have the additive masks, the plaintext, and the multiplicative mask, and you do the precomputation of the alternative S-box. Then, during the encryption, everything is shared, and you are left with two branches, a left branch and a right branch, that we detail next in the talk. These two branches go through AddRoundKey, then an addition, then the S-box computation, then another addition, and finally ShiftRows and MixColumns. This is for masking, but you also have shuffling. The inputs for shuffling are seeds, and based on these seeds we derive the permutations. These precomputed permutations are then used during the encryption phase. Because we know all the countermeasures, we can write what we call an optimal distinguisher. This optimal attack is based on a conditional distribution that depends on the countermeasures, and in this precise case the expression is a sum over all the possible multiplicative masks, all the possible additive masks, and all the possible permutations, where each term is one template. This is optimal, but there is one template per combination of possible randomness values, so it can rapidly become out of reach. And you have to sum over everything, which also makes the attack phase quite slow. To cope with all that, we introduce some hypotheses. We rephrase the previous equation by assuming independence between some of the secrets. You still have a sum over all the possible multiplicative masks, then a sum over all the possible additive masks and permutations of one branch, and then another, non-joint, sum over the other shares and permutations. Based on this expression, you can do what we call countermeasure dissection. The idea is the following: when you combine countermeasures, you hope that there will be a multiplicative effect between them.
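To make the structure of that sum concrete, here is a toy sketch of such an optimal distinguisher, for Boolean masking combined with a two-way shuffle rather than the full affine-masking case (4-bit values, Hamming-weight leakage, unnormalized Gaussian templates; everything here is illustrative, not the paper's code):

```python
import math

def hw(x):
    return bin(x).count("1")

def template(leak, value, sigma=0.5):
    # Gaussian template p(leak | value), up to a key-independent constant.
    return math.exp(-((leak - hw(value)) ** 2) / (2 * sigma ** 2))

def score_key(k, pt, leaks):
    # Sum over every mask m and both execution orders p: each term of the
    # sum is one product of templates, one template per share.
    total = 0.0
    for m in range(16):
        shares = (m, pt ^ k ^ m)        # the two Boolean shares
        for p in ((0, 1), (1, 0)):      # unknown shuffling order
            total += (template(leaks[0], shares[p[0]])
                      * template(leaks[1], shares[p[1]]))
    return total

def recover_key(traces):
    # Accumulate log-likelihoods over all traces, return the best key.
    scores = [sum(math.log(score_key(k, pt, leaks)) for pt, leaks in traces)
              for k in range(16)]
    return max(range(16), key=lambda k: scores[k])

# Noiseless simulation: enumerate all plaintext/mask pairs for a fixed key.
true_k = 5
traces = [(pt, (hw(m), hw(pt ^ true_k ^ m)))
          for pt in range(16) for m in range(16)]
print(recover_key(traces))
```

Even in this tiny setting you can see the cost: every key guess requires a sum over the whole randomness space, which is exactly why the full distinguisher with joint masks and permutations rapidly becomes out of reach.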
As an example, if you do masking plus shuffling, you hope that the information on a share will be degraded because of the shuffling, and that the masking will therefore be more effective. The goal of countermeasure dissection is to reduce this multiplicative effect to a small factor, ideally close to 1, meaning that the two countermeasures do not combine at all. How do we do that? As shown in red, we launch attacks on partial pieces of the secret, which allows us to bias the sums. Ideally, this biases the sums up to the point where some terms can be skipped. It also reduces the number of templates, because there is no longer one huge template over all the possible randomness, but templates based on fewer bits of randomness. Now that we have all the maths, we can try to extract information. The first thing to do, of course, is to have a measurement setup. In this work, we use a Cortex-M4 from Atmel. As a measurement point we use an EM probe, and we sample everything with a Picoscope at 1 GHz. First, you have to place everything, so we placed the EM probe on top of the package. Then, how can we extract information in this setting, taking into account that we are in an open approach? Next, I will illustrate that for the 2-bit seed that is used to generate the MixColumns permutation. The first thing to do is to compute an SNR. On that plot, the x-axis is time and the y-axis is the SNR, and it tells you where the information about that seed is located. Based on that, you can select the points of interest that you will use during your attack. But maybe there are a lot of points, let's say 2,000, and you would like to work with fewer dimensions, so you train a projection that reduces this number of dimensions. In this work, we use PCA as a profiled dimensionality reduction technique. Once you have your PCA, you can project the points onto the subspace.
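The SNR step can be sketched as follows, on simulated traces with hypothetical sizes: the SNR at each time sample is the variance of the per-seed class means divided by the mean of the per-class variances, and the points of interest are the samples where it peaks.

```python
import random
import statistics

def snr(traces, labels, n_points):
    # Per-sample SNR: variance of the class means over the mean of the
    # class variances (classes = the values of the targeted 2-bit seed).
    out = []
    for t in range(n_points):
        per_class = {}
        for trace, lab in zip(traces, labels):
            per_class.setdefault(lab, []).append(trace[t])
        means = [statistics.mean(v) for v in per_class.values()]
        variances = [statistics.variance(v) for v in per_class.values()]
        out.append(statistics.variance(means) / statistics.mean(variances))
    return out

# Simulate 200 traces of 50 samples; the seed only leaks at sample 10.
random.seed(0)
labels = [random.randrange(4) for _ in range(200)]
traces = [[random.gauss(0, 1) for _ in range(50)] for _ in labels]
for trace, lab in zip(traces, labels):
    trace[10] += 2.0 * lab          # the one informative point

scores = snr(traces, labels, 50)
poi = max(range(50), key=lambda t: scores[t])
print(poi)  # the highest-SNR sample is the leaking one
```

On real traces you would keep all samples above some SNR threshold rather than a single maximum, and then feed them to the projection.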
You are then left with a subspace, of three dimensions here. Each dimension of the graph is one dimension of the subspace, and each color corresponds to one possible seed value. On top of these clusters, you can fit a PDF estimation; in this work, we used Gaussian estimation. Then, when you run the attack, you first collect one measurement. You keep exactly the same 2,000 points and project them onto the subspace. Once you are in the subspace, you can estimate a probability from the PDF. It means that you can say: OK, I observe this leakage value, so I can tell that this seed value was used with that probability. If I go back to the equation, we did partial attacks on all these randomness values. We were able to recover the multiplicative mask with 100% accuracy given one single trace, and the same goes for the permutations. Overall, it means that we were able to do an almost perfect dissection, because the multiplicative mask was completely ineffective while the permutations were almost ineffective. Now that we know how to extract information, we can do the attack. The first step when you want to attack is to know where to attack. The adversary, or the evaluator, has to get information about the multiplicative mask, and there, a sweet spot is the multiplicative precomputation. Then you have to get information about the left branch and the right branch within the encryption phase. There is uneven shuffling across the rounds. First, there are operations that are not shuffled; they should be, and this can easily be fixed. There are also permutations that are seeded with only 2 bits, and other permutations that are seeded with 16 bits. Overall, all the permutations can be eliminated, so the attacks we will present can be done on all the shuffled parts. For simplicity, we focused on the 2-bit-seeded permutation. Looking at the attack, we first did a divide-and-conquer, for each of the 16 bytes.
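This profile-then-attack flow can be sketched minimally, assuming the projection has already been applied and the resulting feature is one-dimensional (all numbers below are made up for illustration):

```python
import math
import random
import statistics

def fit_templates(features, labels):
    # Profiling: one Gaussian (mean, stdev) per seed value.
    per_class = {}
    for f, lab in zip(features, labels):
        per_class.setdefault(lab, []).append(f)
    return {lab: (statistics.mean(v), statistics.stdev(v))
            for lab, v in per_class.items()}

def seed_probabilities(feature, templates):
    # Attack: evaluate each Gaussian PDF (up to a shared constant) and
    # normalize into p(seed | leakage).
    dens = {lab: math.exp(-((feature - mu) ** 2) / (2 * sd ** 2)) / sd
            for lab, (mu, sd) in templates.items()}
    total = sum(dens.values())
    return {lab: d / total for lab, d in dens.items()}

# Profiling phase: known seeds, simulated projected leakage.
random.seed(1)
labels = [random.randrange(4) for _ in range(400)]
features = [2.0 * lab + random.gauss(0, 0.3) for lab in labels]
templates = fit_templates(features, labels)

# Attack phase: one fresh measurement whose (unknown) seed value is 3.
probs = seed_probabilities(2.0 * 3 + random.gauss(0, 0.3), templates)
print(max(probs, key=probs.get))
```

The normalized probabilities are exactly what gets plugged back into the dissected sums: instead of summing uniformly over every seed, the terms are weighted (and, ideally, most of them skipped) according to what the leakage says.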
On the x-axis, you have the number of measurements, and on the y-axis, you have the guessing entropy. The first observation is that when you increase the number of measurements, you decrease the guessing entropy. More precisely, there is less than one bit left on each of the bytes after 3,000 measurements. We also note that there is one harder-to-recover byte per column. We can also attack the full key directly by using rank estimation and key enumeration. We see the same kind of graph there: the x-axis is the number of measurements and the y-axis is the correct key rank. If you increase the number of measurements, you decrease the rank, of course, and with less than 4,000 measurements, you get the key at rank 1. If you can do key enumeration, that is, some post-processing, you can get the key with 1,100 measurements. Overall, you can just let your scope run for one minute and you will have enough data to attack. The next question is: can this be automated in a closed approach? More precisely, does the knowledge of the target help when you want to do a worst-case security evaluation? We ask this question because, in practice, the evaluator does not always have full control of the target. And if knowledge helps, it can be worrying for long-term security. If an adversary comes with a better strategy than the evaluator had, because maybe he had more knowledge about the target, did more reverse engineering, or had a better model, he will be able to extract more information than the evaluator expected, and the conclusions drawn by the evaluator may be wrong. So we did some experiments with machine learning. Why machine learning? Because we think it is representative of the closed approach in security evaluations: it does not care what countermeasures are within your device; you can just feed it with leakage, and it will output the keys.
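Rank estimation can be illustrated by brute force on a hypothetical two-byte key with 4 values per byte (real rank-estimation tools use techniques such as histogram convolutions to scale to 16 bytes; the scores below are made up):

```python
import itertools

def key_rank(byte_scores, true_key):
    # Score of a full key = sum of its per-byte log-likelihood scores.
    true_score = sum(s[b] for s, b in zip(byte_scores, true_key))
    rank = 1
    for cand in itertools.product(*[range(len(s)) for s in byte_scores]):
        if cand != tuple(true_key) and \
           sum(s[b] for s, b in zip(byte_scores, cand)) > true_score:
            rank += 1
    return rank

# Two key bytes over 4 values each; the true key (2, 1) is favored for
# its first byte but only second-best for its second byte.
byte_scores = [[0.1, 0.2, 3.0, 0.5],
               [0.3, 1.0, 1.4, 0.2]]
print(key_rank(byte_scores, (2, 1)))  # → 2
```

Key enumeration is the constructive counterpart: instead of counting how many candidates beat the true key, you walk the candidates in decreasing score order and test each one, which is the post-processing that brings the attack down to 1,100 measurements.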
In practice, we instantiate an MLP classifier in the simulated settings we describe next. The first setting is standard Boolean masking, where there are two shares, and each share leaks its Hamming weight plus Gaussian noise. The second simulated setting is affine masking, which is representative of the previous implementation, with a left branch, a right branch, and a multiplicative mask. The left and right branches leak their Hamming weights plus noise, while the multiplicative mask is given in clear to the adversary. So the question is: how do the open and closed approaches compare in these settings? Once again, on this graph, you have the number of measurements on the x-axis and the guessing entropy on the y-axis. The blue curves are for the closed approach, the purple ones for the open approach; dashed lines are for affine masking, and continuous lines for Boolean masking. How do these approaches compare? First, you observe that in the open approach both settings are equivalent, while in the closed, black-box approach they are not. The point is that, in the open approach, the adversary knows that he receives the multiplicative mask, so he can invert the multiplication and is left with plain Boolean masking, and that is what we observe there. However, in the closed approach, the MLP has to realize that it is receiving a multiplicative mask and that the best solution is to invert the multiplication. If we increase the field size, we see that the gap between affine and Boolean masking increases, so it gets harder for the MLP. The point is that learning that multiplication with an MLP is certainly not impossible, but it looks prohibitive. And why should we bother having an MLP learn it, while it comes roughly for free in a white-box fashion?
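The two simulated settings can be sketched as follows, in a toy GF(2^4) instead of the AES field GF(2^8) (the leakage models and parameters are illustrative choices, not the exact simulation of the paper):

```python
import random

def hw(x):
    return bin(x).count("1")

def gf16_mul(a, b):
    # Carry-less multiplication reduced modulo x^4 + x + 1.
    r = 0
    for i in range(4):
        if (b >> i) & 1:
            r ^= a << i
    for i in range(7, 3, -1):       # reduce bits above degree 3
        if (r >> i) & 1:
            r ^= 0b10011 << (i - 4)
    return r

def leak_boolean(x, m, sigma=0.2):
    # Setting 1, Boolean masking: the two shares m and x ^ m leak their
    # Hamming weights plus Gaussian noise.
    return (hw(m) + random.gauss(0, sigma),
            hw(x ^ m) + random.gauss(0, sigma))

def leak_affine(x, m, alpha, sigma=0.2):
    # Setting 2, affine masking: Boolean shares of alpha * x, plus the
    # multiplicative mask alpha given in clear.
    return leak_boolean(gf16_mul(alpha, x), m, sigma) + (alpha,)

# An open-approach evaluator who knows the countermeasure multiplies by
# the inverse of alpha and falls back to the Boolean setting; a closed
# approach must learn the field multiplication from leakage alone.
random.seed(0)
print(leak_affine(9, 6, 3))
```

Since multiplication by a nonzero alpha is a bijection of the field, inverting it costs the open evaluator nothing, which is exactly why the purple (open) curves for affine and Boolean masking coincide while the blue (closed) ones separate.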
As a conclusion, on the technical side, I would say that we did an analysis of the mixed countermeasures within the ANSSI implementation, and that we were able to do the attack with less than a minute of recordings. This was done using only state-of-the-art estimation techniques, some equations that depend on the countermeasures, and finally some sound hypotheses. We reproduced the attack on ANSSI's own measurements, on which their preliminary leakage assessment had not been able to find weaknesses with 100,000 traces. Our work does not show that the ANSSI implementation was flawed; it rather shows that it is difficult to protect such software, and that this is inherent to the low noise that is available on the platform, not because there is too little shuffling or whatever. To reproduce this on another target, you need a few things. First, for sure, you need the source code, and you need to know the randomness while you are profiling, but not while you are attacking. You need a sufficient understanding of the countermeasures, and you do not need so much time. To illustrate that, I made a timeline of the different steps of this work. On day 0, I was scrolling Twitter, and I found a tweet from Emmanuel Prouff. I opened it and saw some protected code available; that was interesting. On day 1, I started looking at it, and I found an MCU that was able to run the library. I removed some capacitors, because removing capacitors is most of the time a good idea, I found the EM probe, and my setup was ready after five days. I then started working on that setup to record measurements and do attacks, and quite rapidly I was able to recover the multiplicative mask, which made me happy. But recovering the mask is not yet an attack, so I started working on it again, and after 10 days, I had my first attack. I was happy, but with a first attack you are never really convinced, so you start working again, and you go back online.
There, you find key enumeration and rank estimation tools, you plug them into your system, and it goes faster, because you do not have to wait for your key to reach rank 1. I kept working on it, and after 15 days, I had a complete attack that I was convinced by, and I was happy. As a take-home message, I would say that the ANSSI implementation was a really stimulating first step, and I would like to thank them for that. It remains a nice research challenge to design a more secure implementation, and this implementation will help with that; such an implementation should be able to deal with limited physical noise. Thanks for watching this video, and you can contact me via these links.