Hello. In this video, I will present the work "Breaking Masked Implementations with Many Shares on 32-bit Platforms, or When the Security Order Does Not Matter". This is joint work by myself, Olivier Bronchain, and my supervisor, François-Xavier Standaert.

So how do we design side-channel security with masking? To obtain a secure implementation with masking, the design goes in two steps. The first one is to leverage proofs in an abstract model, such as random probing or threshold probing security. Once we have an implementation secure in this abstract model, we leverage reductions to obtain security against noisy leakage, which is closest to reality. If done that way, the attack complexity can be described by the equation N ≈ c · MI^(-d), where N is the attack (data) complexity, MI is the mutual information that the adversary can obtain on a share from the leakage, and d is the number of shares. You see that the attack complexity increases exponentially with the number of shares.

For that equation to hold, two conditions must be fulfilled. The first one is that all the shares leak independently; this ensures that the security order d will be the one expected. The second condition is the noise condition, which requires that the mutual information between the leakage and the shares is small enough to be amplified.

This seems simple, but there are many challenges when designing a secure implementation. The first one is that a masking scheme depends on the platform: whether we run on software or on hardware, there is a large design space to choose the masking scheme from, and the best choice may depend, for example, on the cost of randomness. The designer then attempts to optimize his design choices with respect to cost-versus-performance trade-offs.

This paper tackles the problem of security evaluation. Here we show a timeline of a secure implementation, where on the x-axis you have time and on the y-axis you have the best current attack. First, when there is a product, of course there are some attacks: the initial state-of-the-art attacks. These improve over time, so the best attack complexity decreases over time. At some point, one has to deploy the implementation: now it's in the wild, I cannot touch it anymore. At that time, you would like to know the current resistance of your target. But after deployment, attacks still improve. The goal of this paper is to propose worst-case attacks, or worst-case evaluations, whose goal is to anticipate future improvements of the state of the art.

How do we do that? What are the ingredients? Basically, we give full knowledge to the evaluator during profiling. During profiling, the evaluator has access to the source code and to the randomness that is used for masking: the shares, the inputs, the plaintext and the key. He also has the opportunity to use a measurement setup that is as clean as possible. And he must have efficient techniques to exploit all the information within the leakage.

So why do we need the randomness during profiling? Basically, it simplifies the profiling stage, and it also allows a good interpretation of the attack results, which is what we do in the paper. It also enables us to give clear guidelines, because we are able to understand the weaknesses of the target.
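To make that complexity equation concrete, here is a minimal sketch, in Python, of the noise-amplification effect. It assumes the simplified model N ≈ c · MI^(-d) with an arbitrary constant c = 1; the MI values are illustrative, not measurements from the paper.

```python
import math

# Minimal sketch of masking's noise amplification, under the simplified
# model N ~ c * MI^(-d) with c = 1. The MI values below are illustrative.
# Amplification requires the noise condition: MI per share well below 1 bit.

def attack_complexity(mi_per_share: float, d: int, c: float = 1.0) -> float:
    """Approximate data complexity N when each of the d shares leaks
    mi_per_share bits of mutual information per trace."""
    return c * mi_per_share ** (-d)

for mi in (0.5, 0.1, 0.01):      # MI per share (bits)
    for d in (2, 4, 8):          # number of shares
        n = attack_complexity(mi, d)
        print(f"MI={mi:>5}, d={d}: N ~ 2^{math.log2(n):.1f} traces")
```

The printout shows the two levers of the equation: lowering the MI per share (more noise) or raising d (more shares) both increase the attack complexity, and they compound exponentially.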
We can then say to a designer: OK, this is the issue, and that's what you should fix to obtain a secure implementation. Of course, if we come back to the figure from the previous slide, in the short term there is probably a gap between the best possible attack and the worst-case attack evaluation. But as attacks improve, this gap probably vanishes.

So what's in the paper? We propose a new methodology to analyze masked bitslice software. This methodology is really simple and is based on Gaussian templates and SASCA with dedicated factor graphs. We apply the methodology to two implementations. The first one is an AES implementation following the guidelines of Goudarzi and Rivain. The other one is Clyde, which follows the same guidelines and has been used for the CHES 2020 CTF. These software targets run on two low-cost MCUs, a Cortex-M0 and a Cortex-M3. We use them because they are widely used by the lightweight crypto community to benchmark competition candidates. Our results put forward that the security of masking on these devices is less than expected. We also extrapolate the security of these implementations to larger masking orders, and we discuss the impact of additional countermeasures.

Now we will detail the methodology used in this work to analyze masked software. The methodology is based on profiled attacks, which go in two phases. The first phase is profiling, where we feed the target with randomness, plaintext and key. A scope is attached to the target and records the leakage for each execution of the cryptographic primitive. Then, because the evaluator knows the source code, the key, the plaintext and the randomness, he is able to derive all the intermediate variables, that is, all the shares processed by the target. Based on these leakages and share values, the evaluator can build a model that maps a given share value to its leakage distribution.

The second phase is the attack phase, where we feed the target with only a plaintext: no longer the key, and no longer the randomness. Once again a scope is attached to the target and records leakages. The next step is to derive probabilities for each share value from the leakage, by leveraging the model built during the profiling phase. Then, with all these probabilities on all the shares, the evaluator runs some computation, which usually leverages the fact that he knows the source code and how all these variables interact with each other, to output a key guess.

The methodology is based on soft analytical side-channel attacks, or SASCA. Here we will take a small example of what SASCA is in general. Let's take the simple circuit c = a + b and d = a * b. SASCA goes in multiple steps. The first one is to build a graph where you have, on one side, variables (here a, b, c and d) and, on the other, operations (the addition and the multiplication), and then we draw edges between the variables and the operations. Then, from the probabilities obtained with the model in the attack phase described on the previous slide, we initialize all the variables with their distributions. Finally, we apply message-passing rules, which basically involve passing messages from operations to variables and then from variables to operations; it's an iterative process repeated many times.
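As an illustration of these message-passing rules, here is a minimal, self-contained sketch of one factor-to-variable update for the toy circuit above. It uses plain numpy over values modulo 16 with random stand-in priors; this is purely illustrative and not the factor-graph engine used in the paper.

```python
import numpy as np

# Toy SASCA message passing for the circuit c = a + b, d = a * b,
# with 4-bit variables (values mod 16). Purely illustrative sketch.

NC = 16  # number of possible values per variable (2^b, with b = 4)
rng = np.random.default_rng(0)

def random_prior() -> np.ndarray:
    """Stand-in for the distribution a leakage model would output."""
    p = rng.random(NC)
    return p / p.sum()

p_a, p_b = random_prior(), random_prior()

# Factor-to-variable message for d = a * b (mod 16): marginalize the
# joint p(a) * p(b) over all (a, b) pairs, accumulating on a * b.
# This brute force costs O(2^(2b)).
msg_to_d = np.zeros(NC)
for a in range(NC):
    for b in range(NC):
        msg_to_d[(a * b) % NC] += p_a[a] * p_b[b]

# Factor-to-variable message for c = a + b (mod 16): a cyclic convolution
# of p_a and p_b, computable in O(b * 2^b) with a fast transform;
# brute-forced here for clarity.
msg_to_c = np.zeros(NC)
for a in range(NC):
    for b in range(NC):
        msg_to_c[(a + b) % NC] += p_a[a] * p_b[b]

# A variable's updated belief is the normalized product of its prior
# and the incoming messages (c and d have uniform priors here).
belief_d = msg_to_d / msg_to_d.sum()
print(belief_d)
```

Note that because a and b each feed both factors, this toy graph already contains a cycle, which is exactly why a real SASCA iterates such updates rather than stopping after one pass.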
So, what are the limitations and complexities involved in running SASCA? First, the message-passing rule for multiplication is expensive: its complexity goes in 2^(2b), where b is the number of bits of the target variables. So if you profile 8-bit values, so b = 8, then the complexity of a multiplication is 2^16. Additions are less expensive, because their complexity goes in b · 2^b. Also, SASCA is an optimal way to recombine information only if the graph is a tree, meaning that there are no cycles. If there are cycles in the graph, then SASCA becomes heuristic.

So how do we apply this to masked software? Let's take a simple unprotected circuit that we want to protect: a * b = c. The secure implementation looks something like this: you have the shares of a, which are a0 and a1, the shares of b, b0 and b1, and the shares of c, c0 and c1. The multiplication above is replaced by a secure multiplication, say the ISW multiplication, which takes the shares as inputs and outputs the shares of c. For our methodology, we would like to keep an attack complexity that scales gently with d, and to reduce the use of heuristics.

A first solution would be to build a factor graph for the entire ISW multiplication. This may sound like a good idea, and let's start with its advantage: it can possibly exploit all the information leaked within the ISW multiplication. However, the number of variables processed by the ISW multiplication is quadratic in the number of shares, and the number of multiplications is also quadratic, which makes the computational cost expensive and not scaling gently with d. Additionally, the graph representing the ISW multiplication contains a lot of cycles.

So what is the efficient methodology we propose? It goes in two steps. First, for each of the secret variables a, b and c, we rebuild them from their shares. With a two-share example, we have an XOR operation that recombines a0 and a1 to obtain a; the same goes for b0 and b1, which give b, and c0 and c1, which give c. We call these small graphs encoding graphs, and there is one for each secret variable. The second step is to link these secret variables a, b and c by operations, here for example a multiplication. This is called the unmasked graph.

What are the strong points of this methodology? First, regarding complexity, it's quite nice because the number of variables to profile remains linear in d, since only the shares need to be profiled, the shares being the inputs and outputs of the secure multiplications. Then, the number of multiplications is constant with the number of shares. The encoding graphs contain no cycles, meaning that the information on a is obtained without any heuristics, because the encoding graphs are trees (see the sketch below). And finally, still regarding complexity, because the evaluator can solve all the encoding graphs independently, it allows some trade-offs, for example not loading everything in memory at once. The drawback is that with this methodology we cannot exploit lower-order flaws, and that's why we call it order preserving.

This methodology has been implemented in SCALib, which is a Python library that we developed with my colleague Gaëtan Cassiers.
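Here is the sketch just mentioned: a minimal numpy illustration of solving one encoding graph exactly, i.e., recombining per-share distributions into a distribution on the secret via an XOR convolution. This is illustrative code under made-up inputs, not the SCALib implementation.

```python
import numpy as np

# Minimal sketch of solving an encoding graph exactly: the secret is
# a = a0 ^ a1 ^ ... ^ a_{d-1}, so its distribution is the XOR-convolution
# of the per-share distributions produced by the leakage model.

NC = 256  # 8-bit shares

def walsh_hadamard(p: np.ndarray) -> np.ndarray:
    """Fast Walsh-Hadamard transform, O(b * 2^b) instead of 2^(2b)."""
    p = p.copy()
    h = 1
    while h < len(p):
        for i in range(0, len(p), 2 * h):
            x, y = p[i:i + h].copy(), p[i + h:i + 2 * h].copy()
            p[i:i + h], p[i + h:i + 2 * h] = x + y, x - y
        h *= 2
    return p

def xor_convolve(dists: list[np.ndarray]) -> np.ndarray:
    """Distribution of the XOR of independent variables. Because the
    encoding graph is a tree, this recombination is exact (no heuristic)."""
    acc = np.ones(NC)
    for p in dists:
        acc *= walsh_hadamard(p)
    out = walsh_hadamard(acc) / NC  # inverse transform
    return out / out.sum()

# Example with d = 3 shares and made-up per-share distributions.
rng = np.random.default_rng(1)
shares = [rng.dirichlet(np.ones(NC)) for _ in range(3)]
p_secret = xor_convolve(shares)
print(p_secret.argmax(), p_secret.max())
```

The transform-based convolution is what keeps the cost of an encoding graph at O(b · 2^b) per share, matching the gentle scaling in d claimed above.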
So what is SCALib? It's a Python package that you can install simply by running pip install scalib. It offers multiple tools for side-channel analysis, and it's optimized to run on a single or multiple threads. In this project, we use it to compute the signal-to-noise ratio of each of the shares, to run SASCA, and to run key rank estimation.

So what are the concrete results of our methodology on the investigated implementations? First, we look at concrete attacks. On this graph, you have on the x-axis the number of traces used by the adversary, and on the y-axis the median key rank, which is the remaining entropy of the key. We see that if you increase the number of shares, from 3 to 6 here, you also increase the attack complexity N. Here, with 9 traces, we are able to break the 6-share implementation of AES running on the Cortex-M0. If you move to a Cortex-M3, then things get better, and with 2,000 traces we are able to break a 5-share implementation. And for Clyde on the Cortex-M0, which is the CHES 2020 CTF dataset, we are able to break an 8-share implementation with 12,000 traces.

With these numbers, we are able to extrapolate. Each of the concrete attacks from the previous slide is a bullet on this graph, and because we know that the complexity of attacks is exponential in d, we can extrapolate the trends. What do we see, and what do we conclude? First, by comparing AES running on the Cortex-M0 and the Cortex-M3, we see that the Cortex-M0 provides more information per share, hence it requires more shares to be secure; or, for a fixed number of shares, the security is lower on the Cortex-M0. Then we can compare AES and Clyde running on the same target, the Cortex-M0, and we observe that Clyde provides better security than the AES. This is probably because Clyde has a lightweight S-box that requires little computation in a bitslice setting, the Clyde S-box having been designed to be masking friendly. Overall, we observe that in the best case, AES running on the Cortex-M3, we need at least 16 shares to reach 40-bit security.

What does all this imply for performance? Here we look at Clyde running on the Cortex-M0, and we look at security versus performance. For the performance figures, we take a recent paper by Belaïd et al. published at Eurocrypt 2020. We observe that if the designer targets 20-bit security, so N = 2^20, then about 12 shares are needed and the throughput is about 3 kilobytes per second. If the target is 30 bits of security, then 18 shares are needed and the throughput is 1.2 kilobytes per second. And once again, if you increase the security you want, to 40 bits of security here, you need 24 shares and the throughput decreases further. Yet we observe that the additional cost in performance is modest: between 20 and 40 bits of security there is more or less a factor 4 in performance, so you get 20 more bits of security for it.

So what can we tell about masking on 32-bit software, and what should we do? The main issue we observe is that there is a lack of noise on these devices. Indeed, share independence is not the big issue: our attack does not even exploit such lower-order flaws, yet the attacks are quite efficient. So the issue comes from the other part of the equation, the MI: the MI per share is too large.
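To make "MI per share" concrete, here is a minimal Monte-Carlo sketch of how one can estimate the mutual information between a share and its leakage. It assumes a Hamming-weight leakage model with Gaussian noise and made-up noise levels; these are illustrative assumptions, not the paper's measured profiles.

```python
import numpy as np

# Monte-Carlo sketch of the MI between an 8-bit share and a noisy
# Hamming-weight leakage: L = HW(share) + N(0, sigma^2). The HW model
# and sigma values are illustrative assumptions.

NC = 256
HW = np.array([bin(x).count("1") for x in range(NC)], dtype=float)

def mi_share_leakage(sigma: float, n_samples: int = 20_000) -> float:
    rng = np.random.default_rng(0)
    shares = rng.integers(0, NC, n_samples)
    leakage = HW[shares] + rng.normal(0.0, sigma, n_samples)
    # p(l | s) for every candidate share value s (Gaussian templates).
    pdf = np.exp(-(leakage[None, :] - HW[:, None]) ** 2 / (2 * sigma**2))
    p_s_given_l = pdf / pdf.sum(axis=0)  # Bayes with a uniform prior
    # MI = H(S) - H(S | L), with H(S) = 8 bits for a uniform byte.
    h_cond = -np.mean(np.log2(p_s_given_l[shares, np.arange(n_samples)]))
    return 8.0 - h_cond

for sigma in (0.5, 2.0, 8.0):
    print(f"sigma={sigma}: MI ~ {mi_share_leakage(sigma):.3f} bits")
```

Running it shows how quickly the MI per share drops as the noise grows, which is precisely the quantity the low-cost MCUs fail to keep small.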
This is illustrated with the graph on the left, where on the x-axis you have the number of leaking points and on the y-axis the mutual information, and we see that the mutual information is quite large. We also observe that the noise is mostly algorithmic, meaning that the noise really depends on what you put in your registers, on the layout of your registers, and on which instructions you apply to them. Overall, reducing this MI would lead to a significant gain in performance.

The natural solution to anticipate all that is to consider a generic countermeasure, represented by a factor gamma, which reduces the MI. The data complexity then becomes N ≈ gamma^d · MI^(-d): the factor gamma is raised to the power d thanks to masking. This parameter gamma allows us to cover many countermeasures, which can be physical, such as noise addition, or algorithmic, such as shuffling. Based on this gamma, we can define a quantitative target for their effectiveness. In the paper we present many graphs where on the x-axis you have the number of shares d in your implementation, on the y-axis you have gamma, and the color represents the data complexity N: white means that N is very large, so high security; black means that N is low, so low security. On such a graph, for a given security target, if you are able to decrease the mutual information per share by a given factor gamma, then you can read off the number of shares needed to fulfill your desired data complexity.

From all this, we now come back to our goal of estimating the worst-case attack data complexity. A usual question is what happens when you profile on one device and attack on another. How do we evaluate that? We profile on a device i, we evaluate the MI with that model on a device j, and we estimate the loss, or reduction, of mutual information that this implies. We obtain this kind of grid, and we see that for some pairs there is a large loss, but for others there is none, or only a small one, such as a factor 0.9. What does this imply for side-channel evaluation? Basically, cross-device profiling does not provide any significant security gain: there are some pairs where the gamma is large and others where it is small. And if you run an evaluation by profiling on one device and attacking on another, you expose yourself to a risk of a false sense of security: for example, you could be evaluating on a pair where gamma is very low, while the adversary attacks with a pair where gamma is very high. So overall, the evaluator should profile and attack on the same device to avoid these risks.

Now let's jump to the conclusion. What are the takeaways of this presentation and of this work? We have presented an efficient methodology to attack, or evaluate, masked implementations. We have put forward that the main issue for protecting these low-cost MCUs is the lack of noise, and that a large number of shares is required to protect these devices. One solution would be to reduce the number of operations that need to be masked, by leveraging dedicated modes of operation. With this work we also present SCALib, a Python toolbox that contains many optimized tools for such analyses. Thanks for listening. Here are some useful links, and one related work that exploits the same methodology, applied to the well-known ASCAD dataset.