maximum likelihood attacks, for masked and shuffled implementations. The co-authors are Nicolas Bruneau, Sylvain Guilley, Annelie Heuser, Olivier Rioul, François-Xavier Standaert, and Yannick Teglia, and Nicolas will present. Do you hear me well? So thank you for this nice introduction.

In this talk, I will present our results on the Taylor expansion of maximum likelihood attacks. Maximum likelihood attacks are known to be the best attacks, but in some cases they can be really inefficient. In this presentation, I will show you a new attack with better efficiency. This work is joint work with my advisor, Sylvain Guilley, from Telecom ParisTech and Secure-IC, with Annelie Heuser, from IRISA and Telecom ParisTech, but also with Olivier Rioul from École Polytechnique and Telecom ParisTech, and with François-Xavier Standaert from UCL. This work was done while Yannick Teglia was at ST.

I will begin this talk with an introduction to side-channel analysis. Side-channel analyses are classical threats against cryptographic algorithms, and as a consequence countermeasures have been developed. These countermeasures are themselves the target of particular kinds of side-channel attacks. In this scenario, an attacker who wants to apply the best attacks faces the issue of high complexity. In the second part of this presentation, I will show you a new attack which keeps the good effectiveness of the optimal attacks, but with better efficiency. I will then illustrate this new attack in a practical scenario.

Side-channel analysis is a classical threat against cryptographic algorithms in embedded systems. It aims at revealing secret data, such as the secret key, by exploiting the physical leakages emitted by the device during the execution of the algorithm. These leakages can be, for example, the electromagnetic emanation, the power consumption, or even the heat. As a consequence, countermeasures have been developed.

One of the most classical countermeasures is the masking scheme, as its security can be formally grounded. The rationale of an (ω-1)-th order masking scheme is as follows: any sensitive variable is randomly split over ω shares, using ω-1 random values called the masks. The impact of this countermeasure is to make the first ω-1 statistical moments independent of the key. As a consequence, an attacker who wants to attack such a scheme has to look at least at the ω-th order moment.

Another classical countermeasure is shuffling. The idea of shuffling is to randomize the order of execution of independent operations in the algorithm. This countermeasure needs a random permutation, denoted in this talk by π. So if, in a normal implementation, we would start by using the value z1, followed by z2, z3, and z4, in a shuffled implementation we will, for example, start with z3, followed by z1, then z2, and finally z4. The classical way to defeat this countermeasure is to sum over all the shuffled variables. The consequence is that we add some noise: the extrinsic (algorithmic) noise coming from the non-targeted values, and we also add the intrinsic noises of all the summed variables.

These two countermeasures depend on some parameters. The most important parameter of a masking scheme is ω, the number of shares. This number is strongly linked to the number of masks: if we want ω shares, we need ω-1 masks.
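As a rough illustration of these two countermeasures (this is my own toy sketch, not code from the talk; the bit width and the number of operations are arbitrary): Boolean masking splits a sensitive byte over ω shares using ω-1 random masks, and shuffling draws a random permutation π over the independent operations.

```python
import random

def boolean_mask(z, omega, nbits=8):
    """Split a sensitive value z into omega Boolean shares,
    using omega - 1 uniformly random masks."""
    masks = [random.getrandbits(nbits) for _ in range(omega - 1)]
    last = z
    for m in masks:
        last ^= m                      # XOR of all shares recovers z
    return masks + [last]

def shuffled_order(size):
    """Draw a random permutation pi over `size` independent operations."""
    pi = list(range(size))
    random.shuffle(pi)
    return pi

# Example: 3 shares (2 masks) for z, and a shuffled order for 4 values z1..z4.
z = 0xA5
shares = boolean_mask(z, omega=3)
assert shares[0] ^ shares[1] ^ shares[2] == z

pi = shuffled_order(4)                 # e.g. [2, 0, 1, 3]: process z3 first, then z1, z2, z4
```

Summing the leakage samples of all shuffled positions, as in the classical attack mentioned above, is exactly what accumulates both the algorithmic noise of the non-targeted values and their intrinsic noises.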
The other important parameter is the order of the implementation, O: it is the lowest key-dependent statistical moment. At best, we have O equal to ω, and a masking scheme that meets this property is called a perfect masking scheme. In this talk, we assume that all the masking schemes are perfect. The most important parameter of a shuffled implementation is the size of the permutation, denoted by Π.

These two countermeasures can be defeated by template attacks. Template attacks are a particular kind of side-channel attack; they are known to be the most powerful in an information-theoretic sense. Such attacks can be divided into two phases. In an offline profiling phase, the leakage model is learned by using a golden device on which we can modify the value of the key. Different methods exist to perform this estimation: non-parametric methods, such as the histogram or the kernel method, or parametric methods, for example mixture models. The second step of the attack is to recover the key, using the model learned during the offline profiling and applying the maximum likelihood attack.

On this slide I present the main advantages and disadvantages of parametric and non-parametric methods. In parametric methods, the only random part is the noise. This means that, in general, these methods are easy to estimate, as we only compute means and variances. But because we assume that the only random part is the noise, we assume that the shuffle and the masks are known during profiling; this amounts to assuming a powerful attacker. The other drawback is that it can lead to many templates. In non-parametric methods, the shuffle and the masks are part of the noise, and as a consequence there is no need to know them. But these methods can be difficult to estimate, as they may suffer from the curse of dimensionality, which is a common issue in machine learning and statistics. In this presentation, we assume that all the models are perfectly known and estimated using parametric methods.

In this presentation, the attacks are applied on Q queries, which are the leakage measurements (the traces), each made of D leakage samples; this value D is called the dimension of the attack. Each leakage measurement can be modeled as a deterministic part plus a random noisy part. The deterministic part, y, is a function of the secret key k* and a random plaintext t, which are n-bit words, but it also depends on some random values used by the countermeasures, such as the masks and the shuffle; in this presentation these values are denoted by R. The random noisy part is assumed to be white Gaussian noise with variance σ².

When the model is perfectly known, we have shown in our ASIACRYPT 2014 paper that the optimal distinguisher, meaning the distinguisher that maximizes the success rate, is the maximum likelihood. It consists in maximizing the sum over all the traces of the log-likelihood. We can see in this formula that the expectation is taken over R, so it runs over all the masks and all the permutations. In this presentation, for convenience, we denote by γ the SNR, which is 1/(2σ²). Now, if we look at the complexity, we can see that it depends on the number of traces Q and on the dimension of the attack D, but also on the number of possible share values; this factor comes from the expectation over the masks.
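As a hedged reconstruction of the formula just described (my notation; the exact expression on the slides may differ), the optimal distinguisher with a perfectly known model maximizes, over the key hypothesis k, the sum over the Q traces of the log-likelihood, where the expectation marginalizes the countermeasure randomness R:

```latex
\hat{k} \;=\; \arg\max_{k} \;\sum_{q=1}^{Q} \log \, \mathbb{E}_{R}\!\left[\exp\!\left(-\gamma \,\bigl\| x_q - y(k, t_q, R) \bigr\|^{2}\right)\right],
\qquad \gamma \;=\; \frac{1}{2\sigma^{2}} ,
```

where x_q is the q-th measured D-dimensional trace. The expectation over R covers all the masks and, when shuffling is used, all the permutations.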
But it also depends on the number of possible permutations, which comes from the expectation over the permutation. And if the size of the permutation Π is high, this Π! term becomes huge. As a consequence, the optimal distinguisher is not computable, due to its high complexity.

In this part, I present our main result. It is a rounded version of the optimal attack, which keeps the good effectiveness of the optimal attack, but with a smaller complexity. As already mentioned, the optimal attack consists in maximizing the sum over all traces of the log-likelihood. Using the cumulant generating function, we can rewrite this formula as the sum over all traces of a series expansion. This series expansion depends on the values κ, which are the higher-order cumulants. The higher-order cumulants can be expressed recursively using the lower-order cumulants and the higher-order moments.

Now we can introduce our new attack, which is a rounded optimal attack. This attack depends on a parameter L, which is an arbitrary choice of the attacker. Our new attack, the rounded optimal attack of degree L, consists in maximizing the order-L Taylor expansion, in the SNR γ, of the log-likelihood. In other words, we take the first L terms of the previous series expansion. We can see that the term used by the maximum likelihood is equal to the sum of the terms used by our new attack plus an error term, where the small o is the Landau notation. This means that this error term is small compared to γ^L when the SNR is small.

If we now look at the complexity of this new attack, you can see that it still depends on the number of possible share values (the expectation over the masks) and on the number of traces. But it also depends on two factorial terms. These terms depend on the dimension of the attack, on the degree L of the Taylor expansion, and on the size of the permutation. We can notice that when the degree of the Taylor expansion is small compared to the dimension of the attack, these two factorial terms are small, and as a consequence the complexity is also small.

I will now illustrate this new attack in a practical scenario. As case study, we have chosen a masking scheme with a shuffled table recomputation step. There exist different implementations of masking schemes; they mainly differ in the way the nonlinear parts are implemented. Indeed, the implementation of the linear parts is quite obvious, but the implementation of the nonlinear ones is more difficult. We can cite the algebraic methods, which exploit the algebraic representation of the nonlinear part. We can also cite the global lookup table method, in which the values for all the possible masks are stored in a global lookup table. But we can also cite the table recomputation method, which pre-computes a masked S-box stored in a table. This method is often used in practice, as it represents a good trade-off between memory complexity and time complexity. Moreover, recently, this kind of implementation has been extended to higher-order masking schemes.

The sketch of a table recomputation algorithm is as follows. In the first part, the table is recomputed, and then comes the classical masked cipher. In the recomputation, we go over all the possible entries of the table and mask the input and mask the output. This kind of algorithm can be defeated by a classical second-order attack.
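As a minimal sketch (mine, with an arbitrary toy S-box, not code from the talk) of the recomputation step just described: every table entry is read, its input is masked with m_in and its output with m_out, and the resulting masked table is then used by the masked cipher.

```python
import random

def recompute_masked_table(sbox, m_in, m_out, shuffle_order=False):
    """Table recomputation: build T'[x ^ m_in] = S[x] ^ m_out for every entry x.
    With shuffle_order=True the loop runs in a random order, which is the
    shuffled variant that will appear later in this case study."""
    entries = list(range(len(sbox)))
    if shuffle_order:
        random.shuffle(entries)            # random permutation pi over the entries
    masked = [0] * len(sbox)
    for x in entries:                      # each iteration leaks (input and output masking)
        masked[x ^ m_in] = sbox[x] ^ m_out
    return masked

# Toy 4-bit S-box, purely illustrative values.
sbox = list(range(16))
random.shuffle(sbox)
m_in, m_out = random.getrandbits(4), random.getrandbits(4)
masked_sbox = recompute_masked_table(sbox, m_in, m_out)
assert all(masked_sbox[x ^ m_in] == sbox[x] ^ m_out for x in range(16))
```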
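And the classical second-order attack just mentioned can be sketched in a generic form; the centered-product combining function and the Hamming-weight-style predictions used here are assumptions on my part, since the talk does not detail which variant was run.

```python
import numpy as np

def second_order_cpa(traces, hypotheses, idx1, idx2):
    """Generic second-order CPA: combine two leakage samples by centered
    product and correlate with per-key-guess predictions.

    traces     : (Q, D) array of leakage measurements
    hypotheses : (K, Q) array of predicted leakage per key guess
                 (e.g. Hamming weight of the unmasked S-box output)
    idx1, idx2 : indices of the two samples carrying the two shares
    """
    centered = traces - traces.mean(axis=0)
    combined = centered[:, idx1] * centered[:, idx2]       # centered product, shape (Q,)
    combined = combined - combined.mean()
    scores = np.empty(hypotheses.shape[0])
    for k, h in enumerate(hypotheses):
        h = h - h.mean()
        denom = np.sqrt((combined ** 2).sum() * (h ** 2).sum())
        scores[k] = abs((combined * h).sum() / denom)      # |Pearson correlation|
    return scores.argmax(), scores
```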
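Finally, before turning to the attacks that also exploit the recomputation leakages, here is a hedged reconstruction (my notation, under the assumptions stated in the talk) of the series expansion behind the rounded attack introduced earlier. Writing U_q for the random exponent inside the expectation of the optimal distinguisher, the cumulant generating function gives a power series in γ whose coefficients are the cumulants κ_l:

```latex
\log \mathbb{E}_{R}\!\left[ e^{\gamma U_q} \right]
  \;=\; \sum_{l=1}^{\infty} \kappa_{l}(U_q)\,\frac{\gamma^{l}}{l!},
\qquad
U_q \;=\; -\bigl\| x_q - y(k, t_q, R) \bigr\|^{2},
```

and the rounded optimal attack of degree L keeps only the first L terms,

```latex
\hat{k}_{L} \;=\; \arg\max_{k} \;\sum_{q=1}^{Q} \;\sum_{l=1}^{L} \kappa_{l}(U_q)\,\frac{\gamma^{l}}{l!},
```

with a remainder that is o(γ^L) as the SNR goes to zero; the cumulants κ_l are taken with respect to the countermeasure randomness R (masks and permutation), for each trace and each key hypothesis.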
But it is known that better attacks take into account the leakages of the table recomputation. The first example was given by Pan et al. in 2009, who recover the mask by a first, horizontal attack and then recover the secret key by a vertical attack. But an even better attack is to apply the maximum likelihood and take into account all the leakages of the table recomputation and of the cipher at once. In order to protect such an implementation against this attack, the classical countermeasure is to randomize the order of execution in the table recomputation; this needs a random permutation π, and here we are in the case of shuffling.

Of course, this kind of algorithm can still be defeated by exploiting only the leakages of the classical cipher, by building bivariate attacks. For example, we can apply the second-order CPA or the optimal distinguisher, since in this case no shuffling is involved. But we can also apply our new rounded optimal attack, for example at degree two or four.

We can see in this figure the success rate as a function of the number of traces. First of all, we can notice that the optimal attack is the best attack here. We can also notice that our new attack at degree two and the second-order CPA perform similarly and are close to the optimal attack. But the interesting fact in this figure is that our new attack at degree four is not as good as the other attacks. This means that, in this case, we have added some terms of the Taylor expansion which do not make the attack better. This is due to the fact that, in this case, the noise (a standard deviation equal to one) is not high enough for the Taylor expansion to be a good approximation of the maximum likelihood. Now, if the noise increases, for example if σ is equal to two, with the success rate again plotted as a function of the number of traces, we can see that all the attacks perform similarly and are close to the optimal distinguisher.

Of course, better attacks take into account the leakages of the table recomputation. In this case, the optimal distinguisher is not computable, due to the factorial term in 2^n (the permutation over the 2^n table entries). The first example of an attack which targets the shuffled table recomputation was presented at CHES 2015: it is the multivariate attack on table recomputation, and it is a third-order attack. We can also, in this case, apply our new rounded optimal distinguisher; we select degree three, as we know from this attack that there are third-order leakages.

We can now study the complexity of this case study. We can see that, of course, the optimal distinguisher is not computable, due to its huge factorial terms. We can also notice that our new attack takes longer to compute than the multivariate attack, but we will see later that it gives better results.

Now we can see in this figure that, for a noise σ equal to three, our new attack at degree three and the multivariate attack perform similarly and are better than the second-order CPA. This means that, in this case, the most important term in the Taylor expansion is the third-degree one. When the noise increases, for example for σ equal to 12, we can see that the second-order CPA is now closer to our new attack, and we can also see that it is much better than the multivariate table recomputation attack. This means that, in this case, the most important term in the Taylor expansion is the second-degree one.
But we can also notice that, because the new attack is better than the second-order CPA, the third-degree terms still have an impact. Now, for intermediate noise levels, we can see that for σ equal to eight, for example, our new attack is much better than the attacks of the state of the art, as it needs two times fewer traces to reach the same level of success rate. For example, the new attack needs around 60,000 traces to reach 80% of success, while the other attacks need around 120,000 traces. For other values of noise, we can see that the gain is smaller, but our new attack is still better than the attacks of the state of the art. We can see in this figure the number of traces needed to reach 80% of success as a function of the noise standard deviation, and in all the noise scenarios our new attack is better than the attacks of the state of the art, as its curve is below the curves of the multivariate attack and the second-order CPA.

As a conclusion, we have shown in this presentation a new attack, which is a truncated version of the theoretically optimal distinguisher, and which becomes efficient: this means that the attacker can now manage this attack, it can be computed. And it remains effective: indeed, in the case of the shuffled table recomputation, this attack is always better than the attacks of the state of the art. Now, a question arises: how to quantify the accuracy of the approximation? In other words, how to choose the degree of the Taylor expansion? Indeed, we have seen that the good choice of the degree of the Taylor expansion depends on several parameters, such as the noise variance, the dimension of the attack, and also, of course, the parameters of the countermeasure. So we can imagine that choosing the good parameter will be a trade-off between effectiveness and efficiency, and it would be an interesting task to do this for a generic countermeasure implementation. I hope you will have some questions, and thank you for your attention.

Let us thank the speaker. Thank you. Any questions or comments? Yes?

So, you build the model, the deterministic part, and then there is noise. What variation is there between one chip and another chip in the deterministic part that makes the distinguisher's accuracy go down or not work anymore?

Sure. So, we have done all our tests assuming that the model is perfectly known. This means that we have no process variation between the two devices. If that were not the case, of course, there would be an important impact on the effectiveness of the attacks: it would degrade the results of the attacks.

Were there any results in practice with this kind of strategy?

So, yes, we have some practical results. We did it in software, with two different cards. And in this case, we matched the results we see here, almost exactly the same: there was not so much difference between the two sets of cards. So it was okay, but it does not mean that in general it will always be the case.

Thank you. Now, other questions or comments? Sorry, I wonder if your attack also gives an advantage for the case of applying just masking, without shuffling, if you have that case.

So, if we have no shuffling, in fact, we are close to these cases, and in this case, of course, our new attack is close to the second-order CPA, so the advantages are small. But we can also imagine other scenarios where the device leaks at several different orders.
In those scenarios we can exploit second-order attacks, but also third-order attacks, and so on, and we guess that this approach would lead to better results than, for example, the second-order CPA alone.

And how many degrees do you think are practical to check?

So, in fact, it differs: in that case, we can easily check the higher degrees. But for the multivariate case, we have to establish the exact formula of the Taylor expansion, and that is quite difficult in practice. So, for degree three, it is okay; beyond that, it can be difficult. Of course, if we have time, we assume the result will be smaller but better; but I think, in this case, that is why it will be the best case.

Any other questions or comments? If not, let's thank the speaker again. Thank you.