Hi, my name is Nicolas Costes, I'm a PhD student at Simula UiB in Bergen, Norway, and this is joint work with my supervisor, Martijn Stam, called Redundant Code-Based Masking Revisited. First, I'm going to start by talking a bit about the context. This work is about side-channel attacks and one of their countermeasures, so it's important to understand where they come from. Side-channel attacks are a type of cryptographic attack that arises when one moves from the model to the real world. In a cryptographic model, one sees a cryptographic algorithm, such as an encryption in this case, as a black box: a key and a plaintext go in, a ciphertext comes out, and the attacker is only allowed to play with the ciphertext and/or the plaintext. Now, when the algorithm is turned into an implementation, and that implementation runs on a platform in the real world, what we see is the appearance of leakages. These leakages, such as the power consumption of the platform or the running time of the implementation, lead to a different type of attack, called side-channel attacks, which exploit the leakages to recover the key much faster. This work focuses on one type of side-channel attack, the power analysis attack. Here an attacker needs physical access to the device; while the device is running the encryption algorithm, the attacker records the power consumption of the device and later uses those power traces to attack the implementation and recover the key. Not only can the attacker tell which operation is running on the device, as you can see in the picture, but since the power consumption depends on the data being processed, the attacker can mount a statistical attack on sufficiently many traces and recover the secret. This is called differential power analysis, and it is a serious threat that needs to be protected against.
In order to protect against this type of attack, several countermeasures have been designed, such as shuffling the order of operations, introducing random delays in the trace, adding electronic noise with capacitors, for example, and the one that's going to be the focus of this talk: masking. I'll just note that all these countermeasures aim at degrading the signal-to-noise ratio, because we know that the number of traces required to recover the secret is inversely proportional to the signal-to-noise ratio, where "signal" means the part of the power consumption that directly depends on the data being processed. Now, as I said, this talk is about masking, which is usually regarded as the main countermeasure against side-channel attacks, but not necessarily completely sufficient by itself; usually it's combined with other countermeasures. Masking works as follows. You pick d, your security-order parameter, and d tells you how effective your masking is. There is a trade-off in picking d: the higher d, the stronger your protection is going to be, but the slower your implementation is going to be. So d is usually picked to be reasonable in both security and speed. Once you have picked your security order, you generate d random variables and encode your secret v into d+1 shares. You then compute your algorithm without ever recombining the shares, using what we call implementation gadgets. Now, the main encoding used in software is called Boolean masking. In Boolean masking, your d random variables are your first d shares, and the last share is made by summing together all your random variables and your secret. And by summing, I mean bitwise XOR; it's called Boolean masking because we are working in a finite field of characteristic two. Now, in 2011, Prouff and Roche introduced another type of masking, called polynomial masking, and polynomial masking boils down to Shamir's secret sharing scheme with parameters d and n.
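To make the Boolean encoding concrete, here is a minimal sketch in Python (the function names are my own, chosen for illustration, not from the paper): the first d shares are random bytes, and the last share is the XOR of the secret with all of them, so XORing all d+1 shares recovers the secret.

```python
import secrets

def boolean_mask(v: int, d: int) -> list[int]:
    """Encode the byte v into d+1 Boolean shares: d uniformly random
    bytes, plus one share chosen so that the XOR of all d+1 shares
    equals v."""
    shares = [secrets.randbelow(256) for _ in range(d)]
    last = v
    for s in shares:
        last ^= s
    return shares + [last]

def boolean_unmask(shares: list[int]) -> int:
    """Recombine: XOR all shares together."""
    acc = 0
    for s in shares:
        acc ^= s
    return acc
```

Any d of the d+1 shares are jointly uniform, so an attacker who combines fewer than d+1 shares learns nothing about v.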
So here d is still the security order, but n is the number of shares, which is not necessarily d+1. The way it works is that you still generate your d random variables, which I've denoted in my formula, and those random variables are used as the random coefficients of a random polynomial whose value at 0 is your secret v. You then evaluate this polynomial on as many public points as you want, and this is how you can have more shares than you need. Those public points form a set called S. Once you have evaluated the polynomial on those points, you get your shares, and to reconstruct the secret from your shares, you can use the Lagrange interpolation formula. Now, the main claim for polynomial masking is that if n is equal to d+1, meaning you have the minimal number of shares needed to reconstruct, it should leak less than Boolean masking, provided the signal-to-noise ratio is low enough. However, if n is greater than d+1, meaning that some of your shares are extra, they are redundant, those shares can be used to defeat another type of attacker, the fault attacker, by doing error correction. Now, there are two questions that arise from polynomial masking, and they are mainly due to the fact that we have introduced two new parameters into the game: we can decide on the number of shares we want, and we can decide on the public points at which we evaluate the polynomial. This is what this work investigates. The first question is about the number of shares: since we are introducing redundant information into the trace, can those shares be used by the attacker? Can they be used to mount a more powerful attack than without redundant shares? And the second question is about the choice of points: does the choice of points influence the leakage, and if so, are there points that should be avoided when choosing your parameters? Now, the first section is going to be about the redundant leakage.
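As a concrete sketch of the scheme just described (my own illustrative code, working in GF(2^8) with the AES reduction polynomial; the paper's implementation may differ): pick d random coefficients, form f(x) = v + r1·x + … + rd·x^d, evaluate it at n distinct non-zero public points, and recover v = f(0) from any d+1 shares by Lagrange interpolation.

```python
import secrets

def gf_mul(a: int, b: int) -> int:
    """Multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return p

def gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8), by exhaustive search
    (fine for a 256-element field)."""
    return next(b for b in range(1, 256) if gf_mul(a, b) == 1)

def poly_mask(v: int, d: int, points: list[int]) -> list[int]:
    """Share v with a random degree-d polynomial f with f(0) = v,
    evaluated at the given distinct non-zero public points."""
    coeffs = [secrets.randbelow(256) for _ in range(d)]
    shares = []
    for x in points:
        acc, xpow = v, x
        for c in coeffs:
            acc ^= gf_mul(c, xpow)   # add c_k * x^k (XOR is addition here)
            xpow = gf_mul(xpow, x)
        shares.append(acc)
    return shares

def reconstruct(points: list[int], shares: list[int]) -> int:
    """Lagrange interpolation at 0; in characteristic two, 0 - x = x."""
    v = 0
    for i, (xi, ci) in enumerate(zip(points, shares)):
        num, den = 1, 1
        for j, xj in enumerate(points):
            if j != i:
                num = gf_mul(num, xj)
                den = gf_mul(den, xi ^ xj)
        v ^= gf_mul(ci, gf_mul(num, gf_inv(den)))
    return v
```

With d = 1 and three points, any two of the three shares already suffice: `reconstruct(points[:2], shares[:2])` returns v, and the third share is redundant.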
And in that section, we are going to focus on another paper, published in 2018 by Chabanne, Maghrebi and Prouff, who already tried to address that question. As we are going to see, we go counter to their conclusion, and we fix a mistake in their paper which invalidated their analysis. But first, let me introduce the leakage model, the model in which we conduct our experiments to validate our findings. It is called the noisy Hamming weight model: for each share of a masked variable, the adversary, so us, gets the Hamming weight of the share with some additive Gaussian noise. It's a widely used model; it was used by Prouff and Roche in their original paper on polynomial masking, but also by Goubin and Martinelli for another masking scheme, and by Balasch et al. in yet another masking scheme studied in 2015. It's very commonly used, and it's also relevant to real attacks, because there is a physical reason why we would get Hamming weights from the shares. In our case, the secret variable that we are going to try to recover is a single output of a first-round S-box in AES-128, which is also a very common setting. Now, going back to the paper: Chabanne, Maghrebi and Prouff in 2018 tried to address whether redundant polynomial masking leaks. They use the maximum likelihood estimator as a distinguisher; I'm going to come back to that a bit later. And they observed that using strictly more than d+1 shares merely provides the attacker with more noise than information. Actually, their claim is even stronger than that, because what they found is that the attacker performs worse when attacking more than d+1 shares. This is very strange, because there is this adage from information theory that more information should always lead to a more successful attack, and the maximum likelihood estimator is the optimal distinguisher in their model.
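The leakage model can be simulated in a few lines (an illustrative sketch under the assumptions just stated, not the exact code behind the paper's experiments): each share of the polynomially masked value leaks its Hamming weight plus independent Gaussian noise.

```python
import random

HW = [bin(x).count("1") for x in range(256)]  # Hamming weight of each byte

def gf_mul(a: int, b: int) -> int:
    """Multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return p

def simulate_trace(v: int, d: int, points: list[int],
                   sigma: float, rng: random.Random) -> list[float]:
    """One noisy-Hamming-weight trace: share v with fresh randomness,
    then leak HW(share) + N(0, sigma^2) for every public point."""
    coeffs = [rng.randrange(256) for _ in range(d)]
    leaks = []
    for x in points:
        acc, xpow = v, x
        for c in coeffs:
            acc ^= gf_mul(c, xpow)
            xpow = gf_mul(xpow, x)
        leaks.append(HW[acc] + rng.gauss(0.0, sigma))
    return leaks
```

With n public points the adversary gets an n-dimensional noisy sample per encryption; increasing n beyond d+1 is exactly the redundancy whose effect the experiments measure.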
So since we are adding redundant shares, it should not be possible for the attacker to do worse. At worst, the results should be the same, but not worse. And it turned out that they made a mistake in their distinguisher. Just as a reminder: with the maximum likelihood estimator, we are trying to compute a score for each possible value of the sensitive variable based on the traces, and what we hope is that, given sufficiently many traces, these scores converge with one of them being much higher than all the others, giving us the correct value. What they are doing is summing over all the possible values of all the shares but one, and then computing the last share from all the previous ones and the secret. This works perfectly fine in the non-redundant polynomial masking case; it is actually adapted from the Boolean masking case. However, it no longer works once you introduce redundant shares into the game, because it is no longer true that all the shares but one are random: by definition, some shares are redundant, completely determined by all the others. So there is what we call a dimension mismatch. If we take a degenerate case, say d equal to 0 and n equal to 2, meaning there is no sharing at all, we are just repeating the secret twice, their formula would still assume that one of the shares, which is the secret itself, is random. I've plotted what the distribution would look like if we were to use their formula: we get a quite strange Gaussian mixture, while in the case of repeating the secret with independent noise, we know what we should get, a very clean single Gaussian. And this is what we get if we use the correct formula, which is the one we use in our paper, where you sum over the values of the random coefficients and then compute each share from those random coefficients.
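In symbols (my reconstruction from the description above, with t = (t_1, …, t_n) a trace, HW the Hamming weight, and N(·; μ, σ²) the Gaussian density), the flawed score sums over all shares but one, whereas the corrected score sums over the d random coefficients and derives every share from them:

```latex
% Flawed: treats n-1 shares as free variables, which fails once shares are redundant
\mathrm{score}(v) \mathrel{+}= \log \sum_{c_1,\dots,c_{n-1}}
  \mathcal{N}\!\big(t_n;\ \mathrm{HW}(c_n(v, c_1,\dots,c_{n-1})),\ \sigma^2\big)
  \prod_{i=1}^{n-1} \mathcal{N}\!\big(t_i;\ \mathrm{HW}(c_i),\ \sigma^2\big)

% Corrected: sum only over the d random coefficients r_1,\dots,r_d
\mathrm{score}(v) \mathrel{+}= \log \sum_{r_1,\dots,r_d \in \mathbb{F}_{2^8}}
  \prod_{i=1}^{n} \mathcal{N}\!\big(t_i;\ \mathrm{HW}(f_{v,r}(\alpha_i)),\ \sigma^2\big),
\qquad f_{v,r}(x) = v + \sum_{k=1}^{d} r_k\, x^k
```

where α_1, …, α_n are the public points.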
And if we go back to our degenerate case: since there are no random coefficients, the sum simply vanishes, and we just have the product of two Gaussians with independent noise, which gives us a single Gaussian as expected. We then re-ran the experiments using the same set of interpolation points. What we did in this experiment is start with d equal to 1, meaning you only need two shares to reconstruct, and we begin by giving the adversary only those two shares; this is the first curve, with n equal to 2. Then we increase the number of shares the adversary is allowed to use by introducing redundant shares: n equal to 3, n equal to 4, n equal to 5. What matters is not really the shape of the curves but their relationship to each other: the curve for n equal to 3 is much lower than for n equal to 2, and n equal to 4 is lower still. So we can clearly see that as we put more redundant shares into the system, the number of traces drops, and it drops by a lot, because on the y-axis the value we plot is the number of traces, in log base two, required to reach a 90% success rate, while the x-axis shows the signal-to-noise ratio. So you can see that there is a huge drop in the number of traces required to be successful. Now, what is interesting in this picture is that the distance between the curves seems to be constant over the range of noise we are targeting. So we plotted, at the lowest possible noise, a quantification of this degradation, both for d equal to 1, which is the blue line, with n increasing up to 6, and for d equal to 2 with n up to 5. This was an attempt at quantifying the security degradation as the number of shares increases. Now, it appears that for low noise, the range of noise used in our experiments, this is representative, but for higher noise the behaviour is a bit different; for that, I refer you to a paper by Cheng et al. that is also from this year at CHES.
Now I'm going to move on to the second section, which is about investigating the sets of points. We know from previous work, from 2020 and from 2018, that the public points matter for the leakage. So what we were interested in was identifying particularly bad sets of points that should be avoided in implementations. What we did not look at is how to select optimal points; for that I again refer you to the paper by Cheng et al. This leads to two questions. First: are there sets of points that lead to an equivalence between polynomial masking and another masking scheme? Not only is this interesting in itself, but if some set of points makes polynomial masking equivalent to another scheme, such as Boolean masking, that other scheme may leak more than polynomial masking usually does. The second question: are there sets of points that make the shares outright more leaky, without necessarily being linked to another masking scheme? And just to explain what I mean by equivalence: in the paper, all the analysis in this section was conducted using coding theory. I'm just going to present the highlights of the findings; if you want the crisp analysis and the mathematical definitions, I refer you to the paper. We leverage a well-known link between Shamir's secret sharing scheme and MDS codes for this analysis. The equivalence between two masking schemes I will state informally as follows: two masking schemes are equivalent if the adversary can attack them with the same attack and obtain the same results. So for the first question, we can answer positively: there are sets of points for which polynomial masking is equivalent to Boolean masking. The example I will give you is for parameters d equal to 2 and n equal to 3, so the minimal number of shares to reconstruct is three, and the set of points a, b and a+b, with a and b distinct.
If you compute the shares c1, c2 and c3, you can see that if you sum them together, and again by summing I mean bitwise XOR, all the terms cancel out except for v, so we get back our secret. So from the shares at this set of points, we can reconstruct our secret just by summing them together, which is exactly Boolean masking. We did some Hamming weight experiments to confirm that the leakage of these shares is exactly the same as the leakage of shares made with Boolean masking. More generally, we identified all the possible such sets of points up to n equal to 5, and we give some necessary conditions. If d is odd, you need something called the point at infinity, which is probably never going to be used in a real implementation; and in general, one necessary condition is that all the points must sum to zero. We did not identify sets of points for n equal to 6 or more, but we highly suspect that there are more. Now, this cannot be generalized completely: there is not a matching set of points for every Boolean masking, because you need distinct points for polynomial masking, and at some point you simply run out of points in your finite field, while Boolean masking can have an arbitrary number of shares. Now, another kind of set of points that we identified, with a weaker condition this time, without being equivalent, is what we call quasi-Boolean. The idea is that by introducing redundant shares, we introduce an alternative way of reconstructing the secret from the shares. If you take the same set of points as before, a, b and a+b, but this time with d equal to 1, meaning you only need two shares to reconstruct, you can still sum the three shares together in a Boolean way to recover your secret, but this time you can also interpolate two shares to recover the secret in the normal polynomial way. So there are two different ways of recovering the secret from this set of shares.
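This cancellation is easy to check numerically (an illustrative sketch in Python over GF(2^8); not the paper's code). With d = 2 and points a, b, a⊕b, the shares are c_i = v ⊕ r1·x_i ⊕ r2·x_i²; since a ⊕ b ⊕ (a⊕b) = 0 and, because squaring is linear in characteristic two, a² ⊕ b² ⊕ (a⊕b)² = 0, XORing the three shares leaves exactly v.

```python
import secrets

def gf_mul(a: int, b: int) -> int:
    """Multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return p

def shares_d2(v: int, r1: int, r2: int, points: list[int]) -> list[int]:
    """Shares c_i = v + r1*x_i + r2*x_i^2 of a degree-2 masking polynomial."""
    return [v ^ gf_mul(r1, x) ^ gf_mul(r2, gf_mul(x, x)) for x in points]

# For ANY distinct a, b and ANY randomness, the XOR of the three shares is v.
a, b = 0x53, 0xCA                      # arbitrary distinct points
v = secrets.randbelow(256)
r1, r2 = secrets.randbelow(256), secrets.randbelow(256)
c1, c2, c3 = shares_d2(v, r1, r2, [a, b, a ^ b])
assert c1 ^ c2 ^ c3 == v
```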
And since there is this additional structure, this tells us that this set of points should leak more than a random choice of polynomial masking points. Now, the interesting thing is that Prouff and Roche, in the extended version of their original paper, suggested using a set of points that is stable under the Frobenius automorphism, with parameters d equal to 1 and n equal to 3, and more recently, in 2018, another paper suggested using the same set of points. This is for an implementation trick: stable under the Frobenius automorphism means that if you square one point, you get another point in the set, and this is used to implement the secure multiplication gadget faster. Now, the problem is that there is a unique set of points matching this condition, and it is a quasi-Boolean set of points. It would be interesting to look at what happens for higher d or higher n. In our paper, we looked at all the possible Frobenius-stable sets of points for n less than 7, and we checked whether they were safe from quasi-Booleanness or not; you can find all the results in the paper. Now, the last question left to answer is: are those quasi-Boolean sets of points leaking more, as we expected? So we did Hamming weight experiments again. On this graph, the red curve shows second-order Boolean masking, which is here more as an indication than as a bound. In the blue area, you have all the non-quasi-Boolean sets of points that we investigated in this work. We did not investigate all possible points, so it's possible that some points fall a bit outside this zone, but this is where they all fall. Now, the quasi-Boolean one is in black and is much lower than the blue ones. So, as expected, it leaks much more than the others, and it should be avoided.
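The two reconstruction routes for the quasi-Boolean set can also be checked directly (again an illustrative Python sketch over GF(2^8), with arbitrary example points): with d = 1 and points a, b, a⊕b, the shares c_i = v ⊕ r·x_i can be recombined either by XORing all three, or by Lagrange-interpolating any two of them at zero.

```python
import secrets

def gf_mul(a: int, b: int) -> int:
    """Multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return p

def gf_inv(a: int) -> int:
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)

a, b = 0x07, 0x1D                       # arbitrary distinct non-zero points
points = [a, b, a ^ b]
v, r = secrets.randbelow(256), secrets.randbelow(256)
c = [v ^ gf_mul(r, x) for x in points]  # d = 1 shares

# Route 1: Boolean-style, XOR all three shares
# (the r-terms cancel because a + b + (a+b) = 0).
assert c[0] ^ c[1] ^ c[2] == v

# Route 2: polynomial-style, interpolate any two shares at x = 0:
# v = c_i * x_j/(x_i + x_j) + c_j * x_i/(x_i + x_j).
for i, j in [(0, 1), (0, 2), (1, 2)]:
    xi, xj = points[i], points[j]
    inv = gf_inv(xi ^ xj)
    v2 = gf_mul(c[i], gf_mul(xj, inv)) ^ gf_mul(c[j], gf_mul(xi, inv))
    assert v2 == v
```

Both routes succeed on the same shares, which is exactly the additional structure that makes these points leak atypically.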
So I've come to the end of my talk, and I'm going to conclude by summarizing the results we got. We corrected the previous paper published at CHES by Chabanne, Maghrebi and Prouff: more redundant shares always lead to less security. This means that when you use polynomial masking, there is a trade-off between defending yourself against fault attackers by introducing redundant shares, and making yourself more vulnerable to passive attackers, who will use those redundant shares to mount a more powerful attack. We formalized the notion of equivalent maskings using coding theory, and we investigated the choice of points using coding theory as well. We found two kinds of sets that should be avoided in implementations: Boolean-equivalent sets, which make polynomial masking equivalent to Boolean masking, and quasi-Boolean sets, which have an atypical leakage profile. All our results are confirmed by experiments in a simulation-based Hamming weight model. Here are all the references used in my talk; thank you for listening, and have a good day.