So thanks for the introduction. This is the outline of the talk. Essentially, I will start with a few words of introduction and motivation, and try to explain the challenges that we face when we want to implement masking in parallel. I will follow with the description of a new leakage model called the bounded moment model, and try to explain why it is relevant for analyzing the security of parallel implementations. I will also argue that security in this new model is actually implied by security in the well-known probing model. I will then show how we can exploit parallelism in order to design very efficient multiplication gadgets. And I will conclude the talk by discussing the fact that security in the bounded moment model is actually weaker than probing security, and the implications of this observation.

To start the introduction, I would like to say a few words about side-channel attacks, which, very much summarized, can be viewed as physical attacks that decrease the security of an implementation exponentially in the number of measurements. To take a very simple, maybe naive example, imagine a block cipher with a 128-bit key where each measurement gives the adversary one key bit, so that after 128 measurements the adversary gets the key in full. Of course, that is a little oversimplified, because in practice, each time we want to recover a key bit, we have to distinguish between two noisy distributions, represented here by the red and orange Gaussians. And that suggests that if we want to improve security against side-channel attacks, one natural way to do it is to increase the noise.
For example, this can be done by leveraging what we call algorithmic noise or additive noise. The idea is that by doubling the size of the circuit, more things happen in the implementation that you do not target in the attack, and that increases the noise. But this is still not good in terms of security, because essentially, if you want to double the security, you have to double the cost of your implementation, which is not nice as a security parameter. And that is exactly where masking comes into play.

Intuitively, I would say masking can really be viewed as noise amplification, and I will try to explain this with the most famous example, which is Boolean encoding. What we have here is a secret value y that we split into d pieces: d - 1 pieces are obtained randomly, and the last one is computed accordingly. If we want to reason about the security of this type of encoding, one convenient way is to use the abstract model called probing security, which was introduced by Ishai, Sahai and Wagner in 2003. The idea there is that the adversary can probe a certain number of wires: if he probes up to d - 1 wires, he will not gain any information about the secret, but if he gets one more wire, then of course he can recover the secret in full. This is of course a quite abstract view of security, but the expectation is that when we implement this encoding in a real device, the device manipulates the d different shares at different points in time, and the adversary receives what we call a leakage trace. This is really the power consumption or the electromagnetic radiation of the implementation. What is important is that we hope that every share is manipulated in a different cycle.
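As a minimal sketch of this Boolean encoding, assuming nothing beyond what was just described (the names `share` and `unshare` are illustrative, not from the paper):

```python
import secrets

def share(y, d, bits=8):
    """Boolean encoding: d - 1 shares are drawn uniformly at random,
    the last one is computed so that the XOR of all shares equals y."""
    shares = [secrets.randbits(bits) for _ in range(d - 1)]
    last = y
    for s in shares:
        last ^= s
    shares.append(last)
    return shares

def unshare(shares):
    """Recombine: XOR all the shares together."""
    y = 0
    for s in shares:
        y ^= s
    return y

# Any d - 1 shares alone are jointly uniform; all d recombine to the secret.
assert unshare(share(0xA7, d=4)) == 0xA7
```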
And the expectation is that if we give that to the adversary, we still have some kind of security, reflected by the fact that the information leakage is bounded. In particular, what is important is that the information is raised to a certain power that corresponds to the number of shares. This was later formalized as noisy leakage security by Prouff and Rivain in 2013. One very nice result in this field is due to Duc, Dziembowski and Faust in 2014: they showed that this very convenient abstract model called probing security actually implies concrete security in the noisy leakage model under two assumptions that we can somehow verify empirically. One is that the leakages corresponding to the shares are sufficiently noisy; the other is that these leakages are independent. That is extremely convenient because it shows, and that is exactly what Gilles said this morning, that we can do all the security analysis in this abstract model and then gain something concretely useful.

Now, what are the questions we started with in this work? The first one, which is probably the main one, is what happens with parallel implementations. First, I have to explain what a parallel implementation is. Essentially, it is an implementation that manipulates all the shares in one clock cycle. So rather than having all these cycles where the shares are manipulated one by one, I put them all in parallel, and of course I require a bit more power consumption to do that. One question we could then ask is what happens if we run the probing attack and the probing adversary obtains the sum of all the shares. Clearly, we see directly that we do not have probing security, in the sense that this sum leaks something about the secret, which does not happen in the probing model.
And essentially the question here is whether the probing model, which was very convenient for serial implementations, is also applicable or relevant to parallel implementations. The second question we asked ourselves applies whether the implementation is serial or parallel: all these masked implementations heavily rely on the fact that the leakages of the shares are independent, so how can we test that? In the case of a serial implementation, we need independence over time, which corresponds to the x-axis of the figure. In the case of a parallel implementation, it is independence over the y-axis, which corresponds to the leakage amplitude. Ideally, we would like to do that without working directly in the noisy leakage model, essentially because it is more difficult to manipulate; it would be convenient to have something simpler than dealing with the full distributions.

I will now introduce the model that we use to reason about that, and I will start with a few words about the statistical intuition behind masking, illustrated with a two-share, one-bit example for a serial implementation. What we have here are two leakage samples: the first one is a noisy version of the first share, the second one is a noisy version of the second share, and what we give to the adversary is essentially these two distributions. If the secret is zero, he can observe (0, 0) or (1, 1) plus noise; if the secret is one, he can observe (1, 0) or (0, 1) plus noise. Clearly, we have information: we can distinguish the two distributions. What is interesting here is that if we estimate the mean vector of these two distributions, it is going to be (0.5, 0.5) in both cases, independent of the value that we manipulate. So at least something happened compared to the unprotected case, where we only needed to estimate the mean.
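This two-share intuition can be checked by exhaustive enumeration, without any noise: the mean vector of the two leakage samples is the same for both secrets, while the covariance differs. A minimal sketch (helper names are mine, not the paper's):

```python
from statistics import mean

def sharings(secret):
    """All equally likely 2-share sharings (a0, a1) of a 1-bit secret."""
    return [(a0, a1) for a0 in (0, 1) for a1 in (0, 1) if a0 ^ a1 == secret]

def mean_vector(pairs):
    return (mean(p[0] for p in pairs), mean(p[1] for p in pairs))

def covariance(pairs):
    m0, m1 = mean_vector(pairs)
    return mean((p[0] - m0) * (p[1] - m1) for p in pairs)

s0, s1 = sharings(0), sharings(1)
assert mean_vector(s0) == mean_vector(s1) == (0.5, 0.5)  # means carry nothing
assert covariance(s0) == 0.25 and covariance(s1) == -0.25  # covariance leaks
```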
Here, we need to estimate the covariance of the distribution. What is convenient with respect to parallelism is that this reasoning extends to my earlier example where the adversary can observe a sum of the shares. Here, for example, the adversary has one leakage sample that is the sum of the two shares plus noise, and the two univariate distributions look like that: either the adversary observes zero or two, or he always observes one. What allows the adversary to distinguish the distributions is not the mean, because its value is one in both cases; it is the variance, which is different in the two cases. This motivates formalizing something that is pretty well known, I think, in the CHES community: we say that an implementation is secure at order o in this bounded moment leakage model if all the statistical moments of the leakage distributions up to order o are independent of any sensitive variable manipulated by the target device.

Now that we have this tool to reason about parallel implementations, the first question we asked ourselves is whether the probing model is still relevant in this case. We answer this question in the paper with a kind of abstract reduction, which informally says that a parallel implementation is secure at order o in the bounded moment model if its serialized version is secure at order o in the probing model. Essentially, we consider a probing adversary who can probe up to d - 1 wires and a bounded moment adversary who can observe any sum of the shares. I guess it is intuitively expected that this holds, because summing the shares in the wires is not supposed to break the independence condition. What I think is interesting is that we have significant differences between the two models. One, which is somehow useful, is the fact that the bounded moment adversary can sum over all the shares.
So it really means we do not need to exclude one share as in the probing model: we can give everything to the adversary, but he can only observe the sum. This will be very useful later on to argue about continuous security. Of course, we have to pay something for that, and what we pay is that the bounded moment security notion is strictly weaker, because we now reason about statistical moments and not about the full distributions. What does it mean concretely? Essentially, if you have an abstract implementation that you prove secure in the probing model, it is also secure in the bounded moment model, because it is implied. If, on top of this, you have a concrete implementation such that the leakages are physically independent, meaning the implementation does not mix the shares in a strange manner, then bounded moment security directly extends to actual measurements.

I show here an example with three shares: on the upper left figure you have a concrete trace, and all the other ones are what we call detection results, where we try to detect information depending on the mean, the variance and the skewness. What we see is that only the skewness leads to a clearly detectable pattern, so only the skewness provides information to the adversary. That actually answers the second question we asked ourselves, which is how to test independence. Testing the statistical moments is a good solution for that, and a very convenient one, because if a moment depends on the secret, you can clearly claim that the leakages are not independent. This is again something that has been used intensively in the CHES literature for many years. So now we know that we can manipulate shares in parallel, so parallel masking is allowed, and we know what we can hope for, namely bounded moment security and probably noisy leakage security. What I would like to do now is argue a little bit about what we can gain in terms of performance.
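The three-share skewness observation can be reproduced exactly by enumerating the noise-free sum-of-shares leakage: the mean and the variance are independent of the secret, while the third central moment is not. A small sketch, with hypothetical helper names:

```python
from itertools import product
from statistics import mean, pvariance

def leakages(secret, d=3):
    """Noise-free parallel leakage (sum of shares) over all valid
    d-share Boolean sharings of a 1-bit secret."""
    return [sum(s) for s in product((0, 1), repeat=d)
            if sum(s) % 2 == secret]

def third_central_moment(xs):
    m = mean(xs)
    return mean((x - m) ** 3 for x in xs)

l0, l1 = leakages(0), leakages(1)          # [0, 2, 2, 2] and [1, 1, 1, 3]
assert mean(l0) == mean(l1)                # first moment: no information
assert pvariance(l0) == pvariance(l1)      # second moment: no information
assert third_central_moment(l0) != third_central_moment(l1)  # third leaks
```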
I will illustrate these performance gains with multiplication algorithms, starting with a serial multiplication inspired by the paper of Ishai, Sahai and Wagner in 2003. Here, what we do is a multiplication between two shared secrets a and b, and we want to obtain the three shares of c. Intuitively, the algorithm works by computing partial products, refreshing these partial products by adding new randomness, and then compressing. Now, if I apply this to the AES S-box, which is of course 8-bit wide, and for simplicity I consider a case with eight shares, the implementation works by putting one GF(2^8) element in every 8-bit register. Memory is then proportional to n times d, and time is proportional to d squared multiplications in GF(2^8). Of course, this quadratic time in d is what is really expensive in practice. Applied to the AES S-box, it means that we need three multiplications to implement the S-box.

What we propose in the paper is essentially to tweak this algorithm by better interleaving and regularizing the operations. For example, what we do for three shares is to separate the computation of the partial products from the refresh, and the refresh is going to be very simple: it only adds fresh randomness, by XORing with it and with a rotation of it. Overall, the only things we need to implement this are XOR operations, AND operations and rotations, which are all very efficient on concrete devices. If we do that for the AES S-box, same story: it is an 8-bit S-box and I take eight shares. The main, or in fact the only, difference is that previously we were storing one share in an 8-bit register; now, in one 8-bit register I store eight shares, and the shares are all in GF(2). So essentially we move to a bitsliced representation of the AES S-box. Memory is unchanged, of course: we still require n times d.
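The rotation-based refresh just described can be sketched as follows, with all eight GF(2) shares packed into one 8-bit register; this is an illustrative simplification, not the exact code from the paper:

```python
import secrets

WIDTH = 8  # eight GF(2) shares packed into one 8-bit register

def rot(x, n):
    """Rotate a WIDTH-bit word left by n positions."""
    n %= WIDTH
    return ((x << n) | (x >> (WIDTH - n))) & ((1 << WIDTH) - 1)

def parity(x):
    """XOR of all bits, i.e. the secret encoded by the packed shares."""
    return bin(x).count("1") & 1

def refresh(a):
    """XOR fresh randomness r and its rotation by one into the shares.
    Every bit of r enters exactly twice, so r ^ rot(r, 1) is a sharing
    of zero and the encoded secret is unchanged."""
    r = secrets.randbits(WIDTH)
    return a ^ r ^ rot(r, 1)

a = secrets.randbits(WIDTH)
assert parity(refresh(a)) == parity(a)
```

The whole refresh costs one random word, two XORs and one rotation, which is why it maps so well to software registers.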
What is really interesting is that time is now linear in d, because essentially what we need to perform are these AND operations on the columns, and this can be done in parallel for all the shares in one clock cycle. Now, if you want to apply that, of course the bitsliced AES S-box requires more multiplications, more AND gates, 32 at the moment I think. So essentially, the message is that we expect nice performance gains if we increase the number of shares, like 8, 16 or 32 on current devices. And I would argue that this is typically the number of shares you need if you want to claim high security levels: if you want to claim security up to 2^50 or 2^60 measurements, this is something completely realistic.

Of course, we analyzed the security of this new multiplication algorithm. In particular, we looked at a slightly stronger security notion than probing security, which is strong non-interference (SNI), also discussed by Gilles this morning. This is an interesting notion because it is composable: if you show it for one small gadget, it means that you can put all sorts of gadgets together, build an implementation, and preserve security. We show in the paper that by iterating the refresh algorithm (d - 1)/3 times, we get something SNI for d smaller than 12. For multiplication it is a bit more tricky: as such, the multiplication is only SNI for d equal to three, but if we compose the refresh and multiplication algorithms, it remains SNI up to d equal to eight. We also gain a little bit of randomness there compared to the algorithm of Ishai, Sahai and Wagner. All of this is based on an automated tool, so it would be very interesting to make a mathematical proof that holds for any possible order, and maybe to improve these bounds. So far, we showed that we can mask in parallel.
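Here is a correctness-only sketch of the parallel multiplication idea (partial products via rotations, interleaved with rotation-based refreshes), for two 1-bit secrets each shared across eight bit positions. The paper's actual gadget places the refreshes carefully to reach its stated SNI orders, which this toy version does not claim:

```python
import secrets

WIDTH = 8  # d = 8 shares of a 1-bit secret, one share per bit position

def rot(x, n):
    """Rotate a WIDTH-bit word left by n positions."""
    n %= WIDTH
    return ((x << n) | (x >> (WIDTH - n))) & ((1 << WIDTH) - 1)

def parity(x):
    """XOR of all bits: recombines a packed sharing into its secret."""
    return bin(x).count("1") & 1

def mul(a, b):
    """Multiply two shared 1-bit secrets: accumulate the partial
    products a & rot(b, i), one AND per rotation, interleaved with
    rotation-based refreshes (r ^ rot(r, 1) is a sharing of zero)."""
    c = 0
    for i in range(WIDTH):
        c ^= a & rot(b, i)            # partial products for offset i
        r = secrets.randbits(WIDTH)
        c ^= r ^ rot(r, 1)            # refresh: encoded value unchanged
    return c

a, b = secrets.randbits(WIDTH), secrets.randbits(WIDTH)
assert parity(mul(a, b)) == (parity(a) & parity(b))
```

Each loop iteration touches all d shares at once, which is where the linear (rather than quadratic) running time in d comes from.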
We showed that we can gain performance, and we have a nice possibility to implement this on software devices by manipulating things with simple AND, XOR and rotation operations. I would like to finish by discussing the fact that bounded moment security does not imply probing security, and what that means. I will start with specialized encodings. As I said, probing security is stronger than bounded moment security; it is also stronger than noisy leakage security. The question I would like to ask is whether it is sometimes too strong. Of course, you could say a model is never too strong, which is somehow true. What I exactly mean is: does it sometimes happen that we break something in the probing model even though the gadget is expected to have some kind of practical security against DPA, for example? If we take the Boolean encoding, the answer is no, because as we have seen previously with two shares, the mean is independent of the secret but not the variance, this corresponds to the number of probes that you need, and if you increase the number of shares this scales accordingly. But interestingly, we have other types of encodings in the literature, for example inner product encodings. In this case, you do masking in larger fields like GF(2^8). If I assume for now that we have a special type of leakage function that does not mix the bits within this GF(2^8) field, then we can show that for this inner product encoding, which essentially encodes the secret as the inner product between a public vector p and a secret vector s, it is possible to have a security order in the bounded moment model that is larger than two, so larger than the number of shares. Here is an example in GF(2^8) where we can go up to three.
So that is nice, because it shows that there are at least some properties that seem relevant to practice, like this bounded moment security, that are not at all captured by the probing model. Here, at least, the fact that it is weaker gives us something. Even more interestingly, I would say, it has a nice impact in the context of continuous attacks. What do I mean here? So far, I essentially discussed what I would call one-shot attacks, but this contradicts a little bit what I said in the introduction, because I said on the first slide that side-channel attacks are essentially continuous attacks, where the adversary measures more and more and accumulates information about the secret. A typical issue that happens in this case is that this very simple refresh where we add a sharing of zero, the refresh algorithm that I gave previously, which I think is frequently used in practice because it is very cheap and nice to implement, is actually insecure in the continuous probing model. The question I would like to ask finally is: what does this mean in practice? Can we sometimes use such a simple refreshing gadget?

To give intuition about that, let me first describe this continuous probing attack. The target is really this refresh algorithm where we have a secret a, we XOR with a randomness, and we XOR with a rotation of the randomness. Initially I have four shares, but it holds for any number of shares. At the first step, I probe one wire, this one, the first share, and I accumulate this. Then I do the refreshing: I XOR with the random vector and the rotation of the random vector, and I have a2. What the attack will always do is probe only three wires per step, independent of the number of shares. I do that here: I take these three wires. What is easy to see is that if I XOR the first line, I am actually able to deduce the first bit of the encoding at the second step.
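This attack can be simulated end to end. The sketch below assumes the refresh XORs a random vector r and its rotation by one, and lets the adversary read exactly three wires per step: two randomness bits to carry the accumulated prefix-XOR through the refresh (the randomness inside the prefix telescopes away, leaving only r[0] and r[t]), plus one new share. Function names are mine:

```python
import secrets

def refresh(shares):
    """One linear refresh step: XOR a sharing of zero, built from a
    fresh random vector r and its rotation by one, into the sharing."""
    d = len(shares)
    r = [secrets.randbits(1) for _ in range(d)]
    return [shares[j] ^ r[j] ^ r[(j + 1) % d] for j in range(d)], r

def run_attack(d=4):
    """Adaptive continuous probing attack using only 3 probes per step;
    after d steps the adversary knows the XOR of all d shares."""
    secret = secrets.randbits(1)
    shares = [secrets.randbits(1) for _ in range(d - 1)]
    shares.append(secret ^ (sum(shares) % 2))

    known = shares[0]             # step 0: probe a single wire
    for t in range(1, d):
        shares, r = refresh(shares)
        known ^= r[0] ^ r[t]      # 2 probes: carry the prefix-XOR through
        known ^= shares[t]        # 1 probe: extend the prefix by one share
    return known == secret

assert all(run_attack(4) for _ in range(100))
```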
And since I also probed the second share, I can accumulate, so at the second step I have two bits of my shares. As it goes on, at the next step I again probe three wires and do the same type of thing: I XOR the two first lines of the implementation, we observe that these two randomness bits cancel each other, and essentially I learn the XOR of the two first bits. After three steps, since I probed the third one, I can essentially learn three bits. If I go on like that, after d iterations the adversary is able to learn the secret in full. That is annoying, because it seems realistic. What is very interesting, I think, is that this type of attack is not possible in the bounded moment model, and that relates to something I said earlier. The key point is that this attack in the continuous probing model crucially relies on the fact that the adversary can adaptively choose his probes. But in the bounded moment model, adaptivity does not help so much, because the adversary can anyway observe the sum of all the probes, so adaptation does not have much impact here. Of course, we discuss the details in the paper. It does not mean that you can always use this type of refreshing algorithm, but one nice application is that this linear refresh can be used to refresh the key of a key-homomorphic encryption scheme, and in that case you get fully linear overheads, which is at least interesting.

This leads me to the conclusion. The first conclusion, and maybe the most important one, is that probing security is actually relevant to parallel implementations. So all these nice things starting from abstract probing security still hold in this case.
And I would say this bounded moment model formalizes, or suggests, a principled path for security evaluation, where you start from something abstract, then use bounded moment security to test independence, and finally quantify the noise to get noisy leakage security. In this paper, we really focused on the connections between probing and bounded moment security, and of course there remains nice work to be done on the connections between bounded moment and noisy leakage security, in particular on the conditions that make these two notions equivalent. Another fact that we showed is that parallel implementations are extremely appealing for masking in software, because they really leverage the memory that you anyway need to store the shares, so it is usually something useful. Finally, I would say the last example is a nice topic for further research: this continuous probing result suggests that continuous probing security is sometimes too strong to capture the security that we have in the case of DPA, which also makes a slight difference between probing security and concrete side-channel attacks. Thank you.