Welcome to this presentation about provably secure hardware masking in the transition- and glitch-robust probing model. I am Gaëtan Cassiers, and this is joint work with François-Xavier Standaert, my advisor. If you look at the approximate timeline of masking, it all starts with the invention of side-channel attacks. Then people developed some empirical countermeasures, which were then studied in more formalized models, which in turn enabled proving the security of some countermeasures. Finally, if you want a very strong sense of security, you will sometimes want formal verification of the countermeasures and of their implementations, and eventually automated generation of the countermeasures. This is roughly valid both for masked software and for glitch-robust masked hardware, for which we have already seen, for instance in previous CHES editions, works that perform formal verification or even automated generation. In this work, we study hardware masking in a slightly more powerful adversarial setting, namely transition-robust masked hardware, and we aim to design a masking scheme and prove its security. It all starts with our circuit model, which is quite abstract: an arithmetic circuit, in which we have combinational gates, such that here the XOR gate and the AND gate are, in arithmetic-circuit terms, addition and multiplication; I will use both interchangeably. We also have the sequential notion of registers, which in practice are implemented by flip-flops, and which correspond to a sequential evaluation of the circuit. Here they are represented by these boxes. Now that we have our circuit model, we can analyze how to mask this example circuit. At the heart of masking there is the notion of secret sharing, where you replace a secret bit a with a set of shares that are drawn at random, such that their sum, or XOR, is the secret itself. So here we mask, or share, the two inputs with simply two shares each.
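As an aside, the Boolean secret sharing just described can be sketched in a few lines of Python. This is an illustrative sketch, not code from the paper; the helper names `share` and `unshare` are my own.

```python
import secrets
from functools import reduce

def share(bit, n=2):
    # Draw n-1 shares uniformly at random...
    rand = [secrets.randbits(1) for _ in range(n - 1)]
    # ...and set the last share so that the XOR of all shares is the secret.
    last = reduce(lambda x, y: x ^ y, rand, bit)
    return rand + [last]

def unshare(shares):
    # Recombine: the XOR of all shares is the secret bit.
    return reduce(lambda x, y: x ^ y, shares)
```

Any proper subset of the shares is uniformly distributed, so observing a single share reveals nothing about the secret.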
Now that our wires are shared, we need a masking compiler that replaces each wire with a sharing and each logic gate in the original circuit with a masked gadget that performs the same operation as the original logic gate, but in the masked domain. Here we first have a simple XOR gadget that is simply made of n XOR gates, one per share. For the register, it is even simpler: we just replicate the register as many times as there are shares. Finally, the AND gadget is a bit more complex. There have been many works; here is one example of an AND gadget. I will not go too much into the details of specific multiplication gadgets in this presentation; if you want more details, all of them are in the paper. The security of such a masking scheme can be evaluated in the t-probing model, where the adversary is able to get the values of t chosen wires in the circuit, and we say that the circuit is t-probing secure if no adversary can learn anything about the secret inputs. t-probing security is hard to verify: first, because the computational cost of verification grows exponentially with the size of the circuit, and second, because it is not composable. That is, if you take one gadget that is t-probing secure, another gadget that is also t-probing secure, and you connect them, the result might not be t-probing secure. So usually we like to use a stronger notion, called simulatability, which can be used as a basis for composition: you first evaluate the security of individual gadgets and then evaluate how they are connected together in order to prove the security of a larger circuit. Simulatability is a broad framework that has been used in many concrete instances; for instance, it covers the definitions of NI (non-interference), SNI (strong non-interference), PINI, and so on. Concretely, simulatability works by considering a set of probes.
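To make the gadget notion concrete, here is a small Python sketch of my own, not the paper's gadgets: a share-wise XOR gadget, and a 2-share DOM-style AND gadget whose cross terms are refreshed with a fresh random bit r. In real hardware the partial products would be registered before recombination to stop glitches; that sequential aspect is omitted here.

```python
def xor_gadget(a, b):
    # Linear gadget: operate share-wise; share indices never mix,
    # so the gadget is trivially share-isolating.
    return [ai ^ bi for ai, bi in zip(a, b)]

def and_gadget(a, b, r):
    # Illustrative DOM-style 2-share AND: the cross-domain terms
    # a0&b1 and a1&b0 are masked by the fresh random bit r.
    # In hardware, each parenthesized term would pass through a register.
    c0 = (a[0] & b[0]) ^ ((a[0] & b[1]) ^ r)
    c1 = (a[1] & b[1]) ^ ((a[1] & b[0]) ^ r)
    return [c0, c1]
```

One can check that c0 XOR c1 equals (a0 XOR a1) AND (b0 XOR b1) for every choice of r, i.e., the gadget computes AND in the masked domain.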
In this simple example circuit, z is a probe, and we ask the question: which inputs of the circuit are needed to properly simulate this probe, assuming that the distinguisher, who has to distinguish between our simulation and the real circuit, knows the values of all the inputs of the circuit? In this case, if you want to simulate z, you need to know both inputs x and y. In this other example, where you XOR an input x with a random bit, you don't need to know anything about the input, as the probe appears to the distinguisher as just a fresh random value. Then, if you look at this AND gate, you don't have this masking effect of the randomness, and you need the value of x to simulate. A last example is where we use the same randomness in two distinct places: in that case, if the adversary observes both z1 and z2, you need to know x and y to simulate. This can be seen by observing that the XOR of z1 and z2 equals the XOR of x and y. So in summary, this notion of simulatability is the key to composability, because it lets you replace the knowledge of probes with the knowledge of some inputs of the gadget, and once you know that you only need some inputs of the gadget, you can forget about what is inside it. Now that we are equipped with this framework of simulatability, let's see how we can make concrete t-probing security proofs. The intuition for our proofs is share isolation. When we have share isolation in a circuit, the circuit is split into one part for each share, and those parts do not communicate. In this example, if we forget about the gray part, we have one green share and one orange share, and those are not interconnected. And here we have two shares.
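The randomness-reuse example can be checked directly. This is a sketch of mine, not from the paper: masking x and y with the same random bit r makes the XOR of the two probes independent of r, so the pair of probes reveals x XOR y.

```python
import secrets

def leaky_pair(x, y):
    # BAD practice: the same random bit r masks two different values.
    r = secrets.randbits(1)
    z1 = x ^ r
    z2 = y ^ r
    return z1, z2
```

Each of z1 and z2 alone looks uniformly random, but z1 ^ z2 == x ^ y holds on every run, so a simulator for the joint probes needs to know both x and y.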
So we have security against one probe: if the adversary probes the green part, it learns nothing about the orange part, which means it learns nothing about a1 nor b1, which means it learns nothing about a nor b. This extends simply to higher orders as long as you have more shares than probes. This share-isolation notion is composable, and it is furthermore trivial to instantiate for linear gadgets, which are naturally share-isolating. However, this is not the case for non-linear gadgets: the AND gadget, which is gray here, actually has interconnections between the orange and green parts. So the idea for the security proof is to have a simulated form of isolation, and that is what we call PINI, for probe-isolating non-interference. Let's now look at the definition of PINI. Basically, the definition of PINI relates to which inputs we need in order to simulate probes. As a source of inspiration, let's look at how this works for a share-isolating gadget. In a share-isolating gadget, here on the right, if we have to simulate an output probe whose share index is 0, we may need all the inputs in the same circuit share, that is, x0 and y0. If, on the other hand, we have a probe inside the gadget, it will be inside one circuit share, inside circuit share 2 in this example, and we will need the inputs of circuit share 2. And if we need to simulate both jointly, we will need all those input shares. For PINI, it is much the same: if we have one output probe, we get the corresponding input shares to simulate. If, on the other hand, we have an internal probe, since there is no share isolation inside the circuit, we let the simulator choose which additional input share index it needs; in this case it chooses share index 1. That is how PINI works, and basically the only difference is that we somewhat blur the line inside the gadget about where the internal probes are. But from the outside, it doesn't matter.
You could make the upper boxes disappear, and it would behave the same from the point of view of composability. Let's now go physical. So far we have been analyzing arithmetic circuits, but in the real world we have CMOS logic, which means that in practice we have glitches. When you probe a wire, after a transition from one clock cycle to the next, its value is not immediately stable: it moves, and the value you capture at that point can be a more or less arbitrary function of the inputs of the combinational circuit. So in our model we have an extended probe: if it probes this wire, the adversary gets as values all the input wires of the combinational circuit. Then we also have transitions. If we execute a circuit for two consecutive clock cycles, a first clock cycle and a second clock cycle, one probe might capture the transition between the two states of a wire, and using only one probe the adversary might learn information about both states. This gives us a second kind of extended probe. Of course, transitions and glitches can also be combined, and then the adversary learns all the inputs of the combinational circuit for two consecutive clock cycles. Let's now briefly discuss the glitch-only case. This is a case we already tackled in our previous Hardware Private Circuits paper. The intuition is that if you have a share-isolating circuit, then you have no problem with glitches. Simply said, one glitch, or one set of glitches, only propagates within one circuit share, so one glitch-extended probe will not teach you more in terms of number of circuit shares than one plain old probe, and the old argument about counting circuit shares still applies. We can then translate this to PINI by making glitch-robust PINI gadgets, which amounts to saying: the extended probes that you still have due to glitches inside your gadgets are not a problem; you still have the PINI property.
Once you have that, we can state a composition theorem to handle the glitches that cross multiple gadgets, which you cannot handle at the level of the definition of a single gadget. This composition theorem basically says that if, in the non-glitchy case, you have a valid, secure composition based on a simulatability-based definition, such as PINI gadgets composing, and you now take glitch-robust PINI gadgets, this gives you a glitch-robust composition, that is, it is glitch-robust probing secure. Let's now move to the transition-robust case, or the glitch-plus-transition-robust case, which does not matter much at this level of abstraction. The good news here is that if you take a share-isolating circuit G and you have a transition, you have no issue: since the wire on which you have the transition is in the same circuit share for both executions, it only requires the same circuit share on the input, and our intuition still holds that you have more circuit shares than probes. So if we go for PINI, we can define a transition-robust PINI, which takes transitions into account, where both parts of the transition-extended probe lie inside one execution of the gadget. For composition, however, there is bad news. Look at this example, and take a transition-extended probe that happens to span two executions of the same gadget as previously, but where we make the gadget loop on itself: we have a first execution here, and the output goes back into the gadget, which is represented here as the second execution. If we want to see how we can simulate this probe, we first need to simulate this part of the probe, and since we have transition-robust PINI, we will possibly need one circuit share to simulate it.
Then, in order to simulate the second part of the probe and the output of the first execution of the gadget, according to the transition-robust PINI definition, we might need two circuit shares. And now we have a problem, since we need more circuit shares to simulate than we have probes: two versus one. That basically means that we cannot compose transition-robust PINI gadgets. However, we have the intuition from all these observations that share-isolating gadgets still work, even with transitions. So the real cause here is a mismatch between the properties of share-isolating gadgets and the properties that we require for PINI. Concretely, for share isolation, let's look again at the second execution. If you ask to simulate one probe, you will of course need to know one circuit share, but then, for a truly share-isolating gadget, you will automatically be able to simulate, that is, to obtain the values of, the corresponding output shares. And this is an additional property that is not satisfied by PINI. If you pack it into PINI, which gives the definition that we call O-PINI, that is, you add this property on top of PINI, you get the following result. For this second execution, you can simulate the output. You don't care about it per se, but it means that, based on this input, you know the simulator will be able to simulate both the probe and the corresponding output, which is exactly what you need for the previous execution of the gadget. So you can now simulate the full probe based on only one circuit share. This actually generalizes well, and we have a composition theorem that says that transition-robust O-PINI gadgets compose, that is, you can compose them and still get an O-PINI gadget. So that is one solution to get provably secure, transition-robust composition. However, there is some cost, as you need to make your gadgets not merely PINI but O-PINI, which is a stronger property.
Therefore, you might pay more in terms of area, randomness, or latency. Another solution that we investigate in the paper is to use pipeline bubbles. The idea is to completely avoid those transitions, or rather to make them benign: we make it such that when we have a transition between two states, one of the states is some public or non-sensitive state, so that the transition-extended probe does not teach the adversary more about the secrets than the non-extended probe would. We do this by having a kind of cleanup cycle every other cycle: we put an empty, non-secret state between two secret states, so that there is never a transition between two sensitive states. You do not need extra hardware for this, except maybe a bit of control logic, but the main cost is that it lowers the throughput. Concretely, let's take an example: imagine we want to evaluate an SPN block cipher, where we have an implementation of an S-box as a two-stage pipeline. Let's imagine there are only three S-boxes in the block cipher, and we implement them in a serialized way, that is, we only implement one instance of the S-box and evaluate it sequentially. Let's look at the first round, R0: we feed in the data of our first S-box, so we have the computation there. At the next cycle, we put a gray box, that is, a non-sensitive state, like a useless computation, and of course our state is now at the second stage of the pipeline. Now that we have cleaned the first stage of the pipeline, we can go on with the second S-box, and the pipeline bubble propagates, so that we never have a transition between sensitive states. And so on: we do this for all three S-boxes, put in another empty cycle, and can go on with the second round. As you can see, every other cycle we just put useless data into the pipeline, so this costs us a factor of basically two in throughput.
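The scheduling with cleanup cycles can be sketched as follows. This is my own illustration, not the paper's; `None` stands for a bubble, i.e., a non-sensitive state fed into the first pipeline stage.

```python
def bubbled_schedule(n_sboxes, n_rounds):
    # Issue one sensitive S-box evaluation per cycle, each followed by a
    # cleanup cycle, so the first pipeline stage never transitions
    # between two sensitive states.
    sched = []
    for rnd in range(n_rounds):
        for sbox in range(n_sboxes):
            sched.append((rnd, sbox))  # sensitive computation
            sched.append(None)         # pipeline bubble (non-sensitive)
    return sched
```

For 3 S-boxes and r rounds this takes 6r cycles instead of 3r, which is exactly the factor-of-two throughput loss mentioned above.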
But we don't need extra hardware, so that is the basic solution. We then investigated an alternative, which is to amortize those bubbles in order to improve the throughput. The core idea is the observation that transition robustness of plain PINI is still sometimes okay: you sometimes don't need O-PINI to compose, namely in the specific case of parallel composition. That is, in your abstract arithmetic circuit, when two evaluations of the gadget are in parallel, exactly like the S-boxes of an SPN, and you have a transition-extended probe between the two evaluations, since those gadgets are very similar and there is no input-output dependency between them, you can show that this parallel composition is still transition-robust PINI. Therefore, we can allow transitions to occur between those two executions, as long as they are in parallel. If you go to the next round, you still need a pipeline bubble. Practically, on our example, this means that we can directly feed the three S-boxes in consecutive cycles, and only after that do we need a pipeline bubble before going to the next block cipher round. This amortizes quite nicely, and it is even better than that: if we look at our examples of block ciphers, we see that in order to evaluate the second round, we need the output of the first round, so we cannot move it back to the left; there is a computational dependency. So you need this pipeline bubble anyway, which means that previous architectures, such as the one we built for the HPC paper, already satisfy this bubble condition, and this bubble is a kind of minimal cost that you will always have. So we can get glitch-plus-transition-robust security at exactly the same cost, actually with the same circuit, as the basic HPC that was only proven secure against glitches.
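The amortized variant only inserts a bubble at the round boundary, where the round dependency forces one anyway. Again a sketch of my own, with the same conventions as before (`None` is a bubble):

```python
def amortized_schedule(n_sboxes, n_rounds):
    # S-boxes of one round are parallel (no input-output dependency
    # between them), so they may be issued back-to-back; a single
    # bubble separates consecutive rounds.
    sched = []
    for rnd in range(n_rounds):
        sched.extend((rnd, sbox) for sbox in range(n_sboxes))
        sched.append(None)  # bubble at the round boundary
    return sched
```

For 3 S-boxes and 2 rounds this takes 8 cycles instead of 12, and adjacent sensitive states always belong to the same round, which is the parallel case where plain transition-robust PINI suffices.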
Basically, this observation works as long as you are in a kind of SPN evaluation. Summarizing our results, we have three proposals: the O-PINI proposal, the one where you put bubbles and completely avoid transitions between two sensitive states, and the one where you amortize them. We can compare them across a few criteria. First, in terms of generality, the first two, O-PINI and bubbles, work for any circuit, while the amortized-bubble approach only gains for some specific circuits, when you do many parallel evaluations. Then, in terms of area, randomness usage, and latency, the bubble and amortized-bubble approaches do not cost more than the non-transition-robust case, whereas O-PINI does cost more. Finally, in terms of throughput, you pay nothing except in the bubble case. As a conclusion, when it applies, the amortized-bubble proposal seems to be the best of both worlds. Looking concretely at the costs, we compare the O-PINI proposal, HPC, which is equivalent to the amortized bubbles, and DOM. As a cautionary note on DOM, we should note that it is not composable: starting at some masking order, you will get composition issues and then broken security. So it is there just for reference, and I would not recommend using it for real-world implementations. In terms of area, we can see that O-PINI and amortized bubbles, that is HPC, are very close to each other; the overhead of O-PINI is quite moderate, while both have a significant, but not huge, overhead compared to DOM. In terms of randomness usage, O-PINI is a bit above the two other proposals. Finally, looking at the latency, O-PINI has a quite steep overhead over HPC, which is itself a bit more costly than DOM. Just as a cautionary note, this holds only for the current gadgets, and in principle, if we designed, for instance, better O-PINI gadgets, those results could improve, but that is the best we have for now.
To conclude, our contribution is a set of provably secure glitch- and transition-robust masking schemes for hardware circuits, which applies to existing masking schemes under some conditions, and also to some new kinds of gadgets. Natural future work is to move on to automated verification of the properties required on the implementations. I leave you with a list of topics that are in the paper and that I did not have time to discuss here: some more sophisticated gadgets, which either contain loops or can be modified during execution, a discussion of gadgets for the transition-only, non-glitchy case, and finally the constructions for all of our gadgets. Thank you for your attention. I will be happy to answer any questions.