I will now present "Protecting against Statistical Ineffective Fault Attacks", which is joint work with Joan Daemen, Christoph Dobraunig, Maria Eichlseder, Hannes Gross and Florian Mendel. Now, if you want to use crypto in the wild, then you usually want to make sure that you use mathematically secure cryptographic schemes. And especially in cases where attackers could have physical access to your devices, you want to make sure that you feature some additional countermeasures against implementation attacks, such as power analysis or fault attacks. Now, in this talk, we take a closer look at one specific type of fault attack, namely statistical ineffective fault attacks, in short SIFA, which were first presented at CHES in 2018. Now, SIFA has a couple of nice properties. For one, it works pretty much out of the box against all kinds of cryptographic schemes, including block ciphers and AEAD schemes. It can also circumvent typical countermeasures against fault attacks, such as redundant computation and infection. And on top of that, you only need one fault injection per cipher execution to mount this attack. Now, in a follow-up at Asiacrypt in the same year, it was shown that SIFA can not only circumvent fault countermeasures, but can additionally circumvent masking, even higher-order masking or TI schemes, which makes it a quite compelling choice for attacking combined countermeasures. Now, at the time, the proposed countermeasures involved using some kind of error correction, or a lot of hiding, or just self-destruction. And since then, many papers have proposed different trade-offs that utilize error correction, but they turn out to be rather expensive, especially when combined with masking. And it was not quite clear how much error correction is actually necessary to provide practical protection. It's also not always clear whether or not these countermeasures also deal with DFA attacks, which you probably also want to protect against in practice.
So we propose more efficient countermeasures against SIFA. And the strategy that we follow is that we essentially perform a very careful combination of redundant computation with masking. This results in a low overhead for lightweight schemes and, let's say, moderate overhead for bulkier schemes like AES. Now, before we talk about countermeasures, I would first like to explain how these attacks actually work, starting with statistical fault attacks, which were first proposed by Fuhr et al. in 2013. Now, these attacks essentially exploit the fact that AES is a pseudo-random permutation, which essentially means that if you encrypt a set of different plaintexts and you receive the corresponding set of ciphertexts, and you look at one byte position of all of those ciphertexts, then the distribution of this byte should be uniform. Now, this is not only the case at the output of AES, but is also the case if you, for example, look at specific state bytes in round nine, which is more interesting for our attack. So statistical fault attacks assume that an attacker is able to perform repeated fault injections during the encryption of our plaintexts, such that the distribution of certain bytes within round nine of the AES state is not uniform anymore, so there is some kind of bias in the distribution. So in our concrete example here, we assume that we inject faults that cause a biased distribution of the first state byte here in round nine. Now, when it comes to these fault injections, the attacker does not necessarily need to know what he is doing. He could use stuck-at faults, bit flips, random faults, and most notably, he does not need to know the bias that he is causing. As long as there is some kind of bias, he is good. Now, also notice that if we, for example, modify one byte here in round nine, then this difference will propagate to four bytes in the ciphertext due to the MixColumns operation in round nine here.
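To make the notion of "some kind of bias" concrete, here is a minimal sketch in Python. It is not taken from the paper: the fault model (a stuck-at-0 on the least significant bit), the sample count, and the helper name `sum_sq_counts` are all assumptions chosen for illustration. For a fixed number of samples, the sum of squared bin counts ranks distributions in the same order as the squared Euclidean imbalance, a statistic commonly used to measure the distance from uniform.

```python
import random
from collections import Counter

def sum_sq_counts(samples):
    """Sum of squared bin counts; larger means further away from uniform."""
    return sum(n * n for n in Counter(samples).values())

rng = random.Random(0)           # fixed seed so the sketch is reproducible
N = 4096                         # assumed number of observed state bytes

# Unfaulted state byte: behaves like a uniformly random byte.
uniform = [rng.randrange(256) for _ in range(N)]

# Assumed fault model: stuck-at-0 on the least significant bit.
faulted = [b & 0xFE for b in uniform]

print("uniform:", sum_sq_counts(uniform))
print("faulted:", sum_sq_counts(faulted))
```

The attacker never needs to know that the fault was a stuck-at-0 on that particular bit. Any fault that pushes the statistic away from its uniform value would serve the same purpose.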
So next, let's observe that we can actually calculate four bytes of the AES state in round nine from four bytes of a ciphertext and the corresponding four-byte key guess in round ten. Now, we can use this for a key recovery as follows. Let's take a set of ciphertexts and make a key guess. For now, let's assume we actually make a correct key guess of this four-byte subkey here. We then calculate back the value of certain bytes of the AES state in round nine. And now, if we make the correct key guess, we will actually observe the bias that we have caused with our fault injections. However, if we make a wrong key guess here in round ten, then the distribution of the state bytes here will always be uniform. And this is essentially what we can use to determine whether or not a key guess is correct.
So what about redundant computation? At first glance, it should fix the problem, and it kind of does. So let's have a look at this. Here in this example, we now execute the encryption twice, and we assume that the attacker is only able to perform a fault injection in one of the redundant computations, again somewhere in round nine. Now, the redundant computation ensures that the faults that actually change the output of this AES encryption here never make it to the attacker, because the output does not match that of the corresponding redundant computation. So the attacker will never get to see any faulted computations in this example. However, while redundant computation does prevent statistical fault attacks, it does not prevent a variation of this attack, which is statistical ineffective fault attacks. So let's have a look at this. Now, let's for simplicity assume that we as the attacker perform a stuck-at-zero fault, again in round nine at this one byte location here. Now, if this stuck-at-zero fault actually affects the outcome of the computation, then this computation is filtered out by our redundancy countermeasure.
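The filtering behaviour just described can be sketched in a few lines of Python. This is a deliberately simplified model and not an AES implementation: `protected_step` is a hypothetical stand-in for one redundantly computed cipher step on a single state byte, and the fault always hits the first of the two copies.

```python
import random

def stuck_at_zero(byte_value):
    """Fault model from the talk: the targeted byte is forced to 0."""
    return 0x00

def protected_step(state_byte, fault=None):
    """Hypothetical stand-in for a redundantly computed cipher step."""
    primary = fault(state_byte) if fault else state_byte  # possibly faulted copy
    redundant = state_byte                                # untouched copy
    # Release the result only if both copies agree; None means "suppressed".
    return primary if primary == redundant else None

rng = random.Random(1)
inputs = [rng.randrange(256) for _ in range(10_000)]
outputs = [protected_step(x, stuck_at_zero) for x in inputs]

suppressed = sum(out is None for out in outputs)
faulty_released = sum(out is not None and out != x
                      for x, out in zip(inputs, outputs))
print(f"suppressed: {suppressed}, faulty outputs released: {faulty_released}")
```

In this model no faulty value is ever released, which matches the claim that redundancy stops the plain statistical fault attack. The runs that do get released are the ones worth a closer look.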
However, if the byte was already zero prior to the fault injection, then our fault is essentially ineffective, and from an attacker's perspective you still get a ciphertext that will show a bias here in round nine. And then we can simply use the same exploitation strategy as shown before. So again we collect a couple of ciphertexts; this time they are correct and not faulty. We again make key guesses, and for a correct key guess we will again see a bias here, and for an incorrect key guess we will again just see a uniform distribution here. So this is how you can use statistical ineffective fault attacks to circumvent countermeasures that are based on redundancy.
Now, what about masking? Because at first glance it seems like masking fixes the problem, and it kind of does. So let's first take a look at what happens if you perform a fault injection in the linear layer. Here we assume that the attacker targets one of the redundant computations and only affects one share of the masked implementation. Now, if we only affect one of the shares, then there will obviously be the other share, which pulls the distribution of the underlying native value back to a uniform distribution. So fault injections in the linear layer of a masked and redundant computation will actually not be exploitable via SIFA. However, the situation is different if we consider fault injections in a non-linear operation, such as the SubBytes operation in AES. In order to understand why, we need to take a closer look at what happens within a non-linear operation, and to keep things simple we just take a look at the masked AND gate. So naturally, when we consider AND gates, if the inputs are distributed uniformly, then we expect a bias at the output, because this is essentially how an AND gate works.
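The bias of an AND gate, and the way it interacts with a fault on a single share, can be checked by brute force. The sketch below is an illustration only: `masked_and` is an assumed, simplified two-share AND that computes the four partial products but leaves out the fresh randomness and register stages a real masked implementation would need, and the fault is modelled as a bit flip on share `x0`.

```python
from itertools import product

def masked_and(x0, x1, y0, y1):
    """Simplified two-share AND (no refresh, no registers): z0 ^ z1 == x & y."""
    z0 = (x0 & y0) ^ (x0 & y1)   # the two AND gates that take share x0
    z1 = (x1 & y0) ^ (x1 & y1)   # the two AND gates that take share x1
    return z0, z1

ones = 0
ineffective_y = set()
for x0, x1, y0, y1 in product((0, 1), repeat=4):
    z0, z1 = masked_and(x0, x1, y0, y1)
    assert z0 ^ z1 == (x0 ^ x1) & (y0 ^ y1)     # shares recombine to x AND y
    ones += z0 ^ z1
    f0, f1 = masked_and(x0 ^ 1, x1, y0, y1)     # bit flip on share x0
    if f0 ^ f1 == z0 ^ z1:                      # fault did not change the result
        ineffective_y.add(y0 ^ y1)

print(ones, "of 16 share combinations give output 1")
print("native y whenever the flip is ineffective:", ineffective_y)
```

Only 4 of the 16 combinations produce a 1, which is the expected bias of an AND gate under uniform inputs. The second result shows that, in this model, a flip on `x0` stays invisible only when the native value of `y` is 0, regardless of how `y` is split into shares.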
Now this is also of course the case for a masked AND gate. If we now assume that an attacker causes some kind of difference in one of the shares of only one input, in this case x0, so a difference between this computation and the corresponding redundant computation, then we can observe that this difference cancels out, for example, in the case that both shares of y are zero, because in that case the difference does not propagate through those two AND gates here. Similarly, if both shares of y have the value one, then this difference will propagate through both of the AND gates and cancel out later. So in other words, we know that if a fault is ineffective, then we actually learn something about the native value of y, because the condition whether or not the fault actually propagates through the AND gate depends on the native value of y. And if we can basically probe whether or not y is zero, then this essentially allows us again to create some kind of bias at the output, because if we only get to observe computations where y is zero, then of course the output of the AND gate is also biased towards zero. In fact, zero is the only possible outcome now. In the following I will refer to such a scenario as a dangerous fault, because not all fault locations are potentially exploitable via SIFA, but this one, for example, is.
So to sum this up briefly: SIFA can actually circumvent both masking and redundant computation. In principle, adding more redundancy doesn't really help, because we only perform one fault injection and we only exploit correct computations, so more redundancy doesn't make any difference here. And as we've seen before, masking doesn't work either, and this actually also holds for higher-order masking, so adding additional shares does not alleviate this problem. So in the remainder of this talk I will now explain how one can actually counteract SIFA using again a combination of masking and redundant computation, but a
very careful combination.
Okay, so first I would like to start with a bit of notation. In the paper we express a cipher as a circuit, which basically takes as input an array of values and also produces such an array as output. Then we define so-called subcircuits that take as input either the cipher's input or another subcircuit's output, and we can recursively perform this splitting until we end up with so-called basic circuits, which essentially only consist of very simple operations such as addition and multiplication. Now, on a high level, what we want to do is build a cipher circuit that allows us to detect all those dangerous faults that could be exploitable via SIFA. In order to do this, we try to make sure that such fault injections can be detected either at an S-box output or, even better, at the cipher output. We start with a traditionally masked and redundant cipher circuit, but now we additionally require for each basic circuit that it only operates on an incomplete set of shares, and we optionally also require that these basic circuits are permutations. Now, this is not strictly required, but if we do not have this permutation property, then a little bit more manual work would probably be needed to ensure that all the dangerous faults can be detected. Now, in the case that we want to use permutations as basic circuits, we could either use linear functions or, if we want to model a non-linear function, this permutation should be a variant of the so-called Toffoli gate, which is the simplest non-linear invertible function. So here on the right we have an example of what a Toffoli gate looks like. It's a function that takes as input three bits, a, b and c, and also produces a three-bit output, and the only thing that it does is perform an XOR and an AND operation. And again, because we have an AND gate, it's non-linear, and it's also invertible. Now, for the sake of completeness, I've also put
here a picture of the masked variant, since in the end we want to build masked circuits. So this is what the masked version of the Toffoli gate looks like.
So let's now have a look at a concrete example that is a little bit larger than just an AND gate, and for this we want to take a look at the so-called 3-bit Chi S-box, which is the smaller version of the 5-bit Keccak S-box. Let's first observe that we actually have a similar problem as shown before for the masked AND gate. If we assume that we induce a difference in the computation here, again a difference to a redundant computation, then we can observe that this difference propagates into three AND gates, which are indicated here in blue, and we can also observe that for two of these AND gates the other inputs are the two shares of b. So this is again a problem, because if we cause a difference here, then it cancels depending on b0, b1 and c1. Now, c1 is not really interesting, because here we only learn at most one share of c. But this already is a problem, because we essentially know that if we cause a difference here and the computation is correct, then b was 0, and this again creates a bias at the S-box output.
Now, let's take a look at a slightly different version of this 3-bit Chi S-box, where each of the basic circuits is now incomplete and a permutation. Again, why do we want to do this? Because we want to ensure that all dangerous faults are visible at the S-box output. So let's have a look at what happens now. This is again a slightly different version of the previously shown circuit, and what happens here is that the difference again propagates to our three AND gates, but it now also propagates directly to the S-box output. So here it can now be detected via an ordinary redundancy countermeasure, for example at the S-box output, or even further on at the cipher output.
So, some additional remarks here. The presented approach can actually be implemented quite efficiently, and for
lightweight S-boxes there is no noticeable performance difference. For example, in the case of Chi3 we do not notice any difference to an ordinary masked Chi3 S-box. This approach can also be implemented without any fresh randomness, because, for example, in the case of 3-bit Chi you can write this S-box as a three-times repeated application of the Toffoli gate. Now, in the paper we do a little bit more. We also prove that our approach is applicable to all 3-bit S-boxes and actually many 4-bit S-boxes, and we also show applicability to the 5-bit Chi S-box, which again is the Keccak S-box, and a few equivalent versions of these S-boxes.
Now, at this point you might ask yourself: what about even larger S-boxes, for example the one that is used in AES? Well, here we can also make use of the Toffoli gate, but now we use it for bigger fields. On a high level, we build an S-box description of AES that is, for one, based on Canright's description, but also borrows some ideas from Sugawara's S-box description that was presented at CHES last year. Essentially, what we do is take Sugawara's three-share implementation and convert it to a two-share implementation of the AES S-box, and we do this by replacing all the multiplications in F2 to the n by corresponding Toffoli gates that also operate in the same field and use an additional input that is set to zero. Now, here on the right we can see a description of an AES S-box that only relies on basic circuits that are incomplete and permutations. We have a couple of inputs here. We have x, which is the ordinary input to the AES S-box, so 8 bits, and additionally we have the inputs a, b, c and d, in total 18 bits, that are hardwired to the value zero. Again, we need these additional zero inputs such that they form a Toffoli gate together with the multiplications that occur in the S-box description, such that in the end they act as permutations. Now, if we take a look at the outputs, we again have y, which is the
ordinary AES S-box output, so 8 bits, and we have the additional outputs e, f, g and h. Now, if we consider a masked version of this S-box description, then intuitively x and y, so input and output, are now twice the size, so 16 instead of 8 bits. In the case of the additional inputs a, b, c and d, we actually still only require 18 bits, but we now require them to be random instead of zero. The main reason for that is that we now require sharings of zero, and the easy way to get a sharing of zero is to just randomly generate one share and then clone it. Now, this does sound kind of expensive, but it is actually not that bad, because we can reuse one share of each of the additional outputs as the additional inputs in the next S-box layer. So this is quite nice, because we only need randomness for the initial inputs, and after that we can simply reuse the randomness for the next S-box layers. We also do not need any additional randomness during the computation of the S-box. There is one downside, though. We cannot really do anything with, for example, the share e1, so the other share of e, and this basically means that we need to discard it. But this is a problem, because e1 could contain information about the fault injection. So to cope with that, we are forced to include redundancy checks after each S-box computation.
Okay, some final remarks. In the paper we give a complete description of the AES S-box that is resistant to SIFA as long as the attacker does not perform more than one fault injection. We also discuss some additional implementation aspects for software and hardware. In the paper we mainly discuss SIFA protection in terms of circuit descriptions, and we also give recommendations on how you can map these circuit descriptions to concrete software or hardware implementations. We also present an alternative countermeasure strategy against SIFA that utilizes fine-grained redundancy checks. Now, one advantage is that this countermeasure
strategy can be generalized such that it also protects against attackers that perform multi-fault SIFA, so SIFA where the attacker performs multiple fault injections in the non-linear operations. The downside of offering protection against multi-fault SIFA is that the countermeasure becomes quite expensive. Now, as a side note, you do not necessarily need masking or redundancy or error correction to protect an implementation against SIFA. In fact, one could also use mode-level protection against SIFA, as is done by AEAD schemes like Ascon or ISAP, which are currently competing in the second round of the NIST lightweight competition. Thank you very much for your attention.
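For reference, the earlier remark that the 3-bit Chi S-box can be written as three applications of a Toffoli-style gate can be checked exhaustively. The sketch below uses one possible decomposition, which is an assumption and not necessarily the one given in the paper. The helper `toffoli_variant` is a hypothetical name for a Toffoli gate with one negated control input, applied in place to one bit at a time.

```python
from itertools import product

def chi3(a, b, c):
    """Reference 3-bit Chi: each bit is XORed with (NOT next) AND (next-but-one)."""
    return a ^ ((b ^ 1) & c), b ^ ((c ^ 1) & a), c ^ ((a ^ 1) & b)

def toffoli_variant(target, u, v):
    """Toffoli-style update of one bit: target ^= (NOT u) AND v."""
    return target ^ ((u ^ 1) & v)

def chi3_sequential(a, b, c):
    """Chi3 as three in-place gate applications, each using already updated bits."""
    a = toffoli_variant(a, b, c)
    b = toffoli_variant(b, c, a)   # uses the updated a
    c = toffoli_variant(c, a, b)   # uses the updated a and b
    return a, b, c

inputs = list(product((0, 1), repeat=3))
agree = all(chi3(*x) == chi3_sequential(*x) for x in inputs)
is_permutation = len({chi3_sequential(*x) for x in inputs}) == 8
print("matches Chi3:", agree, "| permutation:", is_permutation)
```

Each of the three steps changes only one bit and is its own inverse, so the composition is a permutation by construction. The exhaustive comparison confirms that it also computes the same function as the parallel definition of Chi3.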