Hello everyone, my name is Thierry Simon and in this presentation I will talk about Friet, an authenticated encryption scheme with built-in fault detection. This is joint work with Lejla Batina, Joan Daemen, Vincent Grosso, Pedro Maat Costa Massolino, Kostas Papagiannopoulos, Francesco Regazzoni and Niels Samwel.

Let me start this presentation by giving you some motivation for our work. Today there is an increasing interest in lightweight cryptography, as illustrated for example by the recent NIST lightweight competition. This rising interest is easy to understand in today's world, where we observe an ever-increasing number of small connected devices. These devices often operate in a hostile environment where they might be subject to side-channel or fault attacks. We wanted to address this second type of attack in particular, by coming up with a lightweight cryptographic primitive that would integrate some fault detection. More specifically, we wanted to do that using an error-detecting code.

Using error-detecting codes in cryptography is not a particularly new idea. In 2016, for example, Schneider et al. proposed ParTI, a hardware countermeasure against side-channel and fault attacks that combines a threshold implementation with error-detecting codes. As another example, no later than last year, Beierle et al. proposed CRAFT, a tweakable block cipher that can be adapted to different error-detecting codes. Now, while using an error-detecting code is not a particularly new idea, our contribution lies in the fact that with Friet we designed a lightweight cryptographic permutation that can be efficiently adapted to a specific error-detecting code by choosing appropriate building blocks.

In the first part of this presentation we will talk about the generic strategy used to design a permutation for a specific error-detecting code, and in the second part we will see together a concrete example of that with Friet.
The error-detecting code that we used for our permutation is the parity code [4,3,2]. It maps any 3-bit message to a 4-bit value that has even parity. In other words, this 4-bit value, which we will denote abcd, satisfies the parity equation a XOR b XOR c XOR d = 0. In order to encode a message abc, you just need to append to it a parity bit d = a XOR b XOR c. On the other hand, to decode a 4-bit value abcd, you map it to abc if abcd is a valid code word, meaning that it satisfies the parity equation; if it is not a valid code word, then an error is returned.

Now, how do we use the parity code in order to build a fault-detecting permutation? Well, let's imagine that we have pi, a 384-bit permutation. Its state can be divided into three 128-bit blocks that we will call limbs: a, b and c. We can then, by adding one extra limb, extend pi into pi-bar, a 512-bit permutation that satisfies two properties.

The first property that pi-bar needs to satisfy is that it preserves the parity. By definition, that means that the input of pi-bar satisfies the parity equation if and only if its output satisfies it as well. In the context of permutations we will often abuse the term code word to designate a state that satisfies the parity equation. With that convention, this property can be rephrased as: pi-bar must map code words to code words and non-code words to non-code words. If this is the case, then we say that pi-bar abides the parity code.

The second property that pi-bar needs to satisfy is that it extends pi, and by that we mean that the restriction of the output of pi-bar to the first three limbs must coincide with the output of pi.
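As a small aside, the parity code just described is easy to sketch in code. This is a minimal illustrative sketch, not the designers' code: the function names are my own, and plain Python integers stand in for the limbs (the same logic works whether a limb is one bit or 128 bits).

```python
# Illustrative sketch of the [4,3,2] parity code, applied limb-wise.
# Python integers stand in for limbs; names are illustrative only.

def encode(a, b, c):
    """Extend a 3-limb state with a parity limb d = a XOR b XOR c."""
    return (a, b, c, a ^ b ^ c)

def is_codeword(state):
    """A state (a, b, c, d) is a code word iff a XOR b XOR c XOR d == 0."""
    a, b, c, d = state
    return a ^ b ^ c ^ d == 0

def decode(state):
    """Map a valid code word (a, b, c, d) back to (a, b, c);
    signal an error if the parity equation is violated."""
    if not is_codeword(state):
        raise ValueError("error: parity equation violated")
    return state[:3]
```

Flipping any single bit of a code word, and in fact any fault confined to a single limb, breaks the parity equation and is therefore caught by `decode`.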
When both properties hold, we say that pi is the embedding of pi-bar by the parity code.

Now, in order to compute pi of (a, b, c), we proceed the following way. First, we initialize the parity limb d as d = a XOR b XOR c. Then we compute pi-bar of (a, b, c, d). And finally, we verify whether the output (a', b', c', d') satisfies the parity equation or not. If no fault occurs, then by the parity-preserving property we know that the output must satisfy the parity equation, and in that case we also know, by the extension property, that (a', b', c') must be the output of pi of (a, b, c), which is what we wanted to compute in the first place. On the other hand, if the parity equation is not satisfied by the output, we know by the parity-preserving property that a fault must have occurred, and this is how we detect faults.

So far we have seen together how to detect faults, provided that we can extend a permutation into a code-abiding permutation. But the missing piece that we still need to address is how we concretely build such an extension. The way to proceed is as follows. We split the permutation pi into a sequence of step functions that can be of two forms. The first type of step function modifies a single limb by XORing into it a function phi of the state. The second type of step function simply consists in swapping two limbs. The reason why we choose such step functions is that they can be efficiently extended into code-abiding step functions.

The first method to do that, which is also the most generic one and can always be used, is called limb adaptation. Limb adaptation consists in recomputing phi a second time and adapting the parity limb by adding the output of phi to it. In this method it is very important that phi is not computed only once but really twice, independently. Because if this is not the case, a fault injected in the computation of phi will lead to the same faulty output being added to two
different limbs, which would then compensate when adding the limbs together in the parity equation.

Limb adaptation has a performance overhead of 100%, and while that might not seem great, as it is the same as simply duplicating the step function for example, it has the advantage that the size of the state is only increased by a third instead of being doubled.

But where we really gain in terms of performance is by extending a step function using the limb transposition method. A limb transposition is just a reordering of the limbs in the code-abiding permutation. This is what happens, for example, when a step function replaces one limb by the sum of all the limbs. Since the parity limb d already contains the sum of the native limbs, one can simply swap the limb to be modified with the parity limb in order to extend this step. This is what we call a non-native limb transposition, and it is particularly interesting because it can basically be implemented for free in the code-abiding permutation, while it requires actual limb additions in the original permutation. Note also that this method is highly dependent on the error-detecting code that we chose for our permutation. Besides non-native limb transpositions, we also have native limb transpositions, which are based on the observation that swapping two limbs does not affect the parity of the state. In such a case, the parity limb d does not need to be adapted.

What kind of faults can we hope to detect with the code-abiding permutation?
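Before turning to that question, the two extension methods just described can be sketched in code. This is again an illustrative sketch under my own naming, with Python integers as limbs; phi stands for an arbitrary step function of the native limbs a, b, c.

```python
# State is (a, b, c, d) with parity limb d = a ^ b ^ c.

def limb_adaptation(state, phi, target):
    """Extend the step 'XOR phi into limb target' (target in 0..2):
    phi is computed a second time, independently, and also XORed into
    the parity limb d, so that the parity equation keeps holding."""
    s = list(state)
    a, b, c, _ = state
    s[target] ^= phi(a, b, c)  # the original step
    s[3] ^= phi(a, b, c)       # independent recomputation adapts d
    return tuple(s)

def native_limb_transposition(state, i, j):
    """Swapping two limbs does not change a ^ b ^ c ^ d, so the
    parity limb needs no adaptation."""
    s = list(state)
    s[i], s[j] = s[j], s[i]
    return tuple(s)

def non_native_limb_transposition(state):
    """Extend the step 'a <- a ^ b ^ c': since d already contains
    a ^ b ^ c, swapping limbs a and d implements the step for free."""
    a, b, c, d = state
    return (d, b, c, a)
```

In all three cases the output satisfies the parity equation whenever the input does, which is exactly the code-abiding property of the extended step functions.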
Well, obviously we will detect any fault that breaks the parity equation, but more precisely, we are guaranteed to detect any single limb fault, which is the term we use for a fault that affects only one limb. The code-abiding permutation, however, might not detect every possible fault. For example, it will not detect a fault that flips two bits at the same position in two different limbs, since the parity equation will still be satisfied. Whether such a fault can indeed occur or not highly depends on the architecture and the implementation. In order to avoid such faults, one should try to separate the different limbs as much as possible in the implementation, for example by storing them in different registers.

The code-abiding permutation might also not detect multiple faults, for example when the same exact fault is injected in the two computations of phi during a limb adaptation. However, injecting multiple compensating faults requires the attacker to have very precise control over the effect of the faults it injects. And if this is not the case, then multiple or random faults are very likely to break the parity equation and thus be detected. On a side note, our model does not take into account faults that affect the verification of the parity equation itself, but of course these faults also need to be treated carefully.

One last observation that I would like to make is that verifying the parity equation after each step function is not needed. Indeed, if a fault occurs during a step function that breaks the parity equation, and if no other fault happens afterwards, then by the parity-preserving property we know that the output of the subsequent step functions will also not satisfy the parity equation. That is to say, a single check of the parity equation at the very end of the permutation is enough to detect any single limb fault. I would even argue that checking the parity equation more often than once at the very end of the permutation is not really worth it. It
has a cost in terms of performance, but it does not bring much in terms of additional fault detection capabilities. Indeed, the only reason why you would want to check the parity equation more often would be to detect, for example, multiple compensating faults spread over different step functions. But in our opinion, if an attacker is able to pull that off, he would also be able to inject multiple compensating faults in the same step function, for example, which would not be detected anyway.

In the first part of the presentation we talked about how to design a code-abiding permutation. We will now give a concrete example of that with Friet. So what is Friet? Friet is actually the Dutch word for French fries. Some other people think it also stands for fault-resistant iterative extended transformation, but personally I like the first explanation much better. Jokes aside though, Friet is an authenticated encryption scheme based on the duplex construction in SpongeWrap mode. Its underlying cryptographic permutation is called Friet-PC and has a state of 384 bits. This is the permutation that we study when we want to know more about the cryptographic properties of the scheme. But the actual permutation that we implement is rather its code-abiding variant, Friet-P.

The underlying cryptographic permutation Friet-PC has a state that can be divided into three 128-bit limbs: a, b and c. It consists of 24 rounds, each of them divided into 6 steps. The first step of the round is delta; it consists in adding a round constant to limb c. The round also contains two transpositions, tau1 and tau2, where one of the limbs is replaced by the sum of the limbs, combined in the case of tau1 with some limb permutation. These operations help achieve faster diffusion by mixing the bits between the limbs. The round function also contains two mixing steps, mu1 and mu2, where the circular shift of one limb by some offset is added to another limb. This operation also helps achieve diffusion by mixing bits between the limbs, this time at
different positions. The last operation of the round is xi; it is the only non-linear operation, and it adds to limb a the bitwise AND of the circular shifts of limb b and limb c by some offsets. All the circular shift offsets in the round function were chosen in order to achieve fast diffusion.

Using the strategy that we described before, one can extend Friet-PC into the code-abiding permutation Friet-P. As you can see in the diagram here, the step functions delta, mu1, mu2 and xi were extended using the limb adaptation technique; basically, you just need to recompute the step function a second time in order to adapt the parity limb. The step functions tau1 and tau2, on the other hand, were extended using the limb transposition technique, and this results in a single step, tau, at the end of the code-abiding permutation, which is just a permutation of the limbs. What is interesting to do here is to compare the number of bitwise operations in one round of the two permutations. You can for example notice that the number of circular shifts doubles, going from 4 to 8, and the same holds for the number of bitwise ANDs, going from 1 to 2, while the number of exclusive ORs, on the other hand, remains constant and equal to 8.

In order to evaluate the cost of our countermeasure, we implemented it both in hardware and in software. In the following table you can find the hardware performance figures for different configurations of Friet. Here, Friet-C stands for Friet Compact, which is just Friet but with the non-code-abiding permutation. We have one configuration where we compute one round per clock cycle, and one configuration where we compute two rounds per clock cycle. In this table we also compare Friet with Ketje Sr, which is an authenticated encryption scheme that was submitted to the CAESAR competition and that has a quite efficient hardware implementation. You can notice that a round of compact Friet is actually a bit lighter than one round of Ketje Sr, which translates into 27% less area being used for about 20%
less power consumption in the one-round-per-cycle version. Both area and power consumption are quite similar when comparing Ketje Sr with Friet instead. The biggest difference between the two lies in the throughput, but this gap can be explained by the fact that Ketje Sr processes 32 bits of input per round, while Friet only processes 128 bits per 24 rounds. Computing two rounds of Friet per clock cycle costs between about 20 and 30% more area, for only a marginal increase in power consumption. And finally, implementing the code-abiding permutation instead of the non-code-abiding permutation results in about 23% more area being used for about 29% more power consumption.

We also implemented both Friet-P and Friet-PC on an ARM Cortex-M4, and we tried to optimize this implementation for speed by using a bit-interleaved representation of the state. In the table here, we compare the performance results for Friet with Xoodoo and Gimli, which are permutations used by two of the second-round candidates of the NIST lightweight competition. There are two points that I want to make on this slide. The first one is that Friet-P offers performance that is quite competitive with Xoodoo and Gimli. If you look at the cycles-per-byte results, for example, Friet-P is nearly twice as slow as Xoodoo, but only about 11% slower than Gimli, and you have to keep in mind that Friet-P actually offers some fault detection capabilities, which is not the case for the other permutations. The other point that I want to make is that implementing the code-abiding permutation instead of the non-code-abiding one results in a 36% overhead. But actually, the main reason for this slowdown is that we needed to use additional load and store instructions, because the 512-bit state of Friet-P would not fit into the 14 available internal registers of the Cortex-M4.

In order to validate the fault detection capabilities of our permutation, we did two experiments. The first experiment is more of a sanity check and
consisted in injecting single-bit glitches into a simulated hardware implementation of Friet-P. We injected them both at the RTL level and after synthesis, and this experiment resulted in all the injected faults being detected. In the second experiment, we injected electromagnetic faults into an implementation of Friet-P running on an ARM Cortex-M4. We divided the chip into a 100 by 100 grid, and we injected 10 faults per position for each of 10 grid scans. This resulted in 1 million glitches. In about 86% of the cases we could not see any visible effect, so basically we simply received the expected output. In nearly 14% of the cases, the glitches were too much for the chip to handle and the chip went into reset mode. All in all, there were only 596 cases in which the output was modified, but in all of those cases, the parity check was able to detect the fault.

To conclude, in this presentation we saw a new design approach for cryptographic primitives, which consists in choosing appropriate building blocks that can be efficiently adapted to abide by a specific error-detecting code. We saw a concrete example of that with Friet for the parity code [4,3,2], which allowed us to detect any single limb fault and also showed competitive performance both in hardware and in software. What we did not discuss, but you can find more about in the paper, is how to generalize our approach to larger codes. We also did not talk much about the authenticated encryption scheme, but it is discussed more in the paper, as well as, for example, the cryptographic properties of the permutation: diffusion, algebraic degree, but also linear and differential trails. In the paper we also explain how Friet can be adapted in order to address statistical ineffective fault attacks.

So that's it, thank you for your attention, and if you have any questions, I invite you to ask them during the live presentation at Eurocrypt.