Hi, my name is Jan Richter-Brockmann and I'm going to talk about FIVER, robust verification of countermeasures against fault injections, which is joint work with Aein Rezaei Shahmirzadi, Pascal Sasdrich, Amir Moradi, and Tim Güneysu. As implementations of cryptographic algorithms can be broken by active fault injection attacks, researchers came up with a plethora of countermeasures. However, the verification of such countermeasures is often a manual and error-prone task. Therefore, we propose a fault verification framework that works on a gate-level netlist in order to analyze countermeasures against fault injection attacks designed for FPGAs and ASICs. To inject a fault into an ongoing encryption or decryption, an attacker can use several techniques, ranging from simple clock or voltage glitches to more advanced techniques like electromagnetic pulses or the high energy of a laser beam. As I mentioned before, there are different methods to protect a cryptographic implementation against fault injection attacks. The simplest one is temporal redundancy, where the same input is encrypted or decrypted several times. A similar approach encrypts or decrypts the same input in parallel, which is spatial redundancy. More advanced techniques use linear error-correcting codes, where the input is encoded and processed in an adapted version of the cryptographic algorithm. And finally, especially in the last few years, many dedicated countermeasures against SIFA were proposed. Currently, the state-of-the-art tool to verify such fault countermeasures is VerFI, which is the first automated open-source cryptographic fault diagnostic tool working on a gate-level netlist.
As input, you pass a fault model to VerFI, which specifies whether you would like to inject bit flips or set/reset faults; then an adversary model, which contains the number of faults you would like to inject, the fault locations, and the target clock cycle, so you can define exactly which sub-module and which clock cycle you would like to fault. And you have to define a set of input test vectors which are used for the verification. As output, you get the total number of faults, the number of non-detected faults, the locations of the effective faults, and also the faulty outputs. The problem is exactly that you have to define this set of input test vectors used for the evaluation. As an example, let's consider the PRESENT S-box protected by a single parity bit. At the top, we have the S-box implementation S, and at the bottom we have the redundant circuit S', which is used to compute the parity bit concurrently to the S-box computation. What you can see is that this S-box is not protected against all single-bit faults: if you inject a single-bit fault, the fault can propagate to two output bits, and if both output bits are toggled, the parity bit is not able to detect this fault. And this is the problem with VerFI, because VerFI can report false positives. In this example, this happens when we select the red set of input vectors. If we select the green ones, so 0, 4, 5, D, E, or F, VerFI will report all single-bit faults that can occur in the design. But if we select 1, 2, 3, 6, 7, 8, 9, A, B, or C, there is at least one non-detected single-bit fault that VerFI misses, and VerFI would report that your design is secure and you can use it. But in the end that is wrong: there are test vectors for which the design is not secure. This is why we propose FIVER, which is short for Fault Injection VERification.
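The parity weakness described above can be sketched in a few lines. This is an illustrative simplification (not VerFI's or FIVER's actual code): a single parity bit cannot detect a fault whose effect toggles an even number of output bits, which is exactly the failure case on the PRESENT S-box.

```python
# PRESENT S-box lookup table (4-bit in, 4-bit out)
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def parity(v: int) -> int:
    """XOR of all bits of v (the single parity bit computed by S')."""
    p = 0
    while v:
        p ^= v & 1
        v >>= 1
    return p

x = 0x1                      # one of the "red" inputs from the example
y = SBOX[x]                  # correct S-box output
p = parity(y)                # parity bit computed concurrently by S'

# A single internal gate fault that propagates to TWO output bits:
y_faulty = y ^ 0b0011        # two output bits toggled

assert y_faulty != y           # the output is wrong...
assert parity(y_faulty) == p   # ...but the parity check does not fire
```

The concrete fault effect (`^ 0b0011`) is made up for illustration; the point is only that any even number of toggled output bits preserves the parity bit.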
Our verification approach also works on a gate-level netlist. The gate-level netlist is parsed into a directed acyclic graph, and then on this DAG we work with BDDs, which are used to evaluate the circuit. All of this happens in an initialization phase. After this initialization phase, we go on with an evaluation phase, where we use symbolic fault injection, and after a symbolic fault injection we apply a diagnosis, where we compare the correct model with the faulty model, and then we can report whether the circuit is secure or not. OK, so let's step through the different phases and steps. First, we create a DAG. Let's consider this circuit here. First of all, we constrain the set of gates supported by FIVER: we have sequential gates, which are basically just registers, and we have combinational gates, which are NOT, AND, NAND, OR, NOR, XOR, and XNOR gates. We unite all gates in a set G, so G contains all valid gates supported by FIVER. Such a circuit can then be translated into a DAG. First, we introduce one input node for each original input of the circuit, and likewise an output node for each output. Then each gate is translated to a dedicated node; for each OR gate, for example, we will have a node associated with an OR gate, and so on. This is very straightforward and a common way to build a model of a digital circuit. But then we build BDDs for each node in our DAG. What does that mean? Normally, we have our DAG representation, and here is just a simple example: we have three inputs, x0, x1, and x2, two AND gates, and an output y. We can represent this function by a simple truth table, or we can represent it by a BDD. In this case, we introduce one BDD variable for each input node, so x0, x1, and x2, and then the BDD looks as follows.
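The translation just described can be sketched as follows. This is a minimal Python sketch of my own (not FIVER's implementation, which works with real BDDs): each gate becomes a node with a Boolean function, and the DAG is evaluated in topological order.

```python
from typing import Callable, Dict, List, Tuple

# Supported combinational gate types from the set G described in the talk
GATES: Dict[str, Callable[..., int]] = {
    "NOT":  lambda a: a ^ 1,
    "AND":  lambda a, b: a & b,
    "NAND": lambda a, b: (a & b) ^ 1,
    "OR":   lambda a, b: a | b,
    "NOR":  lambda a, b: (a | b) ^ 1,
    "XOR":  lambda a, b: a ^ b,
    "XNOR": lambda a, b: (a ^ b) ^ 1,
}

# The circuit from the BDD example: y = AND(AND(x0, x1), x2),
# given as (node_name, gate_type, input_names) in topological order.
netlist: List[Tuple[str, str, Tuple[str, ...]]] = [
    ("n0", "AND", ("x0", "x1")),
    ("y",  "AND", ("n0", "x2")),
]

def evaluate(netlist, inputs: Dict[str, int]) -> Dict[str, int]:
    """Walk the DAG in topological order and compute every node's value."""
    values = dict(inputs)
    for name, gate, ins in netlist:
        values[name] = GATES[gate](*(values[i] for i in ins))
    return values

print(evaluate(netlist, {"x0": 1, "x1": 1, "x2": 1})["y"])  # -> 1
print(evaluate(netlist, {"x0": 1, "x1": 0, "x2": 1})["y"])  # -> 0
```

In the real tool, each node carries a BDD over the input variables instead of a single value, so one walk captures the node's function for all inputs at once.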
If we evaluate x0 and x0 is zero, we directly know the output y will be zero and we can jump straight to the terminal zero. But if x0 is one, we have to evaluate x1, and here it is the same: if x1 is zero, we can jump to zero, but if x1 is one, we have to evaluate x2. And if x2 is one as well, we reach the terminal one. This representation is the BDD of the second AND gate. We do this for every node in our DAG: first, we introduce a BDD variable for each input node, and then we go through the DAG and evaluate each node. OK, then we go on with the evaluation phase. First of all, how do we inject faults? Here we follow a paper we also published this year, where we revisited fault adversary models, and we say we can model a fault by replacing a Boolean function. In this case, let's consider that we would like to inject a fault in this AND gate here. Our fault model, for example, says that we can exchange the AND gate by an OR gate, or by a SET or a RESET fault. So we start by replacing the AND gate with an OR gate, and by doing so, we have to re-evaluate all nodes that lie in the propagation path of the faulted gate, so all the red nodes here. This means we have to re-evaluate all their BDDs, and in the end this results in a faulty DAG D'. But as I mentioned earlier, FIVER should prevent false positives. That means we not only would like to evaluate fault injections over all possible input vectors, but also over all possible fault combinations that can occur in a given circuit. To do so, we introduce a set lambda, in which we collect all nodes that are associated with a given location L. By location L, we refer to sequential gates, combinational gates, or both. For this example, let's assume L is combinational.
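Fault injection by function replacement can be sketched like this. Again a hedged simplification of my own (FIVER manipulates BDDs and, as an optimization, re-evaluates only the propagation path; this sketch simply re-evaluates the whole circuit): one gate's Boolean function is swapped, and the golden and faulty outputs are compared.

```python
from typing import Dict, List, Optional, Tuple

GATES = {
    "AND": lambda a, b: a & b,
    "OR":  lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}

# Small circuit in topological order: (name, gate, inputs)
netlist: List[Tuple[str, str, Tuple[str, str]]] = [
    ("n0", "AND", ("x0", "x1")),
    ("n1", "XOR", ("n0", "x2")),
    ("y",  "OR",  ("n1", "x3")),
]

def evaluate(netlist, inputs, override: Optional[Dict[str, str]] = None):
    """Evaluate the DAG; `override` maps a node to its faulty gate type."""
    override = override or {}
    values = dict(inputs)
    for name, gate, ins in netlist:
        fn = GATES[override.get(name, gate)]
        values[name] = fn(*(values[i] for i in ins))
    return values

inputs = {"x0": 1, "x1": 0, "x2": 0, "x3": 0}
golden = evaluate(netlist, inputs)
faulty = evaluate(netlist, inputs, override={"n0": "OR"})  # AND -> OR fault

# Diagnosis idea: XOR golden and faulty outputs; a 1 marks an effective fault
effective = golden["y"] ^ faulty["y"]
print(golden["y"], faulty["y"], effective)  # -> 0 1 1
```

With BDDs, the same comparison is done symbolically over all inputs at once rather than for one concrete input vector.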
That means our set lambda would contain all nodes from 7 to 14 and from 19 to 22, and of course all nodes in later logic stages would be included in lambda as well. OK, that is the first step. In the second step, we divide lambda into different sets theta_i, where each theta_i contains all nodes from one specific logic stage. In this example, we have two logic stages, stage 0 and stage 1, so theta_0 contains the nodes 7 to 14 and theta_1 contains the nodes 19 to 22. OK. The next step is to incorporate the parameter n, which says how many faults we inject simultaneously in one logic stage. We collect in the sets gamma_i the valid combinations of up to n nodes. In this example, let's consider n equal to 2. Then we build two sets, gamma_0 and gamma_1, which contain all valid pairs of nodes in our circuit. For example, gamma_0 contains (7,8), (7,9), (7,10), (7,11), (7,12), and so on, and gamma_1 the analogous combinations. Now, as I mentioned earlier, we inject faults by replacing a Boolean function, and we cover all valid replacements here. For example, let's say we would like to fault the OR and the AND gate at the top. Then we look up the fault mapping: the AND gate is mapped to an OR gate or to a SET or RESET fault, and the OR gate, for example, is mapped to a NOR gate or a SET or RESET fault. All in all, this gives us nine different fault mappings, which just have to be tested for the combination (7,8) of nodes 7 and 8. Our tool also supports univariate and multivariate fault injections. If you consider univariate fault injections, faults can solely be injected into one single gamma_i, so we first inject all possible faults in gamma_0 and then in gamma_1. This is very straightforward.
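The enumeration of fault locations can be sketched with standard combinatorics. This is a simplified illustration using the node numbers from the talk's example (the names theta, gamma, and the fault-mapping table below are reconstructions, not FIVER internals):

```python
from itertools import combinations, product

# theta_i: all nodes of one logic stage (numbers from the talk's example)
theta = {
    0: list(range(7, 15)),    # stage 0: nodes 7..14
    1: list(range(19, 23)),   # stage 1: nodes 19..22
}
n = 2  # up to n faults injected simultaneously in one logic stage

# gamma_i: all combinations of up to n nodes per stage
gamma = {
    i: [c for r in range(1, n + 1) for c in combinations(nodes, r)]
    for i, nodes in theta.items()
}
print(len(gamma[0]), len(gamma[1]))  # -> 36 10   (8 + C(8,2), 4 + C(4,2))

# Each faulted gate can be replaced by several faulty functions; with
# three replacements per gate, a pair of gates yields 3 * 3 = 9 mappings:
fault_mapping = {"AND": ["OR", "SET", "RESET"],
                 "OR":  ["NOR", "SET", "RESET"]}
pair = ("AND", "OR")  # gate types of nodes 7 and 8 in the example
replacements = list(product(*(fault_mapping[g] for g in pair)))
print(len(replacements))  # -> 9
```

This already shows why the fault mapping and the parameter n blow up the search space: every node combination is multiplied by every combination of replacement functions.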
But this gets more complicated if we would like to analyze multivariate fault injections. Here we say we can inject up to n faults in each of v logic stages. So let's assume we would like to analyze bi-variate fault injections; then v would be equal to 2 and we would inject two faults in the first logic stage and two faults in the second logic stage. In total, v times n nodes are affected per fault injection, so in our example four nodes. This again increases the number of fault injections drastically. And as you see, not only the parameter v, but also the parameter n and the fault mapping drastically increase the number of valid fault combinations that need to be evaluated. Therefore, we came up with two optimization strategies. The first optimization strategy exploits the propagation paths of the faulted nodes in our DAG: we do not re-evaluate the entire DAG after every fault injection, but only the nodes which lie in the propagation path of the faulted node. The second one is called complexity reduction, where we reduce the nodes in the initial set lambda, ending up with a smaller set of faults which need to be evaluated. More details can be found in the paper. OK, and finally we have the diagnosis. Here we assume that we have our golden model, the DAG D, and our faulty model, the DAG D'. Let's assume we inject a SET fault in this XOR gate here. Then we could have two faulty output bits. To check this, we introduce an additional layer, so to say, where we do not really introduce new DAG nodes, but only create BDDs: we create XOR BDDs between the outputs of the golden DAG and the faulty DAG. And the nice thing when using BDDs is that we can check the outputs really efficiently.
We can count all satisfying variable assignments at the outputs over all possible input combinations. As you can see, we have the same BDD variables, in0 to in6, for the golden DAG and for the faulty DAG. So we have the same input variables and therefore consider the same input vectors, and therefore we can directly check, for a given fault injection, over all possible input combinations whether a fault is effective or not. OK. Finally, we performed a few case studies for CRAFT, LED, and AES. We considered detection and correction countermeasures based on linear error-correcting codes, and we considered univariate and multivariate fault models. We showed that the optimization strategy based on the complexity reduction is effective. All our experiments were done on a Xeon E5 CPU with 3.2 GHz; the machine was equipped with 128 GB of RAM, we used 8 threads for our tool, and each thread was allowed to use up to 8 GB of RAM. Let's have a look at some numbers. First, when we evaluate a one-round CRAFT design, we have 766 valid combinations; we just used a bit-flip model and injected one fault, and our tool is very fast and can verify the design in under one second. If we inject two bit faults, we have 330,000 combinations, and the tool takes just 1.5 seconds. It gets a little more complicated if we increase the circuit: the countermeasure is now able to detect up to three bit faults, we inject three faults, and if we do not use any complexity reduction, we have 90 million valid combinations, which can be checked in roughly 3,000 seconds. We are also able to analyze an entire AES-128 round equipped with a detection countermeasure. The first design is able to detect single-bit faults, and here we have roughly 25,000 combinations, which can be checked in 22 seconds.
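The diagnosis step just described can be mimicked by brute force on a toy circuit. With real BDDs, the count of satisfying assignments of the XOR of golden and faulty output functions comes almost for free; this hedged sketch enumerates all inputs of a 3-input circuit instead:

```python
from itertools import product

def golden(x0, x1, x2):
    return (x0 & x1) & x2          # y = AND(AND(x0, x1), x2)

def faulty(x0, x1, x2):
    return (x0 | x1) & x2          # first AND replaced by OR (injected fault)

# XOR the golden and faulty output and count satisfying input assignments:
# the count is the number of inputs for which the fault is effective.
effective = sum(golden(*v) ^ faulty(*v)
                for v in product((0, 1), repeat=3))
print(effective)  # -> 2   (inputs 101 and 011 expose the fault)
```

Because golden and faulty models share the same input variables, one symbolic check covers every input vector, which is precisely what rules out the false positives of test-vector-based tools.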
And if we increase the countermeasure such that it is able to detect two faults, we get a huge number of combinations: without any complexity reduction, there would be 300 million combinations, and we are not able to check this amount of combinations for such a large circuit. So we can check a huge number of combinations when the circuit is smaller, as for the LED design, but this AES design is too large, and the program would not finish. But if we apply our complexity reduction, we only have 56 million combinations, and then our tool needs 470,000 seconds. So it is possible to analyze and verify such a design. If you are interested in our tool, you can find it on GitHub under Chair-for-Security-Engineering/FIVER. But of course, there are some limitations. First of all, the circuit size: as you could see, for large circuits our tool is not able to parse the circuit and translate it into BDDs. This is one drawback, though we are able to parse an entire AES round; using two AES rounds, for example, would not be possible. The next thing is the fault model, which is a natural limitation, because if we increase n and v, we would have too many combinations that need to be checked, and our tool would not be able to finish in an adequate time. And finally, the circuit structure: as we are using DAGs as our circuit model, we are not able to parse circuits that contain a loop. So if you would like to use FIVER to check such a design, you first have to unroll it, and then you can parse it into our data structure. OK. And to wrap up, here once again is our tool flow: our tool works on a gate-level netlist, we parse the gate-level netlist into a DAG, then the DAG nodes are evaluated and BDDs are created.
And then we can exhaustively check all possible fault combinations over all valid input vectors, and all these combinations are checked in our diagnosis step. This gives us a tool which is really fast: we can check 90 million fault injections for a single round of CRAFT in under 50 minutes, while testing all 2^128 input assignments. This is pretty impressive. Thank you very much for listening to my talk and watching the video. You can see our references here, and if you have any questions, do not hesitate to contact us; we will be happy to help you with the tool or to answer any questions. Thanks a lot.