Welcome to the talk. I am Sayandeep Saha, and I am going to present our work called Fault Template Attacks on Block Ciphers Exploiting Fault Propagation. This is joint work with Arnab Bag, Debapriya Basu Roy, Sikhar Patranabis, and Professor Debdeep Mukhopadhyay. All of us are from the Secured Embedded Architecture Laboratory, Indian Institute of Technology Kharagpur.

Security is paramount in modern days, and to enable secure communication we need cryptographic primitives such as block ciphers, hash functions, and public-key primitives like RSA, ECC, post-quantum systems, etc. While most of these cryptographic primitives are mathematically secure, or at least we believe them to be secure, that is not the case for their implementations. In fact, a mathematically robust crypto primitive is just the beginning of end-to-end secure communication. In practice, implementations may leak secret information, which can enable an adversary to perform several classes of attacks.

Among these implementation-based attack classes, the most prominent one is side-channel analysis. Side-channel analysis exploits the fact that the power consumption, electromagnetic radiation, or acoustic signals coming out of a chip are correlated with the computation going on inside. Using this fact, a passive adversary who is just observing these power or EM signals can extract the secret within minutes. In a very similar manner, an adversary can deliberately inject a fault in the computation and exploit the faulty responses to extract the secret. In this talk, we will be mostly talking about fault attacks.

Most of these attacks, fault attacks or side-channel attacks, are so powerful that minor modifications in the implementation cannot actually prevent them. So one needs to dedicate separate countermeasures, and over the years people have developed several countermeasures for both fault attacks and side-channel analysis.
In this work, we evaluate whether all these countermeasures work against all classes of fault attacks or not.

As I mentioned in the previous slide, injecting a fault in a device during encryption or decryption and analyzing the faulty response gives you the secret. Now, there are several ways by which one can perform a fault analysis attack. The most popular one is called the differential fault attack, where the adversary encrypts the same plaintext twice, once with a fault and once without, and then analyzes the differential between the correct ciphertext and the faulty ciphertext to extract the secret. Interestingly, in this class of fault attack only a few faulty ciphertexts are required, and in some cases the adversary can attack with just a single faulty ciphertext. Also, the assumption on the fault is very minimal. For example, you just require a single-byte fault, a nibble fault, or maybe a multi-byte fault to extract the key even from ciphers like AES.

However, in practice we gain something more: in real devices the faults are highly biased. By biased I mean that, over the entire possible fault space, only some of the faults occur again and again and many of the faults never occur. Using this statistical bias in the distribution, an adversary can launch several other classes of attacks, for example statistical fault analysis or statistical ineffective fault analysis, where the adversary is allowed to inject several faults on several different plaintexts, gather the faulty or correct ciphertexts, and analyze those ciphertexts to get the secret.

There also exists another class of fault attacks where the adversary might not need to know the ciphertext. He just needs to know whether the ciphertext is faulty or not, and by exploiting that information alone he can extract the secret key.
Some of the prominent members of this class are safe-error attacks, fault sensitivity analysis, and blind fault attacks.

Now, when there exist attacks, there also exist countermeasures. Most fault attack countermeasures use some form of redundancy in computation to detect faults. Redundancy can be two-way or n-way, and it can be applied at every round or at the end of the entire computation. One can implement redundancy temporally: for example, one can encrypt the same plaintext twice, compare the results, and output the ciphertext only if both match. Or it can be implemented in the spatial domain, where there are two redundant branches computing in parallel. There can also be information redundancy, like using error-correcting or error-detecting codes. If a fault is detected during the computation, no ciphertext is returned, or alternatively one can return a randomized version of the ciphertext or a completely random string. In general, most of the existing countermeasures follow this principle.

Now, what kind of security guarantee does redundancy provide? The answer is that if you inject a fault in one or two of the redundant branches, the fault will get detected and the faulty output will be suppressed. However, you can still attack in principle if you can inject a fault, an equal-valued fault to be precise, in all the redundant branches. If the degree of redundancy is fairly high, then the probability of injecting the same-valued fault in all of the redundant branches reduces. So it provides some sort of practical security if you assume the fault model to be like this. However, as we will show, this security is not sufficient.

In this context, we present fault template attacks. Unlike the previously proposed fault attacks, here we have an extra step, that is, the profiling step. The idea is that the attacker may have complete access to a device which is very similar to the device he wants to target.
And he can extensively analyze this test device and gather knowledge about it, which is called a template. Now, template building can be done in several ways. In this specific paper, we have utilized fault propagation through digital circuits for constructing templates, and we use these templates for successful key recovery.

Our attack does not require any ciphertext access, and not even direct access to the plaintext, but in some cases we need to keep the plaintext fixed. We can target middle rounds of block ciphers. The attack also works in the presence of both SCA and FA countermeasures. Note that the previous attacks which do not require ciphertext access, or which can target middle rounds, were safe-error attacks, fault sensitivity analysis, and blind fault attacks. However, the popular side-channel countermeasure, masking, is a countermeasure against these attacks as well, whereas our attack works in the presence of both fault countermeasures and masking. The other attacks which work in the presence of both of these are SIFA and SFA. However, both of them require access to the ciphertext, which we do not require.

Fault template attacks have two phases. In the first phase, the adversary is assumed to have a device in which he can set the key. He can also have knowledge of the randomness that is being used inside. Using this device, he can inject faults at several locations and construct a template.

Now, what is the template? A template can be described as a mapping. The domain set of the mapping consists of a function over the fault locations as well as the observable that the attacker is observing. The observable can be many things, side-channel signatures maybe; in our case we will be using the fact of whether the outcome is faulty or not, and it can be extended to other things as well. The range set of this template is some part of the secret: it might be a nibble of the secret key or a nibble of the intermediate state, whatever the attacker wants.
Now, once the adversary has this template and he gets a device of a similar kind, he can simply inject faults at some pre-specified locations in this new device, about which he does not know anything, and by referring to the template he can extract the secret key.

As I mentioned, in this specific attack we have constructed the template using a property called fault propagation. Fault propagation and fault activation are fundamental properties of digital circuits. So, when you see a fault in the outcome of a digital circuit, that is driven by two consecutive events. The first one is fault activation, that is, the generation of the faulty computation at the fault injection point. The second one is fault propagation, that is, how the fault reaches the output of the circuit.

Let us explain this by means of this XOR gate. If we inject a stuck-at-0 fault at wire A and the value on wire A is 0, then the fault has no impact; it does not change the output. However, if A is 1, then the fault gets activated, that is, the value of A will be altered, and one can see the impact of the fault at the output of the circuit. Note that in this case the output corruption does not depend on the other input of the XOR gate. Whatever value this other input carries, the output will always be corrupted. This is not the case for an AND gate. Here, even if the fault is activated, it might not propagate to the AND gate output, because in this case it also depends on the other input value. If the other input is 1 the fault will propagate, otherwise it will not.

Now, fault propagation and activation are data dependent. How? You can see that in the case of the XOR gate, if somebody is injecting a fault at some point, only the data-dependent activation of the fault leads to a faulty outcome. So by seeing whether the outcome is faulty or not, the adversary can learn the value at the point he has attacked.
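The gate-level behaviour described above can be checked with a small simulation. This is only a minimal sketch: the function names (`xor_gate`, `and_gate`, `is_faulty`) are illustrative and do not come from the paper, and the fault is modelled as a stuck-at fault on the first input wire.

```python
def xor_gate(a, b):
    return a ^ b

def and_gate(a, b):
    return a & b

def is_faulty(gate, a, b, stuck_value=0):
    """True if forcing wire `a` to `stuck_value` changes the gate output."""
    return gate(a, b) != gate(stuck_value, b)

# Enumerate all inputs for a stuck-at-0 fault on wire a.
for a in (0, 1):
    for b in (0, 1):
        print(
            f"a={a} b={b} "
            f"xor_faulty={is_faulty(xor_gate, a, b)} "
            f"and_faulty={is_faulty(and_gate, a, b)}"
        )
```

With a stuck-at-0 fault, the XOR output is corrupted exactly when a is 1, whatever b is, while the AND output is corrupted only when both a and b are 1.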
In contrast, in the AND gate, if you see this example, fault propagation to the output necessarily means that this input has value 1 and also this other input has value 1. So basically, the propagation of the fault to the outcome results in the leakage of two values, A and B. Now, one can easily extend this fact to any combinational circuit. As an example, if one observes a faulty outcome here and the fault injection location is this point, then he readily knows that B equals 1. On the contrary, if the fault injection location is set here and the adversary observes a faulty output here, he readily knows that A equals 1 and C equals 1, but he has no idea about the value of B.

Now let us see how we can exploit this fact to attack a real implementation. In this case we consider the block cipher PRESENT with fault detection countermeasures and no masking. We also assume no ciphertext access; the only thing the attacker knows is whether the output is faulty or not. We also allow the attacker to inject only one fault per encryption, and we ask him to keep the plaintext fixed.

Now, if the attacker targets the S-box, more precisely one polynomial of one output bit of the S-box, and injects a fault in this XOR computation in a stuck-at manner, he will only observe a faulty outcome at this point, or globally at the output of the cipher, if x1 equals 0, and he will not see any faulty outcome if x1 equals 1. Note that fault activation is playing the role here, because in XOR gates fault propagation does not depend on anything else. In a similar manner, he can activate other faults at other locations and eventually extract all of x1, x2, x3, x4. He can then compile the knowledge of these outcomes, whether each one is faulty or not, and extract the nibble, that is x1, x2, x3, x4.
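The bit-by-bit recovery just described can be sketched as a toy profiling and attack loop. This is a simplified model under stated assumptions, not the paper's implementation: each of the four hypothetical fault locations is assumed to sit on an XOR input carrying one bit of the nibble directly, the fault is stuck-at-1 (so it is observed exactly when the bit is 0), injection is noise-free, and the only observable is "faulty or not". All names are illustrative.

```python
def fault_observed(nibble, bit_index, stuck_value=1):
    """Observable for one injection: True if the output is faulty.

    Assumption: the fault is on an XOR input that carries bit `bit_index`
    of the nibble, so it propagates whenever it is activated.
    """
    bit = (nibble >> bit_index) & 1
    return bit != stuck_value

def build_template(stuck_value=1):
    """Profiling phase on a device whose secret nibble the attacker controls."""
    template = {}
    for nibble in range(16):
        pattern = tuple(
            fault_observed(nibble, i, stuck_value) for i in range(4)
        )
        template[pattern] = nibble
    return template

def recover_nibble(oracle, template):
    """Attack phase: one fault per location, then a template lookup."""
    pattern = tuple(oracle(i) for i in range(4))
    return template[pattern]

TEMPLATE = build_template()

# Simulated target device with an unknown nibble; the attacker only sees
# whether each injection produced a faulty outcome.
secret = 0b1010
recovered = recover_nibble(lambda i: fault_observed(secret, i), TEMPLATE)
print(recovered == secret)
```

In this toy setting the four observations form a distinct pattern for each of the 16 nibble values, so the lookup is unambiguous.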
Now, this entire knowledge can be compiled in the form of a table, that is, the template for these four fault locations, as shown here, and each of the entries in the template leads you to one value of the nibble. Using these templates, one can equally extract an entire intermediate state, and extraction of two entire intermediate states results in the recovery of the secret key through a middle-round attack. These templates can also be used for performing an end-round attack. If you have plaintext access or ciphertext access, the attack might be a little easier, in the sense that you require fewer faults and fewer executions, but the attack principles remain almost the same.

Now, the question is whether this simple attack still works on masking. As I already mentioned, masking is a side-channel countermeasure, but it also works against certain fault attacks like BFA or safe-error attacks. The main idea of masking is to randomize the power consumption and thereby thwart the attacker. Masking requires fresh randomness at every execution. The idea is to split one single value into multiple shares such that the XOR of all the shares gives you the actual value. The functions that operate on those values are also shared into component functions. Each of these component functions works on the shares separately, and the XOR of the outcomes of all those component functions returns the actual output.

This function splitting is quite trivial in the case of linear functions, because in that case each component function operates on only one single share and gives you the output directly. But in the case of non-linear functions, each component function has to work on multiple shares, and the designer must be careful in this scenario, because the combination of multiple shares can lead to potential leakage.
So this leakage has to be carefully handled, and in practice there exist several successful methods which can perform optimal sharing for non-linear functions.

Now let us see how FTA works on a masking scheme. One thing is quite clear: faulting linear terms in the logic does not work in this context. Why? Because if you are faulting an XOR gate input, you are only gaining information about that specific wire. However, in masking, the information about one actual bit is split among different shares, that is, different wires. So just by injecting a fault at one single wire you do not get information about all the wires, and that leads to a failure of the attack. But we will show here how to make it work in this context.

Let us consider the simplest example, a masked AND gate, where x is shared into x0 and x1, and y is shared into y0 and y1. Here we exploit the fault propagation feature of the AND gate. How? Let us assume that we inject a stuck-at-0 or even a bit-flip fault at x0 and let the fault propagate through both of these polynomials. If the fault reaches the output of q0, that means y0 is 1; also, if the fault reaches the output of q1, then y1 is 1. Now, the actual output, which is the XOR of q0 and q1, is faulted only if y0 XOR y1 equals 1, that is, if y equals 1, and it is correct if y equals 0. So effectively, by observing the actual unmasked outcome, you can readily tell whether the unmasked input is 0 or 1.

Now, you are not observing this unmasked outcome at the output of the AND gate, because the AND gate might be part of a larger circuit. But due to the correctness property, if you observe a fault even at the end of the encryption, and no fault occurred anywhere else, then you know that this fault corresponds to the fault you injected at the input of the AND gate, and that readily gives you the unmasked value of the input bit of the AND gate.
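The masked AND gate argument can be illustrated with a short simulation. This is a minimal sketch under assumptions that go beyond the talk: the gate is written as a first-order ISW-style AND with one fresh random bit `r`, the fault is a bit flip on the share x0 (so it is always activated), and the attacker only learns whether the unmasked output changed. The function names are hypothetical.

```python
import secrets

def share(bit):
    """Split one bit into two Boolean shares whose XOR is the bit."""
    mask = secrets.randbits(1)
    return mask, bit ^ mask

def masked_and(x0, x1, y0, y1, r):
    """First-order masked AND; q0 ^ q1 equals (x0 ^ x1) & (y0 ^ y1)."""
    q0 = (x0 & y0) ^ r
    q1 = (x1 & y1) ^ ((r ^ (x0 & y1)) ^ (x1 & y0))
    return q0, q1

def output_is_faulty(x, y):
    """Flip share x0 and report whether the unmasked output changes."""
    x0, x1 = share(x)
    y0, y1 = share(y)
    r = secrets.randbits(1)
    q0, q1 = masked_and(x0, x1, y0, y1, r)
    f0, f1 = masked_and(x0 ^ 1, x1, y0, y1, r)  # bit flip injected on x0
    return (q0 ^ q1) != (f0 ^ f1)

# Regardless of the random shares, the observation depends only on y.
for y in (0, 1):
    observations = {output_is_faulty(x, y) for x in (0, 1) for _ in range(20)}
    print(f"y={y} observed_faulty={observations}")
```

The flipped x0 reaches q0 through the term x0 AND y0 and reaches q1 through x0 AND y1, so the two corruptions cancel in the XOR unless y0 XOR y1 is 1. With a stuck-at-0 fault instead of a bit flip, the same holds only in the runs where x0 happens to be 1.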
Now let us see how we can extend this idea to attack actual implementations. In this context we consider a first-order TI (threshold implementation) of PRESENT with three shares. The S-box is divided into two sub-functions, f and g, and in this attack we specifically target different fault locations in the sub-function f. The sub-function f is already shared, and here I have shown the shares corresponding to a specific output bit, f0. So f0 equals the XOR of f 1 0, f 2 0, and f 3 0. Now, if I inject a fault at x 2 0, then, using the same principle described in the previous slide, the fault will only propagate to the output if the XOR of these three, that is x2, equals 1; otherwise the fault will never propagate to the output. Using this fault location along with three other fault locations, we can actually extract the input of the S-box, and that leads to a template with 16 possible values. Using this template, we can perform a middle-round attack on a masked implementation of PRESENT with fault countermeasures.

Note that in practice we may not get all the injections right, and there will be noise in the injection. However, we have seen that this noise can be circumvented by just increasing the number of observations, that is, the number of injections and executions, and you can still perform the attack.

As a practical experiment, we constructed a first-order TI implementation of PRESENT in hardware. One configuration of it results in a first-order secure implementation, and we target that implementation. Our target platform was SAKURA-G, and we have two boards of the same family.
We inject EM faults in this context, which are non-invasive, that is, no de-packaging is required. The injection is bit-level, and it is highly repeatable, that is, the same fault can be repeated with high probability. Most interestingly, it is reproducible: the same parameters which we use for injecting faults in one device result in a very similar fault injection in the other device as well. For example, consider this table for one nibble fault injection. If we just repeat these parameters on another SAKURA-G board with the same bit file loaded, it results in very similar faults, and this really helps us to construct the template-based attack. We construct the template on one device and perform the actual attack on another SAKURA-G board, and we successfully recover the keys with around 3000 to 4000 fault injections in total.

We also tested an open-source software implementation of AES with first-order masking, where the AND gates were implemented in the ISW manner. We simulated bit-flip and stuck-at faults. In this case, where our target was an AES implementation, we found that for the S-box we require 16 different fault locations for constructing the templates, and some entries of the templates might suggest multiple intermediate values. But overall the attack complexity remains quite reasonable, and you can recover the key within a minute.

To conclude, the template-based fault attack is a powerful form of attack which can target combined countermeasures and perform middle-round attacks with no ciphertext access. In this work we have shown template building mainly with fault propagation.
However, we would like to point out that template building can compile information from several other sources, such as several different fault locations and different clock cycles. In this context, as potential future work, we would like to analyze the SIFA countermeasures which have been recently proposed. It will be really interesting to see whether SIFA countermeasures also work against template attacks. We would like to point out that template attacks are fundamentally different from SIFA, in the sense that they do not need any ciphertext access, and a template attack can also combine information from multiple executions of the cipher under the influence of faults. So there is hope that this attack might bypass some of those countermeasures. However, it requires further experiments to comment on this.

A straightforward extension of this attack is also possible for hash functions, MACs, or authenticated encryption. We would also like to extend this attack in the future to public-key implementations.

With this I conclude our talk. If you have any questions, you are encouraged to ask them during our live session. Thank you.