I will start my presentation with a bit of motivation on why we actually need these combined countermeasures. In the traditional cryptographic setting, you have a black box with a key inside, you control the input and observe the output, and you try to break the scheme based only on these. As you all know, this is not the case in practice, where you have an actual device that runs the secret function. In this case, you can mount physical attacks.

One type of physical attack are so-called passive attacks that exploit side channels. You can, for example, measure the power consumption or the timing of an unprotected implementation and recover the key very easily. Since this is a serious threat to any cryptographic implementation, many countermeasures have been proposed in recent years, for example masking or hiding. If you implement them correctly on your device, you get resistance against this type of attack up to a certain degree or order. So you are not completely secure, but you can make it impractical for an attacker to attack you using side channels.

However, in the physical world there are also other types of attacks, namely active attacks. Even if you have a perfectly secure implementation that cannot be attacked with side-channel analysis, it is still possible to inject a fault during the computation and use the faulty output to derive the key. So if you want an implementation that is secure against all types of physical attacks, you also need a countermeasure against fault injection. Again, as with side channels, there are many different countermeasures; most rely on some kind of fault detection, and if you implement them correctly you get resistance against fault injection attacks, again up to a certain degree.

Currently, the state of the art is that there are many countermeasures against side-channel attacks and many countermeasures against fault injection attacks, but they have been developed independently and also evaluated independently. So your side-channel countermeasure can easily be broken with fault injection, and your fault injection countermeasure can easily be broken with side-channel attacks. In the real world, however, you want to be secure against both types of attacks.

What you could do is simply take one countermeasure of each type and combine them. However, this is not the optimal approach. It is either not very efficient, because some countermeasures do not fit together well, so you pay an overhead when you implement them, or, in the worst case, you actually reduce the security of one of the countermeasures. For example, if you compute a parity bit over your state and forget to mask it, your side-channel countermeasure becomes useless. And there is another threat, so-called combined attacks, that use both side-channel analysis and fault injection; if you combine the countermeasures in a very straightforward way, you are probably also vulnerable to these combined attacks.

So you may say: why not just use duplication with masked implementations? You could do this, and it is pretty easy: you take your masked implementation, run it twice, maybe in parallel, and at the end you compare the outputs, or maybe some intermediate states.
This is actually very easy to do, and if you only consider uniform faults, meaning that all fault values are equally probable, it also gives you reasonably good resistance against both types of attacks. For example, if you can only inject faults in one half of the implementation, you will always be able to detect them. However, in reality the faults are usually not uniformly distributed, and you have some bias in the distribution. For example, with laser fault injection you have a very extreme bias where you can target a single bit. If you can do that twice, you can just flip the same bit in each copy of the implementation, and then simple duplication cannot detect it. The same holds for symmetric faults: if you apply, for example, a clock glitch at the same time to both copies, there is a high probability that the same fault occurs in both, and again you cannot detect it.

So it is probably better to use a more sophisticated scheme, and in the literature there are a few that do not just combine two countermeasures in the hope that they are secure and efficient, but provide a complete system with security against both types of attacks. For one, I want to note that dual-rail logic styles implicitly have some kind of fault detection based on their duplication property, in addition to their hiding countermeasure. There are also countermeasures based on codes, using wire-tap or LCD codes. However, these countermeasures are usually targeted at software implementations, or only provide limited security in a hardware setting. That is why it is still an open problem to find a very efficient combined countermeasure for hardware circuits.

And that is where we come in with our paper, which is basically parity plus TI for hardware circuits. In the end you probably will not use a plain parity code but a more sophisticated one, but we liked the name, so we stuck with it. Our basic idea is that we are in the same setting as before, with two parts, but instead of simple duplication we use more sophisticated error detecting codes, which allows us to give formal bounds on, for example, the number of bit flips needed for a successful attack. If you have a code that can detect three bit faults, and a laser can flip one bit, then we can detect every error that is injected with two lasers, and even with three; to stay undetected you actually need four at the same time. You can imagine that the more lasers you need, the more complex the attack becomes. This effect depends on the chosen code: if you choose a very complex code, you may need a lot more area, but you are also more secure. So it is basically a trade-off that you have to choose during the design process.

Okay, now these were our design considerations when we came up with the scheme. First, we wanted to minimize the throughput penalty of the protected design, because in hardware you really want high performance and high throughput; otherwise, if it can be slow, you could also do it in software.
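To make this bound concrete, here is a minimal sketch (my own toy example, not code from the paper) that contrasts simple duplication with a distance-4 code: duplication misses a symmetric two-bit fault, while for the distance-4 code an attacker has to flip at least four bits at once to stay undetected. The generator matrix is one possible choice of the extended Hamming [8,4,4] code, picked purely for illustration.

```python
# Toy comparison of simple duplication vs. a distance-4 error-detecting code.
from itertools import product

# --- simple duplication: a [2k, k, 2] code --------------------------------
msg = (1, 0, 1, 1)
dup = msg + msg
# flip the same bit (index 2) in both halves -> a symmetric fault
faulted = tuple(b ^ (i % 4 == 2) for i, b in enumerate(dup))
print("duplication detects symmetric fault:",
      faulted[:4] != faulted[4:])            # False -> the fault is missed

# --- extended Hamming [8,4,4] code, systematic form G = [I | P | parity] ---
P = [(1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)]

def encode(m):
    chk = [sum(x * row[j] for x, row in zip(m, P)) % 2 for j in range(3)]
    word = list(m) + chk
    return tuple(word + [sum(word) % 2])     # extend by overall parity bit

codewords = {encode(m) for m in product((0, 1), repeat=4)}
# For a linear code, an additive error escapes detection iff it is itself a
# nonzero codeword, so the minimum nonzero codeword weight is the smallest
# number of bit flips (e.g. simultaneous laser shots) needed to stay hidden.
min_weight = min(sum(c) for c in codewords if any(c))
print("bit flips needed for an undetectable fault:", min_weight)   # 4
```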
The next point is that we wanted to have provable security against side-channel analysis, so that we can give formal guarantees in this category. We opted for masking, because it is a very popular countermeasure with some sound theory behind it. If you want to do masking in hardware, you usually use threshold implementation or something similar, because of the glitches in a circuit that can reduce the security severely if you do not take care of them. So we concentrate on threshold implementations. There are also recent schemes that use a lower number of shares, but I think our approach can easily be extended to those schemes as well.

A threshold implementation basically works like this: you have your shared input, your shared output, and a shared function, and the shared function has to comply with certain properties. These properties are correctness, meaning that the component functions together have to compute the correct function; non-completeness, meaning that each component function only gets a subset of the input shares, depending on what order of TI you want to implement; and uniformity, meaning that your input has to be a uniform sharing, and if you want to use the output as an input later on, you have to make sure that the output is also a uniform sharing, otherwise you need additional randomness.

Now that we have cleared up this aspect, the next step is to extend this countermeasure with resistance against fault injection attacks. Here we opted for error detecting codes because of their nice properties. If you use error detecting codes with a cipher, you basically get a scheme that looks like this: you take your input, use a generator to derive the check bits, perform the operation on the input, compute in parallel the predictors on the check bits and the input, and at the end you compare both outputs and see if there is an error. For the operation you can just reuse the highly optimized threshold implementation you already have, but you also have to make sure that the predictors follow the TI principles. For correctness it is pretty easy, and I will not go into much detail here; it is described in the paper. For non-completeness, you have to make sure that the predictors do not have a higher algebraic degree than the operation, otherwise you would need more shares, and we do not want that. Therefore we use linear codes, so the predictors never have a higher algebraic degree than the operation. The third property is uniformity, and to achieve it efficiently we use systematic codes whose length is twice the rank of the code. In this case every codeword is just the message padded with check bits, and these check bits are long enough that the message can be recovered from them alone. As a result, we do not have predictors that take inputs both from the left side (the message) and from the right side (the check bits); those inputs are not completely independent, which would lead to problems with uniformity and would require additional randomness. With our choice, we can compute on the message and on the check bits separately, and we do not need any additional randomness to achieve uniformity.
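To make the TI properties more concrete, here is a minimal sketch, not taken from the paper, of the textbook first-order, three-share threshold implementation of a single AND gate; the function and variable names are mine. It brute-forces correctness, and non-completeness is visible directly in the component functions, each of which omits one input share.

```python
import random

def share3(x):
    """Split bit x into 3 XOR-shares (a uniform input sharing)."""
    a, b = random.getrandbits(1), random.getrandbits(1)
    return [a, b, x ^ a ^ b]

def ti_and(xs, ys):
    """Component functions of z = x AND y; component i never sees share i."""
    x1, x2, x3 = xs
    y1, y2, y3 = ys
    z1 = (x2 & y2) ^ (x2 & y3) ^ (x3 & y2)   # independent of share 1
    z2 = (x3 & y3) ^ (x1 & y3) ^ (x3 & y1)   # independent of share 2
    z3 = (x1 & y1) ^ (x1 & y2) ^ (x2 & y1)   # independent of share 3
    return [z1, z2, z3]

# Correctness: the XOR of the output shares equals x AND y for every input.
for x in (0, 1):
    for y in (0, 1):
        for _ in range(100):
            z1, z2, z3 = ti_and(share3(x), share3(y))
            assert z1 ^ z2 ^ z3 == (x & y)
print("correctness holds; each component function omits one input share")
```

Note that this particular AND sharing is known not to be uniform on its own; in a real design its output would need remasking or a different sharing before being reused, which is exactly the uniformity requirement discussed above.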
Now that we have this part, we have to make sure that both kinds of countermeasures are combined in a secure and efficient way. The basic structure of our scheme looks like this: on the left side you have the normal threshold implementation of the target algorithm, and on the right the threshold implementation of the predictors. They run completely independently, and the only joint function is the error check in the middle. You can even skip the intermediate checks and only do an error check at the end; then there is no joint function during the computation at all. This is actually a very general structure, meaning that even the duplication we have seen before is just a special case, namely a [2k, k, 2] code, which can detect one bit error but, as we have seen, not two identical bit errors. This structure is also very nice because it allows further optimizations; for example, you can reuse some of the randomness, but given the time I will not go into detail here, you can read it in the paper.

I just want to mention one detail, since it is very important. Since the error check is the only joint function, you have to take care that it is actually secure, and there are two cases. In the first case, you generate your check bits on the shared input, make sure that your predictors exactly match the target operation, and reuse the same randomness in both parts. Then the i-th share of the check bits is always valid for the i-th share of the message, so you can check each share separately, meaning you never have to combine anything. This also helps a lot against combined attacks, because no shares are recombined during the error check. Sometimes, however, you want to optimize the predictors very aggressively to get a very small implementation. In that case you cannot check the shares separately, but have to combine them. To avoid the propagation of glitches, you then have to combine the first two shares, put a register in between, and only then combine the result with the third share, in the first-order case with three shares. In higher-order cases it is more complex, but you have to take care of this, otherwise you leak information.

Now that we have taken care of everything, we evaluated our methodology in an attacker model where the passive attacker is bounded by the order at which he can attack: he can measure enough executions to perform a d-th order attack, but not a (d+1)-th order attack. The active attacker can inject faults in the data path but not in the control path, because the control path usually does not have to be masked, so there is no need for a combined countermeasure there, and you can use separate approaches, for example one that is based on error detecting codes. The injected faults are modeled as an error variable that follows a specific distribution; since many papers in the past considered a uniform distribution, we considered that as well, but we also considered more biased distributions. The result is that, as we have seen before, the whole design is a threshold implementation, so the TI properties hold for the complete design and we get side-channel security at the chosen order.
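To illustrate the first, share-wise variant of the error check, here is a minimal sketch under my own assumptions; the generator matrix and function names are illustrative and not the paper's circuit. The point it shows is that, because the code is linear and the check bits are generated per share, each share of the check bits can be validated against the corresponding message share on its own, so no shares are ever recombined during the check.

```python
import random

P = [(1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)]   # parity part, illustrative

def check_bits(msg):
    """Check bits of a 4-bit message under a systematic linear code."""
    return tuple(sum(m * row[j] for m, row in zip(msg, P)) % 2 for j in range(3))

def share_bits(bits, n=3):
    """Split each bit into n XOR-shares (Boolean masking)."""
    rnd = [[random.getrandbits(1) for _ in bits] for _ in range(n - 1)]
    last = [b ^ (sum(col) % 2) for b, col in zip(bits, zip(*rnd))]
    return rnd + [last]

msg = (1, 0, 1, 1)
msg_shares = share_bits(msg)

# Check bits are generated share-wise from the shared message; by linearity,
# the XOR over these shares equals the check bits of the unshared message.
chk_shares = [check_bits(s) for s in msg_shares]

def error_free(m_shares, c_shares):
    # Share-wise error check: validate share i of the check bits against the
    # check bits recomputed from share i of the message -- no recombination.
    return all(check_bits(m) == tuple(c) for m, c in zip(m_shares, c_shares))

print("fault-free run passes the check:", error_free(msg_shares, chk_shares))

# A single-bit fault injected into one message share is caught within that
# share alone, without ever combining shares during the check.
faulty = [list(s) for s in msg_shares]
faulty[1][2] ^= 1
print("fault detected:", not error_free(faulty, chk_shares))
```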
In terms of fault injection resistance, it strongly depends on the chosen code: the higher the distance, the more errors we can detect. We measure the effectiveness with the so-called fault coverage, which is basically the ratio of detectable faults to all possible faults. In the uniform setting we can compute it easily, because it is independent of the distance, so there we perform just the same as simple duplication. For more biased distributions, however, we perform better for a low number of flipped bits, and we can give an actual bound on how many bits you have to flip to inject a fault that cannot be detected, which I think is quite nice.

We also applied this methodology to an example using the LED cipher. LED is a cipher with a 64-bit state, and we used the 128-bit key variant. It has the PRESENT S-box, which can be efficiently implemented as a threshold implementation, and its row- and column-wise linear transformation is also very efficient with our scheme, in comparison to the linear layer of PRESENT, for example. We implemented it as a first-order threshold implementation and used the extended Hamming code with parameters [8, 4, 4].

When we compare to simple duplication, we see that in terms of area we are around 12% larger, because our predictors are somewhat larger than the plain TI. However, after the final version was submitted we did some further optimization and found a better implementation that is only 4.3% larger, which I think is still reasonable. In terms of throughput we actually do better, because of the additional register stage, meaning that we do not lose any performance in this regard.

We also did a practical side-channel evaluation, just to see whether we implemented everything correctly, and for this we used a non-specific t-test with 100 million measurements. As you can see, in the first order we do not observe any leakage, because we stay within the two thresholds, and as expected, for the second and third order there is very clear leakage.

Against fault injection, we looked at the fault coverage for different types of errors. As we have seen before, in the uniform fault model we get the same coverage as duplication if only one codeword is affected. Then we looked at faults with different Hamming weights, going from 1 to 7; I only show up to 5, because beyond that it is basically the same. As you can see, our scheme gives full fault coverage for up to 3 flipped bits, and for 4 and 5 bits it is slightly worse, but if you want to inject such faults with a laser, it is probably much harder to do it with 4 or 5 bits. This is also only the best case for the attacker, where just one codeword is affected. Since we have a round-based architecture, it is probable that all codewords are affected at once if you use a clock or power glitch. In that case the fault coverage is very high even for simple duplication, but then we can assume that the errors are not completely random and that the same error could occur in both parts of the implementation, which would reduce the fault coverage of simple duplication significantly.
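As a rough illustration of the fault coverage metric, here is a sketch for the simplified case where a single 8-bit codeword is affected; the generator matrix is one possible choice of the extended Hamming [8,4,4] code, and the exact numbers above weight 3 depend on the actual architecture and on how many codewords a fault hits, so this is not meant to reproduce the figures from the talk.

```python
from itertools import combinations, product
from math import comb

P = [(1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)]  # illustrative generator part

def encode(m):
    chk = [sum(x * row[j] for x, row in zip(m, P)) % 2 for j in range(3)]
    word = list(m) + chk
    return tuple(word + [sum(word) % 2])           # extend by overall parity

codewords = {encode(m) for m in product((0, 1), repeat=4)}
n, k = 8, 4

# Uniform fault model: every nonzero error pattern is equally likely. An error
# is undetected iff it equals a nonzero codeword, so the coverage depends only
# on n and k -- duplication ([8,4,2]) and the [8,4,4] code coincide here.
print(f"uniform fault coverage: {1 - (2**k - 1) / (2**n - 1):.4f}")

# Biased model: errors of a fixed Hamming weight w (e.g. a few precise,
# laser-induced bit flips). Here the minimum distance of the code matters.
for w in range(1, 6):
    undetected = sum(1 for pos in combinations(range(n), w)
                     if tuple(int(i in pos) for i in range(n)) in codewords)
    print(f"weight {w}: fault coverage {1 - undetected / comb(n, w):.4f}")
# Weights 1-3 give full coverage because the minimum distance is 4; the
# behaviour for higher weights in the full shared design is architecture-
# dependent.
```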
Now, coming to the conclusion: we presented a hardware countermeasure against both side-channel and fault injection attacks, which is based on threshold implementations and error detecting codes. We identified the requirements for an efficient and secure combination of both parts, which can then be applied to arbitrary ciphers with an adjustable level of security, and we showed this using a case study of LED. However, this is just a "towards" paper, so there is a lot of future work to do. We are currently working on a practical fault evaluation to actually profile the bias that can occur in the error distribution, and on choosing the code based on this to get better fault coverage. It would also be interesting to extend the approach to the more widely used AES. There is the problem that AES cannot be implemented very efficiently as a threshold implementation, because you need a lot of randomness for the S-box. This is an aspect that a CHES paper this year tried to address, where they proposed strong 8-bit S-boxes that can be efficiently masked in hardware, so it may be possible to extend this approach as well to give resistance against both fault and side-channel attacks. One last point: we essentially took two countermeasures and combined them, and we made the combination very efficient and secure, but it would probably be even nicer to have an efficient masking scheme that integrates some kind of fault check by design. Okay, that is it from me, thank you for listening.