All right, before the next presentation, I'll continue to harass anyone standing. There are chairs up front; there are three here right on the edge, so you don't even have to interrupt anyone, and this row has some free. So please, if you'd like, come sit down. Insistence isn't working as well as I hoped, but I'll continue to try. The second presentation is "Leakage Detection with the χ²-Test," by Amir Moradi, Bastian Richter, Tobias Schneider, and François-Xavier Standaert, and it will be presented here by Bastian.

Yeah, thank you, Colin. When we perform security evaluations, we want to assure that a device does not leak sensitive information or sensitive values during the execution of cryptographic operations. Often, these tests are performed based on attacks, for example in Common Criteria, and these have the downside of high complexity: you have to choose the method, the intermediate values, and the models; every attack has to be optimized; and so it is easy to miss an attack vector. But there is also the approach of leakage detection, which uses general statistical assumptions to become independent of specific models or attack methods. So we can really treat the implementation as a black box and don't need to adjust the method to it. The most common approach today is Test Vector Leakage Assessment (TVLA) based on Welch's t-test, which uses basically two simplifications to reduce this complexity. First, it reduces the problem to two classes, for example a fixed-versus-random test. Second, it applies a simple statistical treatment: we only estimate single statistical moments, or several moments separately. But there are also downsides resulting from these simplifications. The reduction to two classes can result in false negatives, because the leakage might be too similar in these two classes but still be detectable with multiple classes.
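The fixed-versus-random t-test described above can be sketched in a few lines. This is a minimal illustration, not the evaluation code from the talk: the toy per-sample trace values are invented for the example, and only the 4.5 detection threshold follows the TVLA convention mentioned here.

```python
import math

def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    na, nb = len(a), len(b)
    ma = sum(a) / na
    mb = sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # unbiased sample variance
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

# Toy example: leakage at one point in time for a "fixed" and a "random" set.
fixed_set  = [5.1, 5.3, 4.9, 5.2, 5.0, 5.4, 5.1, 5.2]
random_set = [4.0, 4.2, 3.9, 4.1, 4.3, 4.0, 4.1, 3.8]
t = welch_t(fixed_set, random_set)
# TVLA flags a point in time as leaking when |t| exceeds 4.5.
leaks = abs(t) > 4.5
```

In a real evaluation this statistic is computed independently for every point in time of the traces.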
Also, because the test only considers separate moments, leakage that is spread over multiple moments may go undetected: in each single moment the leakage is quite small, but summed up over the moments it might be enough to be detected or exploited. The χ² test can address these two issues. First, the χ² test works with multiple classes, so we can use more classes than the two of the t-test. Second, the χ² test is based on the whole distribution, so we can detect leakage that is spread over the distribution and not concentrated in single moments.

Coming to our methodology: for a fixed-versus-random test, we first sample two sets of traces, one with a fixed input and one with random inputs. From these, we compute the histograms of the two classes for each point in time, build the contingency table from them, and compute our test statistic. A nice point here is that the first part is the same as for the t-test when using the fast leakage assessment presented here last year by Reparaz et al., so we can reuse the same precomputations as for the t-test and save a lot of computation time. One advantage of the χ² test is that we can use multiple classes, which works basically the same way: we sample multiple sets instead of two, for example for different fixed inputs, and then simply get more rows in our contingency table before computing the statistic. The specific test we use here is Pearson's χ² test of independence. Its null hypothesis is that the occurrences of the observations are independent, and from this we can conclude: if the null hypothesis is rejected, the measurements are informative, so we might have some leaking information.
The test is based entirely on this contingency table of frequencies, that is, on the sampled distribution obtained from our measurements, and not on estimated moments. There is one downside: unlike for the t-test, the degrees of freedom vary, so for the χ² test we always have to compute the p-values instead of using a fixed threshold on the statistic. We chose p = 10⁻⁵ for the later experiments, which is equivalent to the usually used t-test threshold of t = 4.5. For the computation, we first build the contingency table, which is basically just a transformation of the histograms. Then we compute the expected value for each cell. After that, the computation is very efficient: it is basically just computing the differences, summing everything up, and then evaluating the χ² distribution to get the p-value, which is the result of our test.

To test our approach, we first ran simulated experiments with univariate leakage. We basically simulated a masked hardware design with parallel processing of the d shares: our secret value is split up into d shares xᵢ, and we combined these shares with a Hamming-weight leakage function and added Gaussian noise so we could produce different SNR values. When we run a fixed-versus-random test on this, the results show that for the lower orders, the t-test actually performs better than the χ² test. But with increasing orders, at order 3 and order 4, the χ² test improves, and we expect the advantage to grow even further for higher orders. The other parameter we have to test is the SNR, and there we can see that the advantage we saw before decreases with shrinking SNR, while the t-test is not as strongly influenced by it. We then went on to multivariate leakage, that is, serialized computation of the shares as in software masking, so we did not add up the simulated leakage. For this, we need some kind of combination function.
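The pipeline just described (histograms → contingency table → expected values → statistic → p-value) can be sketched as follows. This is a minimal illustration assuming a generic contingency table; the helper names and the toy table are invented, and the p-value is obtained by implementing the χ² survival function directly via the standard incomplete-gamma recurrence rather than calling a statistics library.

```python
import math

def chi2_sf(stat, df):
    """Survival function (1 - CDF) of the chi-squared distribution for
    integer df, using the recurrence Q(a+1, x) = Q(a, x) + x^a e^-x / Γ(a+1)."""
    x = stat / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)            # Q(1, x) = e^-x
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))  # Q(1/2, x) = erfc(sqrt(x))
    while a < df / 2.0:
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q

def chi2_independence(table):
    """Pearson's chi-squared test of independence on a contingency table
    (rows = classes, e.g. fixed/random; columns = histogram bins)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = rows[i] * cols[j] / n      # expected count under independence
            if exp > 0:
                stat += (obs - exp) ** 2 / exp
    df = (len(rows) - 1) * (len(cols) - 1)
    return stat, df, chi2_sf(stat, df)

# Toy example: two classes (fixed vs. random) over two histogram bins.
stat, df, p = chi2_independence([[10, 20], [20, 10]])
```

The same routine works unchanged for the multi-class variant: more classes simply mean more rows in the table, and the degrees of freedom grow accordingly, which is why a fixed threshold on the statistic cannot replace the p-value.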
First, we chose the normalized product, which is also the combination function usually used for the t-test. For the χ² test we additionally used two other combination functions. The first is sum combining, which is again possible because we look at the whole distribution and not only at, for example, the means. Its advantage is that the noise is not multiplied; it only adds up, unlike with the normalized product. We also tested multivariate histograms. Looking at the results, we see that the t-test actually performs better for multivariate leakage; it seems to cope better with the combined noise of the normalized product. The normalized product also seems to be the most efficient combination function for non-negligible noise levels: only at really high SNR do the other combination functions, sum combining and multivariate histograms, work better than the normalized product.

We also tested the method on real hardware. We implemented a PRESENT threshold implementation with three shares on a SAKURA-G board; it splits the S-box into the G and F functions and is a byte-serial implementation that uses a shift register for the state. Looking first at the fixed-versus-random results, the t-tests behave as expected for this implementation: we get no first-order leakage, a small second-order leakage, and a high third-order leakage. The χ² test behaves quite similarly to the third-order t-test, but at the same time it gives us a higher confidence than the t-test. We also looked at fixed versus fixed over multiple classes, which is something you cannot do with a t-test; for this test we recorded traces for eight different fixed plaintexts. First, as a comparison, we compared different pairs of plaintexts, which is the right plot here.
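The two main combination functions can be illustrated on a toy 2-share example. Everything here is invented for illustration: the point is only that the centered (normalized) product moves the secret into the mean of the combined samples, while sum combining leaves the class means equal and shifts the information into the shape of the distribution, which is exactly what a distribution-based test like the χ² test can still pick up.

```python
def centered_product(xs, ys):
    """Normalized-product combining: pointwise product of mean-free samples.
    Note that the noise of the two points gets multiplied together."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return [(x - mx) * (y - my) for x, y in zip(xs, ys)]

def sum_combining(xs, ys):
    """Sum combining: noise only adds up instead of multiplying, but the
    secret is no longer visible in the mean, only in the distribution."""
    return [x + y for x, y in zip(xs, ys)]

# Noiseless 1-bit 2-share masking: all four share pairs, secret = x XOR y.
xs = [0.0, 0.0, 1.0, 1.0]          # leakage of share 1
ys = [0.0, 1.0, 0.0, 1.0]          # leakage of share 2
# secret bit per row:  0,   1,   1,   0
prod = centered_product(xs, ys)     # mean differs between secret classes
comb = sum_combining(xs, ys)        # means equal (both 1.0); only the
                                    # spread of the values differs per class
```

Here `prod` is positive exactly for the secret-0 rows and negative for the secret-1 rows, while `comb` gives {0, 2} for secret 0 and {1, 1} for secret 1: same mean, different distribution.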
We can see that the different combinations of plaintexts detect leakage at different points of the computation. Additionally, we used the χ² test to compute the statistic over all eight classes at once. Like the fixed-versus-random test, this detects the large leakage at the beginning of the computation, but it additionally detects smaller leakage at later times during the computation, so we get a small benefit from this.

OK, so we saw that we can detect leakage that is distributed over multiple moments; we would also like to exploit this leakage. We can again use the multi-class capability of the χ² test and perform an attack by using it as a distinguisher. For each key candidate k, we compute a separate test: we sort the traces into the different classes of a model, for example a Hamming distance or a Hamming weight, calculate the histograms for the classes, and then again compute our statistic. We then rank the key candidates by their resulting p-value and hopefully get the correct key. This really gives us the benefit of using the whole distribution. In this respect it is similar to mutual information analysis, but it additionally provides a confidence level for each key candidate. One thing to consider, as with mutual information analysis, is that the number of classes has to be lower than the number of key candidates; otherwise we get a bijection, and because the order of the classes is not considered in the test, all key candidates will produce the same result. Our results for this PRESENT implementation are that, as expected, first-order CPA did not work, and neither did second-order CPA. But surprisingly, third-order CPA also did not work, even with 50 million traces. Using our χ² test, however, we were able to recover the correct key after 28 million traces, so after about half of our trace set.
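The distinguisher can be sketched on a simulated, unprotected 4-bit S-box (the PRESENT S-box, since PRESENT is the target cipher here). Everything about the simulation is invented for illustration: the trace count, noise level, and seed are arbitrary, and for simplicity the candidates are ranked by the χ² statistic instead of the p-value, which gives the same order here because every candidate's contingency table has the same dimensions. As in the talk, the Hamming-weight model gives 5 classes, fewer than the 16 key candidates.

```python
import random

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]   # PRESENT 4-bit S-box
HW = [bin(v).count("1") for v in range(16)]

def chi2_stat(table):
    """Pearson's chi-squared statistic of a contingency table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i in range(len(rows)):
        for j in range(len(cols)):
            exp = rows[i] * cols[j] / n
            if exp > 0:
                stat += (table[i][j] - exp) ** 2 / exp
    return stat

# Simulate noisy Hamming-weight leakage of S(p XOR k) for a secret key nibble.
random.seed(1)
secret = 0xA
plaintexts = [random.randrange(16) for _ in range(4000)]
traces = [HW[SBOX[p ^ secret]] + random.gauss(0.0, 0.5) for p in plaintexts]
bins = [min(max(int(round(t)), 0), 4) for t in traces]   # coarse histogram bins

def score(k):
    # Rows = predicted Hamming-weight class (5 classes), columns = leakage bins.
    table = [[0] * 5 for _ in range(5)]
    for p, b in zip(plaintexts, bins):
        table[HW[SBOX[p ^ k]]][b] += 1
    return chi2_stat(table)

best = max(range(16), key=score)   # candidate with the strongest dependence
```

For the correct candidate the predicted class and the observed leakage bin are strongly dependent, so its statistic dominates; wrong candidates scramble the classes and the dependence collapses.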
So this really shows the benefit of using the higher orders of the leakage. To conclude, we presented the χ² test as a complement to the t-test. It is able to outperform the t-test in cases where, for example, the noise level is not sufficient or the leakage is distributed over multiple statistical moments. But you should always use it together with the t-test, because there are also many cases in which the t-test works better than the χ² test. So we propose to use the t-test as before to evaluate the security order, to see whether you reached the order you intended with the implementation, but also to use the χ² test to evaluate the noise level of the whole implementation, to see whether your leakage is supported by enough noise and cannot be easily broken by, for example, switching to a higher order. Thank you.

Thank you very much. Before questions: there are still seats open, so people at the back who are standing, feel free to come forward, there are lots of seats. Questions? We had one up here, I believe.

Thank you for your lecture. I have a question: have you seen cases where the χ² test causes false positives? And how do you distinguish the false positives from the true positives?

You mean that we detect a leakage we cannot later exploit?

Yeah.

Yes, of course this can happen, especially when combined with noise. But the same is true for the t-test: you can also detect leakage with a t-test that you might not be able to exploit in a divide-and-conquer attack.

So regarding this behavior, can you make a comparison of the χ² test and the t-test in terms of false positives and true positives?

Not right now; maybe we can discuss this later offline.

OK, thank you.

We have another question over here, Manuel. Thank you.
In the univariate case for the t-test, did you perform some preprocessing to attack the higher-order countermeasures?

You mean for the hardware experiment or for the simulated ones?

For the simulations.

We did not perform any preprocessing; we just summed up the Hamming-weight leakage with additional noise.

And for the t-test?

Oh, for the t-test? No, for the t-test we also did not perform preprocessing.

For the univariate case, you said, right?

Yeah. For the t-test against the sharing, you don't perform any preprocessing before?

No, for this test we did not perform preprocessing.

OK, great.

I think there's time for one or two more, if there are any questions? If not, well, thank you very much.