Yeah, the second talk is given by Olivier Bronchain, again a French name, sorry, same story as always. The paper is titled Multi-Tuple Leakage Detection and the Dependent Signal Issue, and it is written by Olivier Bronchain, Tobias Schneider and François-Xavier Standaert. And as I said, Olivier is going to give the talk, please. Okay, so thanks for the introduction. So what's the table of contents for this work? I will start with a small introduction about how we try to evaluate the security of a device against side-channel attacks. Then I go to what we call leakage detection, and mainly to TVLA, which you already heard about this morning. Then I go to the core of the work, which is multi-tuple leakage detection. And finally I summarize everything in my conclusion. So what's the point with side channels? Basically, when you try to encrypt data, you need to do that on a physical device. Of course you give it a plaintext and a key and you get a ciphertext, but you also get some physical signals, because you are dealing with a physical system. This signal can be, we heard about that already, EM, power, timing and so on. But the question is: can an engineer with a scope get back the key? That's what we are worried about, I think, in this community. And the thing is that these side-channel attacks are hard to prevent. There is a bunch of work; we saw that this morning. And finally, evaluating this kind of device is also our task. So now I go briefly through two types of approaches, what we call attack-based evaluation and leakage detection. Let's start with the first one. When we try to do an attack-based evaluation, we start by collecting measurements, and with these measurements we will try to recover a subpart of the key. So you collect measurements, then you try to perform the attack, and maybe you get back the key. The thing is that this kind of approach usually requires a long measurement time.
If you are dealing with a highly protected device, you will spend weeks taking measurements. Then, if the device is protected, it's unlikely that you will be able to run automated attacks on it, so you will need some expert knowledge. And finally, from a more statistical point of view, when you try to run an attack you have to distinguish one class out of 256 if you are targeting a byte. So that's one solution. Another solution is what we call leakage detection. Here the goal is no longer to mount an attack, but rather to find relationships between the internal data manipulated by the device and the physical signals that we get. So how do we run leakage detection? Basically, you have your crypto core here, and you feed it with two sets of inputs. Let's say the first set is composed of plaintext one and key one, and the second one of plaintext two and key two. The goal of these two sets is to generate different data within the device; we'll see why afterwards. So you feed the core with these two sets of inputs, and then you collect the corresponding measurements. The goal of leakage detection will be to try to observe differences between these two sets. If you find differences, it means they come from the different internal data within your device, which suggests that an attack might be possible. So how does it compare with the previous attack-based evaluation? This kind of approach is usually easier to implement, so you don't need expert knowledge. And it usually requires fewer measurements, because you no longer try to distinguish one class against 256, but just one class against another, which is usually an easier problem. Okay. However, the drawback of this leakage detection is that it's a good first step for your security analysis, but it's not the holy grail. There is still a risk of false positives and false negatives.
What happens if you cannot find leakage: is it because you didn't take enough measurements, or because your device is actually not leaking? These are still open questions. So I said that the goal of leakage detection is to find differences between the two classes, but how do we do it in practice? Let's say here we have two sets of traces, the blue one and the green one. What we will do is select a point in time in both of the traces. That's the first step: you select a point in time, and then you start to collect measurements and observe the distribution at that point in time. That's what is represented by the histograms there. So you observe the distributions and you search for differences, and for that we run a statistical test. The statistical test basically tells you if the device is leaking or if the device is not leaking. So we observe the binary output of the test, and maybe you didn't have enough measurements to spot the difference. So you take more measurements, and once you have a finer understanding of the distributions, maybe you will find differences. Okay, so I said that we have to run a test, but which test should we choose? Basically it depends on what you are searching for. If you are searching for different means, you can run Welch's t-test. If you are searching for differences in distribution, you can use the χ² test, which was proposed last year at CHES. But the most used test is Welch's t-test, and it's used in what we call the TVLA procedure, which I will describe next. Okay, so what's TVLA? We had examples this morning of papers using TVLA, but basically you take your whole trace and you perform the same analysis as before on all the time samples. So that's the first step: you select all the points in time.
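[Editor's note: the point-wise test described here can be sketched in a few lines. This is an illustrative reconstruction in Python, not the speaker's code; the simulated measurements, the mean shift, and the 4.5 threshold are stand-in assumptions.]

```python
import numpy as np

def welch_t(x, y):
    """Welch's t-statistic between the two classes of measurements
    observed at a single point in time."""
    num = x.mean() - y.mean()
    den = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
    return num / den

# Hypothetical measurements: class A leaks a small mean shift, class B doesn't.
rng = np.random.default_rng(0)
class_a = rng.normal(0.3, 1.0, 5000)
class_b = rng.normal(0.0, 1.0, 5000)
t = welch_t(class_a, class_b)
print(f"t = {t:.1f}")  # compare |t| against a detection threshold
```

With more measurements the statistic grows for a true mean difference, which is the "take more measurements, test again" loop the speaker describes.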
Once again, you record traces and you observe the distributions. Then you perform independent tests on all these time samples, and once again you observe their binary outputs. Okay, so maybe at some point we find leakage, but the question then is: when do we say that we have found leakage? Basically, we set what we call a threshold. We've seen an example this morning of how to set the threshold, and this threshold is set according to a few parameters. First, the desired confidence level: do you want a p-value of 10 to the minus 5 or 10 to the minus 2? It may change something. Second, the number of considered time samples: the length of the traces you are recording also has an influence on that threshold. And finally, you have to assume independence to set that threshold soundly. All this leads to some limitations of the TVLA procedure. So what are the limitations? First, TVLA performs independent tests on all the time samples. Because of that, it will never be able to spot multivariate leakage, and that may also make it slower, so maybe you will need more measurements than if you were performing multivariate tests. Another point is that you set the threshold and do all your analysis assuming independence in the signal and between the tests. But because you are dealing with a physical signal coming out of a chip, it's not likely that everything will be independent. If you have some capacitive effects, the signal shifts slowly across time. So there is usually no independence, and therefore you base all your analysis on a wrong assumption, and the conclusions you draw from it may be a bit hard to interpret. Okay, so those are the limitations, and that's why we came up with what we call multi-tuple leakage detection, which is the core of our work. So what's the approach? I guess you might have some insight about it by now.
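[Editor's note: a minimal sketch of the per-sample TVLA procedure just described, with a threshold that depends on the confidence level and the trace length. The Bonferroni-style correction, trace dimensions, and leakage position are illustrative assumptions, not the exact recipe from the talk.]

```python
import numpy as np
from statistics import NormalDist

def tvla(traces_a, traces_b, alpha=1e-5):
    """Univariate TVLA sketch: an independent Welch t-test at every
    time sample, with a threshold that grows with the trace length."""
    n_samples = traces_a.shape[1]
    m_diff = traces_a.mean(0) - traces_b.mean(0)
    se = np.sqrt(traces_a.var(0, ddof=1) / len(traces_a)
                 + traces_b.var(0, ddof=1) / len(traces_b))
    t = m_diff / se
    # Correct the per-test confidence level for the number of tests --
    # this is where TVLA assumes independent time samples.
    threshold = NormalDist().inv_cdf(1 - alpha / (2 * n_samples))
    return np.abs(t) > threshold  # per-sample "leaks here" verdict

# Hypothetical data: 1000-sample traces, a mean shift at sample 100 only.
rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, (4000, 1000))
a[:, 100] += 0.5
b = rng.normal(0.0, 1.0, (4000, 1000))
leaks = tvla(a, b)
print(np.flatnonzero(leaks))
```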
So we took all the independent t-tests that give different outputs, and we replaced them by a single one, a multivariate statistical test. Which test to choose? Once again, if you open a statistics textbook, it will tell you to use Hotelling's T² test. It's the natural extension of Welch's t-test to multivariate settings. The nice thing with it is that it does not assume independence of samples across time, so we don't base our analysis on a possibly tricky hypothesis. But the counterpart is that the test is harder to perform, in the sense that you have to compute a covariance matrix, and you have to invert it. If you are dealing with very long traces, it will be hard to compute that matrix and to invert it, so it's not always applicable. Another solution that we came up with, if you cannot compute that matrix, is what we call the D-test, which is more heuristic, and which is basically an extension of Hotelling's T² test, but assuming independence. So you don't need that huge covariance matrix. But once again, because you are basing your analysis on a hard-to-confirm hypothesis, the results are harder to interpret. Okay, so now I go to some parameters that you can play with, and that will influence the data complexity of your leakage detection. The first one is what we call density. Basically, it's the proportion of points within your traces that are leaking. You can also see it as the proportion of points where you would spot leakage if you had an infinite number of measurements. Here is an illustration. We have a trace on the top, then we perform a t-test with an infinite number of measurements on all the samples. The green boxes are the points which are leaking, and the red boxes are the ones which are not. So here we have a low density equal to 0.1. And if we increase the density, the proportion of leaking points increases.
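[Editor's note: a rough sketch of the two statistics just described, assuming Gaussian leakage and equal covariance between the two classes. This is the textbook two-sample Hotelling T² and a diagonal-only variant standing in for the D-test, not the paper's exact implementation.]

```python
import numpy as np

def hotelling_t2(a, b):
    """Two-sample Hotelling T^2: jointly tests all time samples and does
    not assume they are independent -- the pooled covariance matrix
    captures their correlation, but must be estimated and inverted."""
    n1, n2 = len(a), len(b)
    diff = a.mean(0) - b.mean(0)
    pooled = ((n1 - 1) * np.cov(a, rowvar=False)
              + (n2 - 1) * np.cov(b, rowvar=False)) / (n1 + n2 - 2)
    return (n1 * n2 / (n1 + n2)) * diff @ np.linalg.solve(pooled, diff)

def d_test(a, b):
    """Heuristic D-test sketch: keep only the diagonal of the pooled
    covariance, i.e. re-introduce the independence assumption so that
    no matrix inversion is needed."""
    n1, n2 = len(a), len(b)
    diff = a.mean(0) - b.mean(0)
    var = ((n1 - 1) * a.var(0, ddof=1)
           + (n2 - 1) * b.var(0, ddof=1)) / (n1 + n2 - 2)
    return (n1 * n2 / (n1 + n2)) * np.sum(diff ** 2 / var)

# Hypothetical demo: 5-sample traces, class A has a mean shift at sample 2.
rng = np.random.default_rng(2)
a = rng.normal(0.0, 1.0, (2000, 5))
a[:, 2] += 0.5
b = rng.normal(0.0, 1.0, (2000, 5))
print(hotelling_t2(a, b), d_test(a, b))
```

The pooled covariance is p×p for traces of length p, which is why the full test becomes impractical for very long traces.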
So in which scenario do you typically face which density? If you are working with protected software, usually you have a low density. Maybe you are spending a lot of time generating randomness and so on, so your traces are very long and not all the operations manipulate sensitive data; you likely get a low density. If you are dealing with hardware or protected hardware, there you will usually process something at every cycle, so you get a higher density and shorter traces, because the computation usually takes less time in hardware. Okay, these are parameters that influence the detection, but what about numbers? In the paper we have simulations and practical experiments; here we will focus on some simulations. So I will show a graph. On the x-axis we plot the density, which is the parameter I just mentioned, and on the y-axis we have the number of measurements that you need to perform your detection successfully. Let's see how it looks. On the right you have high density, and on the left you have low density. The first thing we can observe is that both methods benefit from a higher density. If you are on the right, multi-tuple detection basically outperforms TVLA by a factor of five in terms of number of measurements. But if you have a low density, so on the left of the graph, it's the other way around: it's TVLA that performs better than the multi-tuple tests. So multi-tuple tests suffer more from low density than TVLA. The main message here is that you can reduce the data complexity of your test by trying to have a higher density of leaking points within your traces. Another parameter of influence is what we call the trace length, so the number of points you want to jointly estimate. Once again, on the x-axis, the trace length, and on the y-axis, the number of measurements.
So on the left you just do it univariate, so you just consider one sample in time, and on the right it's a million. We see first that both methods take advantage of longer traces, basically because it's more likely that you will see a point that is leaking a lot. But we also see that multi-tuple tests get more advantage out of longer traces, and at some point you have a factor of five, as previously. So if you are dealing with longer traces, maybe you will be able to reduce the measurement period. But there is also a caveat here: as I said previously, you cannot run the Hotelling test on very long traces, because you have to compute that covariance matrix. So you cannot process traces of that length directly. In the paper, we came up with a solution, which is basically to run multiple Hotelling T² tests in parallel. That's also heuristic, but it performs well. Okay, so now in practice, we will be in an evaluation scenario with two different settings. The first setting is white box, where we know everything about the design, and we will see that we can play with some parameters there. And black box, where you know nothing about the implementation. So how do we perform leakage detection in such scenarios? In the first one, white box, you basically know everything about the device, so you have prior information about which points within your traces are leaking. Because you know that, you can remove all the useless points and just keep the ones that are of interest to you. So you can reduce the trace length. Because of that, you can keep just enough points to be able to invert the covariance matrix, and you will get a high density, because you never consider useless points.
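[Editor's note: the parallel-tests workaround mentioned here could look roughly like this. The window size, the per-window statistic, and the demo data are illustrative guesses, not the paper's construction.]

```python
import numpy as np

def chunked_hotelling(a, b, chunk=50):
    """Split long traces into windows small enough that each window's
    covariance matrix can be estimated and inverted, then run one
    Hotelling T^2 test per window. (Sketch assumes the trace length is
    a multiple of `chunk`.)"""
    out = []
    for start in range(0, a.shape[1], chunk):
        wa, wb = a[:, start:start + chunk], b[:, start:start + chunk]
        n1, n2 = len(wa), len(wb)
        diff = wa.mean(0) - wb.mean(0)
        pooled = ((n1 - 1) * np.cov(wa, rowvar=False)
                  + (n2 - 1) * np.cov(wb, rowvar=False)) / (n1 + n2 - 2)
        out.append((n1 * n2 / (n1 + n2))
                   * diff @ np.linalg.solve(pooled, diff))
    return np.array(out)

# Hypothetical demo: 200-sample traces, leakage at sample 60 (window 1).
rng = np.random.default_rng(3)
a = rng.normal(0.0, 1.0, (2000, 200))
a[:, 60] += 0.5
b = rng.normal(0.0, 1.0, (2000, 200))
print(chunked_hotelling(a, b))
```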
As a result, you are in the best case to perform leakage detection, according to the previous graphs, and you will usually get a smaller measurement period. Additionally, the analysis of the results will be easier, because you don't rely on an independence assumption which is not always fulfilled. The other way around, if you are in a black-box scenario, you know nothing about the implementation. So you cannot select the points of interest, and you cannot reduce the traces. And if you are dealing with long traces, from protected software as an example, you will not be able to invert the covariance matrix. Sometimes you can still deal with such traces by splitting them, but that's a heuristic. Okay, and you also have a fixed density. By fixed density, I mean, once again, you cannot remove the useless points, so you cannot increase it. As a result, you are not in the best case for the performance of leakage detection, and so you will possibly have a larger measurement period. And you need to rely on an independence assumption, because you have longer traces; you have to rely on your statistics. In the paper, we show that if you are using TVLA, you will typically be too conservative, meaning that if you say you have a p-value of 10 to the minus 5, it's actually maybe way smaller. So yes, you are too conservative. And for the D-test, the other statistic we propose, it's the other way around: you are too optimistic. And typically, the true confidence level you should have is between the two. Okay, so it's time to wrap up with a small conclusion. As a conclusion, I would say that physical signals are not likely to be independent across time, due to capacitive effects, as an example. And if you are able to run Hotelling's T² test, it provides many features.
So first, you have a straightforward interpretation of the results, because once again there is no independence hypothesis. And sometimes you will be able to detect faster. The counterpart is that with TVLA you can see which points in time are leaking; here we lose that information. If you have to make the independence hypothesis, then you have to rely on heuristics: either you do TVLA, where you are too conservative, or the D-test, where you are too optimistic. One of the messages here is that if you are doing a white-box analysis, you can do many things; it's easy to deal with. If you are in a black-box setting, it's much harder to evaluate the device correctly. So thank you very much, and I'll take any question. Thank you. We have time for one question. Yeah, thank you for the presentation. About the figures you're presenting in slides 13, 14, 15, I have a question. If we fix the same number of measurements and the same density, how do you compare the quality of the TVLA estimation and the quality of the Hotelling's T² test? Okay, so if I get it right, it's: where would you put the T² test? Yeah, if I look at your figure, if I fix the value on the x-axis and the value on the y-axis, I see that I have two tests: one is TVLA, the other one is a multi-tuple test. But how can I, and I don't know if this is really an open question, how do I compare the quality of the TVLA test and the quality of the multi-tuple test? So it's what we call the beta factor, which is one minus the probability that you don't detect when you should detect. And that's with a fixed beta. So it's as if you had a fixed success rate in your... Okay, so this is, yeah, okay.
So this is for a common and fixed success rate? Yeah, okay. Exactly. Okay, thank you. Okay, thank you for the talk. Let's thank the speaker. Yeah.