So indeed, this work is from two people in the security domain: Éloi, the PhD student who actually did the work and who could not attend, and myself, together with Olivier and Pablo, who are in the field of information theory. So this is the context. Five years ago at CHES, we introduced a framework to derive some results regarding side-channel analysis. At that time, it was Annelie who introduced a way to model side-channel analysis this way. We have some secret, like a key, and this key will be leaking somehow through an algorithm. So there is some plaintext or ciphertext, which will be mixed with the key, and then some leakage function. Let's take the example of the Hamming weight: this is maybe the add-round-key, the S-box, then the Hamming weight. And then some noise is added when we do the measurement. The job of the attacker, or the evaluator, is to extract the key, to estimate the key using some decoding function, let's say. Clearly, this is not meant to be an ideal communication channel, because there might be some countermeasures there, and the noise is something we don't know. But what is pretty nice is that with this framework, we can rely on and leverage coding theory and information theory. There is a direct relationship between the notions: for instance, decoding corresponds to a distinguisher. So at that time, this was the concrete framework we introduced, and we derived many results, like the optimal distinguishers in many situations, including masking, collision attacks, et cetera. Next, we wanted to make this framework more general, with a view to deriving results that apply to more contexts. So now we still have the key, and we try to guess the key here, but now the encoder is pretty simple: we make no assumption. So what's the setting? For instance, you are in a test lab, or you evaluate your implementation. You will take Q queries.
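The leakage model just described (key mixed with the text, passed through an S-box, reduced to a Hamming weight, plus measurement noise) can be sketched as a small simulation. This is a hedged illustration, not the speakers' code; for brevity it uses a toy 4-bit S-box rather than the AES S-box.

```python
import random

# Toy 4-bit S-box (illustration only; a real attack would target e.g. the AES S-box)
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def hamming_weight(v):
    return bin(v).count("1")

def leak(key, text, sigma=1.0, rng=random):
    """Noisy Hamming-weight leakage of Sbox(text XOR key)."""
    y = hamming_weight(SBOX[text ^ key])   # sensitive intermediate variable Y
    return y + rng.gauss(0.0, sigma)       # additive Gaussian measurement noise

rng = random.Random(0)
key = 0xA
traces = [(t, leak(key, t, sigma=0.5, rng=rng)) for t in range(16)]
```

The pair (text, noisy leakage) is exactly what the attacker observes in the model: the text is known, the key and the noise realization are not.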
So in bold, it means that it is several measurements. The intermediate variable, the sensitive variable here, which is leaked, is denoted Y. It depends on the text, and the text is also known by the attacker. So the attacker can take X and T and try to do his best to recover the key. We make no assumption on the channel: the noise can follow whatever distribution, and there can be some countermeasures; we completely abstract them away. You see, in the encoder there can be some random variables which are unknown to the attacker, like shuffling, masking, these kinds of things. Just to show in practice: the key is in this device, the measurement and the noisy setup is the probe here, and the attacker will measure this and do some analysis here, applying the distinguisher to enumerate all the keys, all right? So from that time, five years ago, we know that the best distinguisher consists in maximizing the likelihood, okay? This is the way to recover the key with the best success probability. We also call that, in the CHES community, a template attack, and it can apply to many cases, for instance the Hamming weight I was mentioning; or, if you have M, the mask, which is unknown, then the variable which is leaked is this pair, you see? Many attacks were suggested, and there is an effort now also to do some clever attacks using artificial intelligence, for instance. In this context, it's a bit difficult to know what the best attacker is and to take some margin over the attacker, you see? What we know is that the attacker will eventually succeed with an infinite number of traces, okay? But the question is: how fast? So this is the question. We will assume the best possible attacker. We make few assumptions: the attacker knows everything except the key, the noise, and, if there are some countermeasures, those are unknown; they are not even modeled, you see?
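The maximum-likelihood (template) distinguisher mentioned here can be sketched as follows: for each key guess, score the observed traces under a Gaussian Hamming-weight model and rank the guesses. This is a minimal sketch under assumed names (toy 4-bit S-box, known noise level), not the exact attack from the paper.

```python
import math, random

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]  # toy 4-bit S-box (illustration)

def hw(v):
    return bin(v).count("1")

def ml_distinguisher(traces, sigma):
    """Rank key guesses by Gaussian log-likelihood (template-attack style)."""
    scores = {}
    for k in range(16):
        ll = 0.0
        for t, x in traces:                  # t: known text, x: measured leakage
            mean = hw(SBOX[t ^ k])           # modeled mean under key guess k
            ll += -((x - mean) ** 2) / (2 * sigma ** 2)
        scores[k] = ll
    return sorted(scores, key=scores.get, reverse=True)

# Simulate Q noisy queries and run the attack
rng = random.Random(1)
key, sigma = 0x7, 0.8
traces = []
for _ in range(50):
    t = rng.randrange(16)
    traces.append((t, hw(SBOX[t ^ key]) + rng.gauss(0.0, sigma)))
ranking = ml_distinguisher(traces, sigma)   # the true key typically ranks first
```

The point of the talk is precisely that no distinguisher, including this one or any machine-learning variant, can succeed with fewer traces than the information-theoretic bounds allow.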
And so the question is: what is the least number of queries Q such that the probability to recover the key is greater than a given value, which we fix, okay? 99%, 99.9%, something like this. Why is this important? Because then we know that no attacker will manage to extract the key with success probability Ps using fewer than Q traces, okay? And actually we will not even be sure that, if the number of traces is more than that, there is an attack. Maybe, maybe not, you see? It will depend on the way we compute this bound, okay? So now I'm coming to the results. First of all, I just need to show you a couple of notations so that you can understand our theorems, but it's pretty basic, notions from information theory. The Shannon entropy H: for instance, we will consider a key of n bits, okay? We write H2 when the random variable is binary, okay? It has only two values. In this case, we call this the binary entropy, in bits for instance, but binary meaning that we have only two alternatives, one with probability p and the other with probability 1 minus p. Then we have the Kullback-Leibler divergence. Same thing: in the binary case we have this expression. And the mutual information between two random variables is simply the divergence between the joint distribution and the product of the distributions of each variable taken alone, okay? We also have conditioning by a variable, okay? So these are all the notations I will be using, and I will support the results with one theorem from information theory, which is the data processing inequality. It states that if we have a random variable which is processed to give B, then C and D, et cetera, we lose information while we process. So for instance, the mutual information between the inner variables B and C is more than between A and D, you see?
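These notions can be made concrete in a few lines of Python. This is a sketch using the standard definitions, with all quantities in bits:

```python
import math

def h2(p):
    """Binary entropy in bits: two outcomes with probabilities p and 1 - p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def d2(p, q):
    """Binary Kullback-Leibler divergence D((p, 1-p) || (q, 1-q)) in bits."""
    s = 0.0
    for a, b in ((p, q), (1 - p, 1 - q)):
        if a > 0:
            s += a * math.log2(a / b)
    return s

def mutual_information(joint):
    """I(A;B) = D(P_AB || P_A x P_B); joint given as a dict {(a, b): prob}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)
```

For instance, two perfectly correlated fair bits have one bit of mutual information, and two independent variables have zero, which matches the definition as a divergence between the joint law and the product of the marginals.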
The same kind of relationship holds for the divergence, when we process the two laws in the same way, okay? So we can notice a few things. For instance, suppose we are interested in the mutual information between the real key and the guessed key; it's a divergence, like I mentioned, and now we apply this function to those variables, so you take the distributions of the two keys. Usually the distribution of the key is uniform. Hopefully the distribution of the guessed key is not uniform; it is going to be deterministic, because you wish to recover the key with a probability of success as high as possible. So this distribution will be extremely biased in case of success, you see? And we apply this function, which computes a binary variable equal to zero or one depending on whether the guessed key is the good key, you see? In this case the divergence we get is a binary divergence, okay? We can compute it like that, and we get an expression in which you have the probability of success, and this, one minus the probability of success, is the probability of error. This is well known: it is Fano's inequality. It applies to any random variables; it's a very generic result. Now we want to plug that into our framework, and so we have a Markov chain here: the key and the guessed key, but also the noisy leakage and the leakage before the channel, okay? So we have this inequality; this is also the data processing inequality, all right? Now you have this relationship. I will plug it here, and it yields our first proposition in the paper: you have this relationship, and it is bounded by a mutual information. You see, the bold letters are only here; this means that this quantity depends on the number of traces, okay? And now we have all the ingredients to see how many traces the attacker should use to break, or at least to find some bounds on this quantity, okay?
For example, when there is no measurement at all, what happens? Then this mutual information is zero. This quantity is always positive, so it is equal to zero, and the only solution is that the probability of success is one divided by the number of keys, which is what you expect, you see? What will be interesting is when the number of traces is larger, and we will take the probability of success high enough, because you know that to extract the key we use a divide-and-conquer approach, so we need a fairly high confidence for each byte to be extracted, okay? And in such a regime, Fano's inequality will be quite tight. This is what I will show you, okay? So now let's make explicit what this mutual information is. We will come up with two bounds. The first bound is what we call the linear bound. Why is that? Because when we repeat experiments, what we say is that the information I collect is at most the sum of each individual piece of information I can collect, okay? So this is a very trivial bound. The proof comes from the fact that the channel is memoryless; the noise is i.i.d., if you wish. And clearly this bound is not good. It could be additive like this; it could be tight if each time we sent new, orthogonal information, for instance, but actually no: what do we send? We always repeat the same key, you see? So it's nice to notice, but definitely it will not be the best bound we have, okay? A way to see it is that in practice this quantity does not go to infinity with the number of traces; in practice it is bounded by n, you see? So at some point this bound is useless. So let's go for another bound, okay? We call it the divergence bound, which is why I introduced all the definitions before presenting the results. And here it is, okay? It's a novel and non-trivial bound. This K prime is actually a random variable with the same distribution as the key, an independent copy, okay?
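The linear bound is simple enough to write down directly: with a memoryless channel, Q traces carry at most Q times the single-trace mutual information, but never more than the trivial cap H(K) = n bits. A minimal sketch of that observation (the divergence bound itself is on the slide and is not reproduced here):

```python
import math

def linear_bound(q, mi_single, n_bits):
    """Linear bound Q * I(K; X_1), clipped at the trivial cap H(K) = n bits."""
    return min(q * mi_single, n_bits)

# e.g. a Gaussian channel with SNR = 0.1 gives at most this many bits per trace
mi_single = 0.5 * math.log2(1 + 0.1)
```

The clipping is exactly why the linear bound becomes useless for large Q: past n / I(K; X_1) traces it tells us nothing new, which motivates the divergence bound.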
The proof is done extensively in the paper. And here we see that this bound will converge to n, which is much more reasonable, okay? So let's see in practice. The linear bound is this one in green. Then we have the divergence bound, which asymptotically goes to n; here we deal with bytes, so eight bits. And the two bounds complement each other, because you see the trivial linear bound is still better than the divergence bound for a small number of traces, okay? In practice we also worked on some numerical estimation here, so it's good to have the two of them. Of course you have another bound, which is that the mutual information cannot be greater than eight, okay? So that's the reasoning about the mutual information. Now let's relate that to the number of traces. We know, in the case of additive Gaussian noise, that the mutual information is less than this quantity, which involves the signal-to-noise ratio. And if you plug that into our final inequality, we get that the number of traces is more than this quantity, okay? Now if, for instance, the probability of success goes to one, we have this relationship, and if the SNR tends to zero, the number of traces goes to infinity like one divided by the SNR, okay? The other bound is quite interesting because, still for the AWGN channel, this divergence is a norm, and this quantity goes to infinity because there are Q elements in this vector; but if I divide by Q, by the law of large numbers it goes to the confusion coefficient, okay? Pretty nice; this was introduced by Yunsi Fei. So we have an implicit bound now: you still recognize the terms here, the Q is here, and actually we are only interested in the smallest nonzero value of the confusion coefficient, okay? Now let's compare with other works.
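Combining the two ingredients just stated, Fano's inequality and the Gaussian per-trace capacity 0.5 * log2(1 + SNR), gives a numeric lower bound on the number of queries Q. This is my own combination of the two stated bounds, under the Gaussian-noise assumption, not the paper's exact expression:

```python
import math

def d2(p, q):
    """Binary Kullback-Leibler divergence in bits."""
    s = 0.0
    for a, b in ((p, q), (1 - p, 1 - q)):
        if a > 0:
            s += a * math.log2(a / b)
    return s

def min_traces(ps, n_bits, snr):
    """Lower bound on the number of queries Q needed for success probability ps,
    from d2(ps || 2^-n) <= I(K; X) <= Q * 0.5 * log2(1 + SNR)."""
    capacity = 0.5 * math.log2(1 + snr)     # per-trace Gaussian capacity, bits
    return d2(ps, 2.0 ** -n_bits) / capacity
```

As ps goes to one, the numerator tends to n bits, so for small SNR the bound grows like 2 n ln(2) / SNR, matching the 1/SNR behavior stated in the talk.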
So there is the work of Duc, Faust and Standaert at Eurocrypt 2015. This is the attack with maximum likelihood. Our bound is over there, you see? We improve a little bit on their bound, and the reason is that we derive everything from the mutual information without using Pinsker's inequality, okay? Now some illustrations: here you have a single bit, and you see it's fairly similar for the Hamming weight, for instance. The best attack is here, so of course our bound is on the left, and you see they are not so bad actually, okay? So, in several situations. Time for me now to conclude. We obtained some universal bounds on the number of traces. Universal means that we only reason about mutual information and there is no specific model for how the information leaks. We illustrated them in the framework of power analysis attacks, but it could extend also to timing leakage, for instance, and the results were found to be fairly tight, okay? So thank you for your attention, and now I welcome some questions. During the questions I also have some announcements for you, just to read them. Thank you. Any questions? Okay, Sylvain.

What I would like to ask is probably a very practical question, not getting into the formulas you have shown, about the mutual information for which you showed some curves. How do you usually estimate it? Of course you need to deal with the probability distribution, and there is always some freedom in which parameters you select. How do you estimate the probability distribution to get the final mutual information value? I guess you are using MATLAB for that, right?

No. Here the mutual information is the exact mutual information when we assume some distributions.

You don't have any estimation for that?
So let's say, just to be clear about our work: we don't study something like MIA, mutual information analysis, where you derive some distribution from the measurements, build histograms, and then estimate the mutual information. The mutual information is completely theoretical here, and it makes sense only because we manage to relate it, oops, maybe I should point here, we manage to relate the mutual information to some quantity which is known theoretically in some scenario, like for instance when the noise is Gaussian: this mutual information, we know it. There is a formula according to the SNR.

You don't have any estimation for that?

No, no, we don't estimate. You see, we do not come up with a new distinguisher. We just come up with some bounds, and we know that even if we apply machine learning or complex attacks, it is impossible for any attack to be faster than the bound in terms of number of traces. That's the point.

Then instead of estimation you actually calculate the bounds for the mutual information. You say that it cannot be more than this bound for every point in the curve?

Yes. To compute the bounds, for instance in this case, it is kind of tricky because the bound is implicit, so we have to look for the solution Q which solves this inequation. So this is numerical, and this is also why some curves are not so accurate here.

But this is the success rate. For this you don't have any bound, which means that you really deal with simulated data, right?

No, no, no. What is empirical is the red one. For the red one we did like 100 or 1000 attacks to estimate the number of times we have success or not, you see, but the other curves are analytic formulas.

Okay, thank you.

I think we don't have time for more questions, so let's thank Sylvain. Okay, thank you very much.