Thanks for the introduction, everybody. So this talk is actually quite complementary to the previous one. In the previous talk, we've seen that it's possible to have strong security guarantees against side-channel attacks thanks to the masking countermeasure, assuming that there is a sufficient level of noise in the implementation. In this talk, we are concerned with the other question: how can we convince ourselves that one concrete device, with concrete leakages, actually has a sufficient level of noise for masking to play its role?

This is the outline of my talk. I will start with a few words of introduction and motivation. I will then explain what leakage certification has been so far and what the limitations of previous approaches were. We will need to introduce a seemingly useless quantity, and then I will show how this quantity can be used for our main result, which is upper and lower bounds for the mutual information. I will finally explain how we can use these results to analyze concrete leakages with a large number of dimensions, and conclude.

So let's start with introduction and motivation. This is a paper about masking. In masking, what we want to do is to compute securely on a leaking device: typically, we split every sensitive variable, like the Y here, into d shares and perform all the computations on those shares only. As a result, the adversary will be able to observe what we call a leakage trace, which is typically the power consumption of the implementation that manipulates the d shares. You have the time samples on the X axis and the leakage on the Y axis, and what we expect is that there will be some leakage samples depending on the first share, some leakage samples depending on the second share, and so on.
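To make the share-splitting concrete, here is a minimal sketch of Boolean masking in Python (the names and the 8-bit value range are my own illustration, not the speaker's implementation):

```python
import secrets

def mask(y: int, d: int = 2) -> list[int]:
    """Split an 8-bit sensitive value y into d shares with XOR(shares) == y.

    The first d-1 shares are uniformly random, so any subset of d-1 shares
    is independent of y; only all d shares together reveal it.
    """
    shares = [secrets.randbelow(256) for _ in range(d - 1)]
    last = y
    for s in shares:
        last ^= s
    return shares + [last]

def unmask(shares: list[int]) -> int:
    """Recombine the shares (XOR of all of them)."""
    y = 0
    for s in shares:
        y ^= s
    return y

y = 0x3A
shares = mask(y, d=3)
assert unmask(shares) == y  # the computation only ever touches `shares`
```

Each share is then manipulated at a different moment in time, which is why the leakage trace contains separate samples depending on each share.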
There has been a large body of theoretical work on this countermeasure, ranging from abstract to more concrete, and I would say the most concrete view on masking is that, ultimately, we can bound the mutual information between the sensitive variable Y and the full leakage vector by the mutual information between one share, Yi, and the corresponding leakage samples, raised to the power d. As a result, if this per-share mutual information is smaller than one, we have a mechanism that amplifies the amount of noise in the implementation exponentially. Based on this view, on this theory, the practical challenge is of course how to bound this mutual information between one share and the corresponding leakage samples. Here I give the equation for the mutual information, which is essentially a weighted sum of log probabilities conditioned on the leakage. This is the equation for a discrete leakage function, which can happen, for example, if your sampling device or your oscilloscope is using 8 bits of quantization. Sometimes we also like to view the leakage as a continuous random variable, and in this case we just turn the sum into an integral. In both cases the problem is exactly the same: concretely, we have no clue about this leakage distribution. We really don't know the distribution of this power consumption, because it usually comes from a complex device with millions of transistors. So if I go back here, that means that I know neither the leakage function f nor the distribution p, and I cannot compute this equation directly. Unfortunately, it's also hard to estimate this mutual information, because it's known in the literature that the mutual information is in general hard to estimate, meaning that all the estimators we have are biased and the bias is distribution dependent.
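For a discrete leakage with a known conditional distribution, the quantity just described can be computed directly; a small sketch (toy Hamming-weight leakage and uniform Y, my own illustration):

```python
import numpy as np

def mutual_information(p_l_given_y: np.ndarray) -> float:
    """MI(Y; L) in bits for a uniform Y, given the conditional pmf p(l|y)
    as a 2-D array indexed [y, l]: a weighted sum of log probability ratios."""
    n_y = p_l_given_y.shape[0]
    p_l = p_l_given_y.mean(axis=0)  # marginal p(l) under the uniform prior
    mi = 0.0
    for y in range(n_y):
        for l in range(p_l_given_y.shape[1]):
            p = p_l_given_y[y, l]
            if p > 0:
                mi += (1.0 / n_y) * p * np.log2(p / p_l[l])
    return mi

# Toy example: 2-bit Y, leakage = HammingWeight(Y) + uniform noise in {-1, 0, +1}.
p = np.zeros((4, 5))
for y in range(4):
    hw = bin(y).count("1")
    p[y, hw:hw + 3] = 1.0 / 3.0  # noise shifts HW by -1, 0 or +1 (offset by +1)
print(mutual_information(p))  # strictly between 0 and 1.5: y=1 and y=2 leak identically
```

With a noise-free identity leakage the same function returns the full log2(4) = 2 bits; the whole difficulty in practice is that this conditional pmf is unknown.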
In fact, even the sign of the bias is distribution dependent, so it's not even easy to bound the mutual information. In order to mitigate that, before discussing leakage certification I would like to say a few words about leakage evaluation in general. In the literature, in the vast majority of cases, when you perform a concrete side-channel attack, the adversary or the evaluator directly estimates the leakage probability density function or the leakage probability mass function and tries to attack with this estimated leakage model. For the adversary, this is perfectly fine if it works, because then you recover the key. For an evaluator, it's really not perfect, because even if it works you have no guarantee that you are optimal as an adversary. And I would say the main problem, both for the evaluator and the adversary, is that when this approach doesn't succeed, it's quite hard to interpret, because failure can mean two things: either the leakages are sufficiently noisy, or the statistical model that you used in the evaluation is not sufficiently accurate. The latter is what we usually call a false sense of security in the literature. In order to mitigate this risk of a false sense of security, a first step was made with the idea of leakage certification. The idea, essentially, was that rather than just launching attacks, you can try to distinguish between estimation errors and assumption errors in the statistical models that we use. By estimation errors I mean errors in the model that are due to a lack of samples; by assumption errors I mean errors that are due to a wrong choice of statistical model, typically if you assume that your leakages follow a Gaussian distribution and they do not. I will try to illustrate this with an example.
Let's imagine that this is the leakage distribution that we want to estimate, and that we made the Gaussian assumption, so we built a Gaussian model for this distribution. What we can then do is simply sample both the model and the true distribution and build a non-parametric estimate of each, for example with histograms. This is what we have here: the red samples are the model samples, the blue samples are the leakage samples. If you have this situation, where typically the number of samples is not sufficient, you will conclude that estimation errors dominate and that you need to measure more. If you do measure more, at some point you will reach this situation where you can clearly see a discrepancy between the model and the true distribution, and then you will rather conclude that assumption errors dominate and that you should look for another statistical model. Based on this, the main idea behind leakage certification was that, in practice, you can consider your leakage model to be good enough as long as the assumption errors are small in front of the estimation errors for the number of traces that you could measure. Typically, if you are an evaluation laboratory and you can measure your chip for one week, you use a statistical model and this test will tell you whether it's enough or whether you have to look for another statistical model in order to efficiently exploit the information that's available. There is an information-theoretic view of this problem, and for this I need to introduce the notion of perceived information, which was proposed a couple of years ago. If you look at the equation, it's very similar to the mutual information; the main difference is that, to the right of the logarithm, we replace the true probabilities that we don't know by the probabilities assigned by the statistical model that has been estimated.
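The model-versus-leakage comparison described above can be sketched numerically; here I use the total variation distance between the two histograms as a stand-in statistic (an assumption on my part; the certification tests in the literature use dedicated statistics):

```python
import numpy as np

rng = np.random.default_rng(0)

# "True" leakage: a bimodal distribution; the evaluator's (wrong) assumption:
# a single Gaussian fitted to the measured samples.
true_samples = np.concatenate([rng.normal(-1.0, 0.3, 5000),
                               rng.normal(+1.0, 0.3, 5000)])
model_samples = rng.normal(true_samples.mean(), true_samples.std(), 10000)

bins = np.linspace(-2.5, 2.5, 26)
h_true, _ = np.histogram(true_samples, bins)
h_model, _ = np.histogram(model_samples, bins)
tvd = 0.5 * np.abs(h_true / h_true.sum() - h_model / h_model.sum()).sum()
# With this many samples the discrepancy is clearly visible (tvd is large):
# assumption errors dominate, and another model should be used.
print(tvd)
```

With only a handful of samples the same statistic would be dominated by estimation noise, which is exactly the two-regime picture of the histograms in the talk.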
So intuitively, this is really the amount of information that you can extract from the samples with a given statistical model, possibly biased by both estimation and assumption errors. Looking at the equation, it may seem that we are not much further ahead, because there is still the f here that we don't know. But the good news is that, this time, this is something that we can actually compute, in a two-step process. We first estimate the model; once the model is estimated, we consider it as fixed in the equation, and we then perform the integration by sampling the true distribution, which just means that you pick up a lot of test samples from your real measurement apparatus and you compute this sum of log model probabilities. The main reason why this is correct is, intuitively, that this blue term here can be viewed as the empirical distribution, which converges towards the true distribution. What's important to see here is that this perceived information will be equal to the mutual information in case the model that you use is perfect, and it will differ in all other situations. For example, the perceived information can even be negative if you have a model that is completely disconnected from the actual leakage that you try to analyze. Concretely, these are the plots that we are usually able to build: on the X axis we put, in log scale, the number of samples used to build the statistical model; on the Y axis we put the information-theoretic metric, here the perceived information for a Gaussian leakage model. What we see is that as we increase the number of samples used to build the leakage model, we can extract more and more information, and at some point there is a kind of saturation. Saturation typically means that you have estimated all the parameters of your Gaussian model sufficiently well.
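The two-step computation described above can be sketched as follows, for a simulated univariate leakage and Gaussian templates (a minimal illustration with simulated data, not the paper's evaluation code):

```python
import numpy as np

rng = np.random.default_rng(1)
N_CLASSES = 4

def fit_gaussian_model(ys, ls):
    """Step 1: estimate a Gaussian model p_model(l|y) from profiling traces."""
    mu = np.array([ls[ys == y].mean() for y in range(N_CLASSES)])
    sd = np.array([ls[ys == y].std(ddof=1) for y in range(N_CLASSES)])
    return mu, sd

def perceived_information(ys_test, ls_test, mu, sd):
    """Step 2: with the model fixed, sample the true distribution (fresh test
    traces) and average log model posteriors: PI = H(Y) + E[log2 p_model(y|l)]."""
    lik = np.exp(-0.5 * ((ls_test[:, None] - mu[None, :]) / sd[None, :]) ** 2) / sd[None, :]
    post = lik / lik.sum(axis=1, keepdims=True)       # uniform prior over Y
    correct = post[np.arange(len(ys_test)), ys_test]  # p_model(true y | l)
    return np.log2(N_CLASSES) + np.mean(np.log2(correct))

# Simulated leakage: l = y + Gaussian noise.
ys = rng.integers(0, N_CLASSES, 20000)
ls = ys + 0.5 * rng.standard_normal(20000)
mu, sd = fit_gaussian_model(ys[:10000], ls[:10000])         # profiling set
pi = perceived_information(ys[10000:], ls[10000:], mu, sd)  # fresh test set
print(pi)  # positive, and at most log2(4) = 2 bits
```

The outer average over test traces is exactly the empirical distribution playing the role of the unknown true distribution.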
If we then launch leakage certification, what's going to happen is that at some point the certification test will fail, and when it fails at this stage, for example, it informally means that the PI curve saturates too far from the mutual information, even though we still don't know what the mutual information is. That really means the evaluator has to look for another statistical model. There were limitations to this approach. One first, very concrete limitation is that it may of course happen that we lack samples to be conclusive. Here is an example where I replaced the Gaussian leakage model with a Gaussian mixture model, which is the dotted curve, and in this context we see that, for the number of samples that we were able to measure, certification didn't fail yet, which of course doesn't mean that it would never fail if we could access more measurements. That's one thing. The more important problem, I would say, is that these certification tests are only qualitative: they tell the evaluator whether the model is good enough or not, but if the model is not good enough, we have absolutely no indication of the actual distance between the perceived information and the mutual information, because in general we still don't know the mutual information and we don't know the perfect leakage model. That's annoying because it means we have no indication of the security loss caused by this incorrect leakage modeling.
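In a simulated setting where the true distribution is known (as in the simulated experiments later in the talk), that distance between perceived and mutual information is directly computable. A toy sketch with a 4-value discrete leakage and a deliberately flattened, hence wrong, model (the numbers are my own illustration):

```python
import numpy as np

def info_metric(p_true: np.ndarray, p_model: np.ndarray) -> float:
    """For uniform Y and conditional pmfs indexed [y, l]:
    info_metric(p, p) is the mutual information MI(Y; L);
    info_metric(p, q) is the perceived information of model q under truth p."""
    n_y = p_true.shape[0]
    q_post = p_model / p_model.sum(axis=0, keepdims=True)  # model posterior q(y|l)
    mask = p_true > 0
    return np.log2(n_y) + (1.0 / n_y) * np.sum(p_true[mask] * np.log2(q_post[mask]))

# True leakage: fairly informative.  Wrong model: same support, but flattened.
p = np.full((4, 4), 0.1) + 0.6 * np.eye(4)  # p(l|y): 0.7 on the diagonal
q = np.full((4, 4), 0.2) + 0.2 * np.eye(4)  # q(l|y): 0.4 on the diagonal

mi = info_metric(p, p)  # about 0.643 bits
pi = info_metric(p, q)  # about 0.378 bits: the wrong model loses ~0.27 bit
print(mi, pi)
```

The whole point of the qualitative certification tests is that, on a real device, only the PI column of this comparison is available.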
So in order to move forward, I first need to introduce this seemingly useless quantity, which is the hypothetical information. The hypothetical information is again quite similar to the mutual information, but this time, both left and right of the logarithm, we replace the true probabilities by the probabilities assigned by the model. Intuitively, what this represents is the amount of information that would be extractable from the samples if the true distribution were the model. Very concretely, this looks nice because it's now an equation where everything is known, so it's much easier and also much faster to compute, since we know the distribution; sometimes we even have an analytical formula for it. What's much more negative is that, at first sight, it's completely disconnected from the true distribution: the true distribution doesn't appear in the equation anymore. That typically means this hypothetical information will always be positive, even if your model is completely incorrect: you could even build the model without using your leakage samples at all and you would still get a positive hypothetical information. So in general it's not very useful. What we are going to show is that in some specific cases, for certain families of distributions, it can be quite useful, and the most useful distribution is going to be the empirical distribution. We denote by EHI the hypothetical information obtained when the empirical distribution is used as the model. So now, what is our main result? It's essentially upper and lower bounds for the mutual information based on this empirical model. From a security viewpoint, I think the most interesting result is the upper bound for the mutual information metric, because this allows us to lower bound the security. Essentially, we can show that, in expectation over the constructions of the models, this empirical hypothetical information
is an upper bound for the mutual information, and it also converges towards the true mutual information. It may look surprising that we have this type of result, because in general we don't have it for the mutual information. In fact, there are two ingredients that we leverage to make it possible. The main one is that in our mutual information, in the context of key recovery, one of the two random variables has a known and constant distribution: the secret variable is uniformly distributed. So you don't need to estimate this one, and as a result the mutual information becomes biased upwards everywhere, a little bit like the entropy would be. That's one thing that we use. The other thing is the nice property of the empirical distribution, which converges towards the true distribution in a monotonic manner. We also have a lower bound for the mutual information metric, which is essentially that the perceived information is always lower than the mutual information. This is a bit more technical, it's obtained by solving an optimization problem, but it's also quite intuitive, because it shows that whenever you use an incorrect leakage model, you can only lose information.

Now, concretely, what does it mean? As usual in this field, we perform experiments, and we usually start with what we call simulated experiments. In these simulations, we generate the leakages ourselves and we perfectly know the leakage distribution, so in this case we can actually compute the mutual information as well. In this graph you have the convergence plot again, where the X axis is the number of samples used to build the model and the Y axis shows the IT metrics: the perceived information in blue, the hypothetical information in red and the mutual information in black. We see that, as expected, the bounds hold, and a first nice observation is that the hypothetical information, which is the bound that we are always going to use, is converging faster than the
perceived information, and that's essentially because it's a simpler quantity and it doesn't embed a kind of cross-validation step in its estimation. What's also nice, I guess from an evaluation viewpoint, is that the bound becomes tighter as the number of samples that you have increases. That means that if you are an evaluation lab, the more you measure your chip, the better the security claim that you can make will be.

So these were simulations; as usual, we next move to concrete experiments. These are the results that we obtained with an FPGA implementation of the AES. We again have the hypothetical information in red and the perceived information in blue, both corresponding to the empirical distribution, and we again observe the bound behavior. What's mostly different in this case is that, of course, we don't know where the mutual information is, because this time we have no clue about the real distribution, and also the total number of samples that we could use is a bit lower, because it takes much more time to measure the chip than to generate the leakages ourselves. What we did in addition is to look at other statistical models, in particular the Gaussian leakage model, because it's always convenient. This is what we have here: the Gaussian perceived information and the Gaussian hypothetical information. We see that they converge much, much faster, because the models are also much simpler, and what's nice is that, even though the model is simpler, they still fall within the bounds. That typically means that for this particular device, and this is not a general conclusion, the Gaussian leakage model is actually good enough.

And that leads me to the last step. This good news, that we can actually use a Gaussian assumption here, is interesting because there's one thing that I've been hiding a little bit so far: the fact that, in general, what we want to do as an evaluator is not to bound the amount of information that you have for one single leakage sample, but
it's to bound the information that you have for all the leakage samples. Typically, this is what we call multivariate analyses, and this is an example where I compute this hypothetical information for all the time samples corresponding to the manipulation of one share. To make the link with the previous figures: the convergence plots that I've just shown actually correspond to this single sample here, which has the maximum amount of information, and this full figure corresponds to all the time samples involved in the manipulation of the first share of my masking scheme. So what we want is to know not the individual information for every point in time, but the joint information for all the points in time, and this is something that we can do quite efficiently under the Gaussian assumption. Again, this is what we call the multivariate Gaussian HI, and we see here how the information progresses as you use more and more dimensions. The interesting bit is that there's actually a factor of 5 between the best univariate attack and the best multivariate attack that we have. It really means that if you ignore the fact that you actually have many time samples available, you are going to overstate security by a factor of 5 raised to the power d, which can be very substantial. What's also interesting is that there's another factor of 10 between the bound that we've just defined and the naive bound where you just sum the information of every point in time, which essentially assumes that all these pieces of information are independent. It turns out this is not the case, so the naive bound would in fact be far too conservative.

The last thing I would like to show is why the hypothetical information bound is useful in this context, and for this I again show the convergence plots, this time for the multivariate analysis: again the number of samples used to build the model against the information-theoretic metric, with this curve for the hypothetical information and the green one for the PI. For 10 dimensions, we
are still able to estimate the PI. For 40 dimensions, you see that you have more information in the bound, but the PI becomes harder to estimate, so the bound becomes increasingly useful. And for 250 dimensions, there's no way we can estimate the PI; we can only use the bound, essentially because there the PI would be stuck at minus infinity all the time.

So it's time to conclude. To the best of my knowledge, these are the first quantitative tools that we can use in order to avoid a false sense of security in side-channel security evaluations, and by this I mean avoiding the problem due to incorrect assumptions about the leakage model. When we have a low number of dimensions, we have a formal bound on the mutual information, which is based on the monotonic convergence of the empirical distribution. When we want to analyze larger numbers of dimensions, we show that it's usually needed and useful to have simpler models. That suggests, as a typical strategy when you have a leaking device, to first try different models for a low number of dimensions and compare them with the empirical bounds, so you know whether the model is good enough, and then to generalize to multivariate attacks. The main advantage of all this is that performing this type of test is much, much faster than mounting an attack, both in the amount of data that you need and in the amount of time that you need to compute the metrics. For example, it's really the first time that we are able to bound the mutual information for a leakage vector with 250 dimensions. What I think is also interesting is that I believe it's potentially applicable in other contexts, like analyzing privacy leakage. In particular, I would say one nice feature of the bound in the context of open data publishing is that the tightness of the bound becomes better if you reveal more information, which typically means that, as a database owner, you have no incentive to hide the data that you collected when establishing the bound. And this
concludes the talk. Thank you.

Any questions?

I have one question: are you familiar with quantitative information flow theory? I don't know much about it myself, but it sounds vaguely related to me.

Not enough, I would say, because from what I know, which is very little, they also have techniques to quantitatively bound the mutual information associated with leakage from communication channels, and I just wonder how the two relate, but I don't know much about it, sorry.

OK, thank you. Let's thank the speaker again.