First of all, I would like to warn you that this talk may come as a bit of a shock, since the topic is quite different from the other talks and papers at this conference: it involves much more engineering than the others. I know that most of you are tired and it is late, so I have avoided any formulas here and decided to just show visually what I have done in this work. Okay. This is the outline of my talk. I will first explain some background on side-channel attacks and side-channel analysis. Then I will move to our collision-based side-channel attacks and the problems, solutions, and issues that have been addressed during the last five or six years. Then I will show what is new in this work, what the actual contribution or novelty is. Finally, I will show some experimental results to support the efficiency claims of the attack that I present to you. Okay. What is the story? You have probably already heard about side-channel analysis or side-channel attacks. We perform the attack on cryptographic devices; we try to recover the key from a cryptographic device, not to attack the algorithm itself. What we actually do is build a hypothetical model for the leakage, say the power consumption, and then compare the result of the hypothetical model with the actual leakages that we have measured from the cryptographic device, for instance with an oscilloscope. You may have heard this a couple of times, but I would like to show you what we usually do in a classical side-channel analysis. Suppose this is a part of an AES encryption at the first round: a byte of the plaintext and a byte of the key are XORed at the first round and go into one S-box. We take, for instance, a couple of different plaintext byte values, and for each of them we measure the power consumption over time, like this.
Then we look at one specific time instance and build another vector, the power vector, with entries corresponding to the plaintext byte values. Then, for each guess or hypothesis that we have for k, the targeted part of the key, we compute some intermediate values, such as the S-box input or S-box output, based on the plaintexts that we know. Using our hypothetical power model, we compute a vector of hypothetical power values, analogous to the vector we actually measured. We do the same for every other key guess and get another hypothetical power vector, and so on; since the key byte is an 8-bit value, we have at most 256 hypotheses. Now we have to check which of these hypothetical vectors is close to the actual power vector measured from the device. By means of correlation, which is the usual way, we can check the linear relationship between each hypothetical vector and the measured vector, and finally we get a vector of correlations like this. If the time instance we selected and the model are correct, we will see a strongly higher value for one of the candidates compared to the others, and we say that this guess is most probably the correct value of that part of the key. This is only part of the attack; we have to repeat it for the other plaintext bytes and key bytes to completely recover, for instance, the 128-bit key of an AES encryption. Okay, so what is the problem here? This is classical DPA, and it should work. The problem is the hypothetical model. Mutual information analysis tries to relax this assumption, the dependency of the attack on the hypothetical model.
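To make the procedure concrete, here is a minimal sketch of this classical correlation attack. Everything here is an illustrative assumption rather than the speaker's actual tooling: the random permutation `SBOX` stands in for a real S-box, and a Hamming-weight model is assumed as the hypothetical power model.

```python
import numpy as np

# Stand-in bijective S-box for illustration only (NOT the real AES S-box).
SBOX = np.random.default_rng(1).permutation(256)

def hamming_weight(x):
    """Number of set bits in an 8-bit value."""
    return bin(int(x)).count("1")

def cpa(plaintext_bytes, power, key_guesses=range(256)):
    """For every key guess, correlate the hypothetical power values
    (Hamming weight of the S-box output) with the measured power at
    one time instance; the correct guess should give the highest peak."""
    corr = []
    for k in key_guesses:
        hyp = np.array([hamming_weight(SBOX[p ^ k]) for p in plaintext_bytes])
        corr.append(np.corrcoef(hyp, power)[0, 1])
    return np.array(corr)
```

In a simulation where the measured power really is the Hamming weight of the S-box output plus Gaussian noise, `np.argmax(cpa(plaintexts, power))` returns the correct key byte.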
But mutual information analysis has some limitations too, and in some cases it still uses a model, and then it may not work correctly. Because of this, I am personally a fan of side-channel based collision attacks, which are applicable when a part of the circuit is used in a shared way. That means you have, for instance, one S-box module in your design, hardware or software, and at different time instances the same module performs the SubBytes operation, first on one plaintext byte and later on another plaintext byte. The way we perform the collision attack is as follows: we measure the power consumption, or collect the power traces, for every value of the first plaintext byte, and we do the same for another plaintext byte, so for every plaintext byte value we have a corresponding power consumption trace. Then we check pairwise whether any two of these traces are the same. This can be done completely exhaustively, checking all possible pairs. Once a collision is found, we can say that with high probability the two S-boxes computed the same value. Then we can easily write that the difference between the two key bytes is the same as the difference between the corresponding plaintext bytes for which we found the collision. This is the so-called linear collision attack on AES, which has been well known for roughly eight years. The problem is dealing with false positive collision detections, since this is not the only pair we have to attack: we later have to perform the attack on, for instance, P3 and K3 and so on, recovering many equations to reduce the number of candidates for the full key. If one of these equations is wrong, the whole chain will be wrong.
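Since the AES S-box is a bijection, a genuine collision S(p_i ⊕ k_i) = S(p_j ⊕ k_j) implies k_i ⊕ k_j = p_i ⊕ p_j. The chaining of such equations, and its fragility, can be sketched as follows; the tuple format for detected collisions is an assumption made for this illustration.

```python
def reduce_key_space(collisions):
    """Propagate the collision equations k_i ^ k_j = p_i ^ p_j.

    collisions: iterable of (i, j, p_i, p_j) for colliding S-box executions.
    Returns {key-byte index: XOR offset from key byte 0}; if all 16 indices
    appear, only 256 candidates remain for the full AES key.
    """
    delta = {0: 0}  # key byte 0 is the remaining 256-way unknown
    changed = True
    while changed:
        changed = False
        for i, j, p_i, p_j in collisions:
            d = p_i ^ p_j  # equals k_i ^ k_j when the collision is genuine
            if i in delta and j not in delta:
                delta[j] = delta[i] ^ d
                changed = True
            elif j in delta and i not in delta:
                delta[i] = delta[j] ^ d
                changed = True
    return delta
```

A single false-positive collision corrupts every offset derived through it, which is exactly the chain problem just mentioned.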
And then the correct key is not among the candidates that are left. To solve this problem, a couple of heuristic methods and systematic ways have been introduced and proposed during the last, let's say, five years. But roughly two years ago, at CHES 2010, we came up with another idea, which we call the correlation-enhanced collision attack. The scenario is the same as before: we have many different plaintext bytes, P1 and P2, and we measure the power consumption; the first S-box is computed at one time instance, the second S-box at another. We look at the time instance when the first S-box is computed and build one vector of power values corresponding to the plaintext byte values, and we do the same for P2 at the other time instance. Then what we proposed is to take averages: based on the values of the first plaintext byte, we compute a vector of mean power values, from 00 to FF, for all possible values of plaintext one. To make progress, we need to guess a part of the difference of the key. That is, we take the second set, and for a hypothesis on the key difference we build another mean vector of power values, indexed by plaintext two XOR the delta k that we have guessed. We do the same for all 256 possible values of this delta key, the difference between the key bytes, and we get 256 mean vectors to compare against the mean vector of the first set. Again, we check whether these mean vectors are the same or not. The reason is that if one of the delta keys is correct, then in every case a collision happens, so the first entry of one mean vector should be similar to the first entry of the other.
The second mean-vector entry should be similar to the second one, and so on to the end. If we again compute correlations, we get another vector of correlations, and one of them should be much higher than the others, provided that we selected the correct time instances in both S-box computations and that we had enough measurements or traces to estimate the mean values accurately. But what is the problem? This attack worked well, so what was the problem that I came to address with this paper? It is when we have a countermeasure in the system, when we use this scheme to attack real devices. Consider a countermeasure such as masking, which I would call a secret sharing scheme, where the computations on all shares are performed at the same time. Then the univariate leakages, say the power consumption at a single time instance, are still related to the secrets. To recover such univariate leakage, mutual information analysis may work, of course, but the correlation-enhanced collision attack that I presented on the last slide may not. The reason is the averaging. Suppose we have several different probability distributions here. When we take the averages, we see that they are the same, although the distributions are quite different from each other; looking only at the averages, we cannot distinguish them. This is exactly what happens when this kind of countermeasure is in place: the averages are the same, and we are not able to perform any attack. But how about the higher statistical moments? For instance the variances: if we look at the variances, we can distinguish these two distributions from these three.
Or we look at the third statistical moment, the skewness, which shows the direction of the distributions; here we can see that these two can be distinguished from the others. And with the fourth statistical moment, the kurtosis, which shows how sharp the distribution is, we can distinguish these two from each other. That was the idea, and I have tried to use these higher statistical moments in the same attack. The way of the attack is exactly the same as before: we have the plaintext bytes and the power values at two time instances, but instead of taking averages we compute the variances, the second-order moments. Very straightforward, and we do the same for all possible candidates for the key difference. Again we compare them by correlation and get one vector of correlations, and if second-order moments are leaking through these time instances, one of the values of this correlation vector should be significantly higher than the others. The same can be done with the skewness, exactly the same, just changing from variance to skewness, or with even higher statistical moments. But we can also avoid committing to any specific statistical moment and go for a general form. Instead of computing a particular statistical moment, we can estimate the probability distribution functions. We do the same: for the second set, for every possible value of the delta key, we estimate the probability distributions, and finally we again get a vector of 256 probability distributions.
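Replacing the mean by a higher central moment really is a drop-in change. A sketch under an assumed simulation; `order=2` gives the variance, 3 the skewness, 4 the kurtosis.

```python
import numpy as np

def moment_vectors(pt_bytes, power, order):
    """Per plaintext-byte value, estimate a central moment of the power:
    order=2 -> variance, 3 -> skewness, 4 -> kurtosis (standardized)."""
    out = np.empty(256)
    for v in range(256):
        c = power[pt_bytes == v]
        c = c - c.mean()
        m2 = (c ** 2).mean()
        out[v] = m2 if order == 2 else (c ** order).mean() / m2 ** (order / 2.0)
    return out

def moment_collision(pt1, pw1, pt2, pw2, order=2):
    """Same comparison as the mean-based attack, but on moment vectors."""
    m1 = moment_vectors(pt1, pw1, order)
    m2 = moment_vectors(pt2, pw2, order)
    idx = np.arange(256)
    return np.array([np.corrcoef(m1, m2[idx ^ dk])[0, 1] for dk in range(256)])
```

In a simulation where the leakage is zero-mean noise whose spread depends on the S-box output (so the per-value averages are all equal, as under masking), the mean-based attack sees nothing while `order=2` recovers the key-byte difference.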
Then we want to compare these estimated probability distributions, the first one against each of the others, for a guessed difference between the two key bytes. The way that I compared the probability distributions was the Jeffreys divergence. Okay, I left one formula in here after all, sorry. But I have to say that this Jeffreys divergence is a tool to measure the distance between two distributions. If we compare all of them, we get a vector of divergences, and since this divergence measures the distance between two distributions, the lowest value should indicate the correct guess for the delta key, the difference between the key bytes. But there are some practical issues in performing these kinds of attacks. First, when we go for higher statistical moments, we usually need many more traces, or samples, to estimate the moments accurately. According to my experience, I would say that going beyond the third statistical moment may not be useful or may not make much sense. That is one point: if we want to look at the higher statistical moments, we may need many more traces than with the other schemes. Second, when we go for the general form by estimating the probability distribution functions, we need to estimate them in discrete form. One way is, of course, the histogram. I know this is not a perfect way, but it is one of the handy ways we can use to estimate the distributions. And since with a histogram we lose accuracy, we may again need many more measurements or samples to perform an attack successfully. Another point concerns the Jeffreys divergence: it is based on the Kullback-Leibler divergence.
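The histogram-based comparison can be sketched as follows; the bin count and the smoothing constant are illustrative assumptions.

```python
import numpy as np

def jeffreys_divergence(x, y, bins=32):
    """Jeffreys divergence (symmetrized Kullback-Leibler) between two
    samples, with distributions estimated by histograms on a common
    range.  Smaller means more similar."""
    lo = min(x.min(), y.min())
    hi = max(x.max(), y.max())
    p, _ = np.histogram(x, bins=bins, range=(lo, hi))
    q, _ = np.histogram(y, bins=bins, range=(lo, hi))
    # small additive smoothing so empty bins don't blow up the logarithm
    p = (p + 1e-9) / (p + 1e-9).sum()
    q = (q + 1e-9) / (q + 1e-9).sum()
    return float(np.sum((p - q) * np.log(p / q)))
```

In the attack, the Δk guess whose per-value distributions are closest to those of the first set, i.e. the one with the lowest divergence, is taken as the winner.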
But since the Kullback-Leibler divergence is not a symmetric measure, I decided to use a symmetrized version of it, which is the Jeffreys divergence. But anyone can use any other scheme to compare or check the similarity of the probability distributions. I also have some experimental results that I want to show you, some of which are based on Virtex-II Pro FPGAs from Xilinx on SASEBO boards, and one on a microcontroller embedded in a smart card. The first experimental result is a threshold implementation of PRESENT, a work that we did roughly two years ago, published in the Journal of Cryptology. Here is a shared way of computing the PRESENT S-box: as you see, there are several component functions working on different shares. This computes the S-box of PRESENT, and at a different time instance the same module is used for another part of the plaintext, or part of the key. As before, we measure the power consumption and try to perform the attack using the averages, the first-order moments. Nothing was visible; as I said before, it should not be possible to perform the attack using the classical correlation collision attack. But when we look at the variances, the second-order moments, we can clearly distinguish the correct candidate that we had. And here is another figure showing how many traces we need to perform the attack successfully; it is around, I believe, seven million traces. Of course that is a lot of traces, but the design is very small and contains a lot of noise, so we require many traces to succeed. When I looked at the skewness, the third-order moments, I saw the same:
again, the attack was possible, with roughly fewer traces. And when I looked at the PDFs, the probability distribution functions, the attack works again, but we need many more traces than with the skewness. The second experimental result was a threshold implementation of AES that I presented last year here at Eurocrypt. It is the same scheme as before with PRESENT, but with many more steps to compute one AES S-box. We measured the power traces as before using one Xilinx Virtex-II Pro FPGA. Looking at the averages, the attack was not successful, of course, as I said before, since the threshold implementation, which is a great technique to counteract power analysis, has the goal of preventing any first-order leakage, and I believe there is evidence that the first-order leakages are completely prevented. But when I looked at the variances, the attack again worked perfectly, using 20 million traces. With the skewness, the attack did not work, surprisingly. And finally, with the PDFs, the general form, the attack was possible, but we needed many more traces, roughly 50 million. Again, this was more evidence that when we go for PDFs, the general form, we may need many more traces because we lose accuracy. Okay, this is my last slide. I had another experimental result which was based on software, meaning a microcontroller performing masked AES. At one time instance, a complete masked S-box table is recomputed; at another time instance, the masked S-box is applied to the masked data; and later the same module is used to perform the same masked operation on another part of the plaintext. The reason I included this experimental result is actually to show how to extend this scheme to the multivariate case, to multivariate attacks.
In a multivariate attack, we need to consider two different time instances of the power values and then combine these two to perform the attack, since here we are getting some information about the mask and here some information about the masked data. When we combine them, theoretically the attack should work, and indeed the higher-order power analysis attacks are based on this phenomenon. Of course, we can perform the attack in the general form by estimating the joint probability distribution functions, and we can also go for joint statistical moments. But when I wrote out the formulas, it was very clear that going for joint statistical moments is exactly the same as doing one pre-processing step by multiplication, meaning we multiply these two points together and then perform a classical DPA. This means that when we perform the higher-order attacks that are very well known to, let's say, the CHES community, we are in fact looking at the higher statistical moments. At the end, I would like to stress why this attack should work, or why we should go toward side-channel based collision attacks. The reason is that in some cases, in some applications, we especially want to evaluate a design: you, as a designer, design something and you want to evaluate it. You give it to an evaluation lab, and the evaluation lab checks whether your design is appropriate, whether it is secure against known attacks. Of course, classical DPA attacks should be performed, or some kind of mutual information analysis. But we believe that if the situation is such that collision attacks are possible, meaning that one module is reused within one computation, for instance within one round of the cipher, then collision attacks should of course also be considered, since they avoid any model, any hypothetical model.
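The equivalence mentioned here, that targeting a joint statistical moment amounts to a multiplicative pre-processing step followed by a classical DPA, is the familiar centered-product combining of higher-order attacks. A minimal sketch with an assumed Boolean-masking simulation (the Hamming-weight leakage and the noise level are illustrative choices):

```python
import numpy as np

def centered_product(power_t1, power_t2):
    """Classical second-order combining: subtract the means, multiply.
    The product's expectation carries the joint (co-)moment of the two
    time instances and can be fed to a univariate attack."""
    return (power_t1 - power_t1.mean()) * (power_t2 - power_t2.mean())

# Assumed simulation: Boolean masking with Hamming-weight leakage.
rng = np.random.default_rng(7)
hw = np.array([bin(v).count("1") for v in range(256)])
secret = rng.integers(0, 256, 50000)                # unmasked intermediates
mask = rng.integers(0, 256, 50000)
t1 = hw[mask] + rng.normal(0, 0.3, 50000)           # leaks the mask
t2 = hw[secret ^ mask] + rng.normal(0, 0.3, 50000)  # leaks the masked data
combined = centered_product(t1, t2)
# the combined trace correlates with the unmasked Hamming weight,
# although t1 and t2 individually are independent of the secret
rho = np.corrcoef(combined, hw[secret])[0, 1]
```

For Boolean masking of 8-bit values, the conditional expectation of the centered product is linear in HW(secret) with negative slope, so `rho` comes out clearly negative even though neither time instance alone leaks anything first-order.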
And collision attacks are able to check any statistical moment that might be involved in the leakages. Thank you for your attention. I am ready for any questions that you have.