The second talk is "Linear Cryptanalysis of DES with Asymmetries". The authors are Andrey Bogdanov and Philip Vejre, and Philip will give the talk.

Thank you. Yes, so this is about DES, so let's talk a little bit about that; a little teaser for you. What we did was revisit the linear cryptanalysis of DES, and we did this by building some new statistical models for both the right key and the wrong key distributions in this attack. This is the first real improvement of the attack since Matsui's attack in 1994. There was some new analysis by Junod in 2001, but that was just a re-analysis of the old attack, so there was no change to the actual attack technique. This is also the first successful attack on DES that uses multiple linear approximations; there have been some attempts before that weren't really successful. And what is most interesting is that this attack exploits some asymmetries in these distributions that we'll have a look at.

So linear cryptanalysis was introduced in 1993 by Matsui, specifically to attack DES, with great success, and it was actually the first experimental break of DES. What it does is the following: we have these input and output masks, and we look at the probability that the parity of the plaintext bits indicated by the input mask equals the parity of the ciphertext bits indicated by the output mask. This probability of course depends on the key, and if the resulting correlation is large, we can use it as a distinguisher. Even though there have been quite a few improvements of linear cryptanalysis and a lot of extensions, none of these have really been applied to DES so far.

The key recovery part of linear cryptanalysis works as follows. We obtain some plaintext-ciphertext pairs, and we have some good linear approximation over, say, R minus two rounds. We then guess some outer round key bits, encrypt a round here, decrypt a round here, and check the correlation in the middle. The idea is that for a right key guess we get the high correlation indicated by our good approximation, and if we guess wrong, then we get some different mapping from P* to C* which hopefully has a low correlation. So we use this as a distinguisher to decide whether the guess was right or wrong.

Now, the important part is how the correlation distributions for a right and a wrong key look; this completely determines how powerful our attack is. Our main tool is this equation here, which is basically the linear hull: this sum is a sum over all linear trails starting with alpha and ending with beta, that is, a sum of the correlation contributions of each linear trail adjusted by some sign, and this sign is key dependent. Of course the linear hull is quite large and we can't look at all trails, so what we'll do is use this model by Bogdanov and Tischhauser, where we consider some subset of the trails, which we call the signal. This subset should be a subset of strong trails, to get a good indication of the sum here. The unknown part we model as noise, which is basically just the correlation of an ideal cipher. This term won't matter too much because we have very strong dominant trails, but we'll include it for completeness.
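In standard notation, the hull sum and the signal/noise decomposition being described look roughly like this (a reconstruction from the description above; the exact notation on the slides may differ):

```latex
% Linear hull: the correlation of approximation (alpha, beta) under key K is a signed
% sum over all trails U from alpha to beta; the sign of each term is key dependent.
\[
  C_K(\alpha, \beta) \;=\; \sum_{U:\;\alpha \to \beta} (-1)^{\,d_U \,\oplus\, \langle U, K \rangle}\, \lvert C_U \rvert
\]
% Signal/noise model (Bogdanov--Tischhauser style): keep a set S of strong trails as
% the "signal" and model the remaining trails as ideal-cipher noise (n = block size).
\[
  C_K(\alpha, \beta) \;\approx\; \sum_{U \in S} (-1)^{\,d_U \,\oplus\, \langle U, K \rangle}\, \lvert C_U \rvert
  \;+\; \mathcal{N}\!\left(0,\, 2^{-n}\right)
\]
```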
What previous attacks on DES assumed was that DES has one dominant trail, that is, one trail that is very strong in this sum, so that we can approximate the sum by just a single term. What this means is that the correlation value practically only takes two values, plus or minus this correlation contribution, which gives us this assumption of right key equivalence. What this basically says is that the expected value of the absolute correlation is fixed, so there is only one absolute value that we will observe. The question, or our question, was: is this assumption actually true?

Well, let's have a look. We enumerated some thousands of trails for a good approximation, one of the approximations used in the original attack. This is our signal. This signal does have one very dominant trail with this correlation contribution, but it also has a bunch of other trails that contribute something to the sum. We then evaluate this sum over many keys, and we get this. This is a picture of the probability distribution of the linear correlation. We do have these two peaks around plus and minus this value here, which is fine. But if we look at it more closely, this is the positive half, we see this kind of weird distribution. Here is the value of the dominant trail, which is fine. But what is interesting is that we actually have a higher probability of getting a value that is lower than what we would expect under the key equivalence assumption. And we also have this peak down here, which is even higher. So we don't have strictly just one value.

What we propose is to model this. You can see it looks like some overlapping normal distributions, so we propose to model it as a mixture of normal distributions. This basically means a distribution where we pick between several normal distributions with some probability. Here we have fitted a model with three normal distributions. The one you see in the middle corresponds to the dominant trail, so what you would assume under the old model, and the interesting thing is that it only accounts for about 30% of the actual correlation values. What you see in red are the components of this mixture model, and the green is the full distribution, which fits quite nicely to what we measure. So we can get a good estimate of our right key distribution using this model.
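As a rough illustration of this mixture-of-normals model, here is a minimal sketch in Python; the weights, means, and standard deviations are invented placeholders, not the values fitted in the talk.

```python
# A minimal sketch of the right-key model described above: the key-dependent
# correlation is modelled as a mixture of normal distributions.  All numbers
# below are invented placeholders, not the fitted values from the talk.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical components (weight, mean, std) for the positive half of the
# distribution; the first one plays the role of the dominant-trail peak.
components = [
    (0.30, 1.2e-6, 1.0e-7),  # peak at the dominant trail's contribution
    (0.45, 0.9e-6, 1.5e-7),  # extra mass below the dominant value
    (0.25, 0.5e-6, 1.0e-7),  # the lower peak seen in the measured distribution
]

def right_key_pdf(c):
    """Density of the mixture model at correlation value c."""
    return sum(w * norm.pdf(c, loc=mu, scale=s) for w, mu, s in components)

def sample_right_key(n):
    """Pick a component with its probability, then sample from that normal."""
    weights = np.array([w for w, _, _ in components])
    idx = rng.choice(len(components), size=n, p=weights)
    mus = np.array([mu for _, mu, _ in components])[idx]
    stds = np.array([s for _, _, s in components])[idx]
    return rng.normal(mus, stds)

print(right_key_pdf(1.2e-6))
print(sample_right_key(5))
```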
For the wrong key model, what Matsui reasoned was that for a wrong key guess you get something that looks fairly random, so you should get a correlation very close to zero. Bogdanov and Tischhauser refined this and said, well, maybe it looks more like a random permutation, which we know does not have correlation exactly zero; instead the correlation is distributed like this normal distribution here, where n is the block size. But again, this might be a little bit optimistic for a cipher like DES that has extremely strong trails: even if we guess the wrong key, we probably don't get something that looks completely random. So let's see what actually happens. As I said before, if we guess a wrong key, then we have this mapping, some other cipher. And what does this cipher look like? Well, it looks like this: first we have a round of decryption, then we have an R-round encryption, and then we have a round of decryption again. So instead of looking at just a random permutation, we'll look at this related cipher. If we do this, we can again do our trail search, find a signal, and if we then look at the correlation distribution, it looks really weird.

Now, I'm not saying that this is necessarily what the real distribution looks like, but it is probably something close. The important part is that if we compare this to the distribution of a random permutation, then the variance is quite a lot larger than what we would expect. So this random-permutation model is clearly quite optimistic in this case.

Okay, cool. So now we have a model for the right key and the wrong key, but in most cases we're not interested in sampling the whole codebook, which would let us measure this correlation exactly; we want to use only a relatively small part of the codebook. If we do this, we can show that we just need to add a normal distribution with variance 1 over N, where N is the number of texts, and then we get our actual under-sampled distribution. And that looks something like this. You kind of lose all the weird structure that we've been seeing, and this is very close to the model that Matsui actually uses. But I want to stress that this only works if your N is quite a bit smaller than the codebook size. So if you have these very marginal attacks, then you should be careful about how these distributions actually look.

Okay, so we can also extend this to multiple approximations, and the models are completely analogous: for the right key we have a mixture of multivariate normal distributions, and for the wrong key we have a multivariate normal distribution. Visually it looks like this. The blue here is the right key and the red is the wrong key. Here we have two approximations, so we have four of these mixture components, and in this model we just say that they occur with equal probability. In a bit we'll look at what happens if you don't have exactly this picture where all the components are present.

So now we want to know how we can distinguish between this blue distribution and the red distribution as well as possible. Some work was done on this by Biryukov, De Cannière, and Quisquater in 2004. They proposed a method that basically uses a test statistic which is just the probability of x, the correlation that we observe, under the right key distribution. So this only uses the right key distribution. I'll show you what this looks like in a bit, but first we need the success probability of this thing, which is pretty obvious: it's the probability that we classify a right key as a right key. And then the advantage, which is related to the false positive rate: it's a measure of how many wrong keys get classified as the right key.

Let's look at this visually. The classifier will look at this green outline here and say that anything inside this area is probably a right key, and everything outside is a wrong key. Intuitively, that's not the best idea, because what if I have an observation all the way down here? I probably shouldn't classify that as a wrong key. For a fixed success probability, this has an advantage of 3.1 bits; I'll get back to what that signifies in a bit. But this doesn't seem like the best idea. So what we propose is a new classifier which takes into account both the right key and the wrong key probabilities. This is basically just the likelihood ratio of an observation under the two distributions. If you look at the picture for that, then this distinguishing line looks quite different: anything in here will be classified as a wrong key and anything outside will be classified as a right key. And I guess this is something like what you would draw if I asked you to draw it intuitively. Anything out here will also be classified as a right key guess, which intuitively makes sense. And we see that for the same success probability we get a boost in advantage, and higher advantage is good. So we actually gain a couple of bits just by changing our classifier a bit.
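Here is a minimal one-dimensional sketch contrasting the two classifiers just described; p_right and p_wrong are crude stand-ins for the fitted right-key and wrong-key densities, all parameters are invented, and the real attack uses the multivariate mixture models instead.

```python
# A rough 1-D sketch of the density-threshold classifier versus the likelihood
# ratio classifier.  All numbers are invented for illustration only.
import numpy as np
from scipy.stats import norm

c0 = 1.0e-6       # hypothetical dominant correlation contribution
sigma_r = 4.0e-7  # spread of the right-key peaks (chosen larger than sigma_w so the
                  # far tails belong to the right key, as in the picture described)
sigma_w = 3.0e-7  # spread of the wrong-key distribution, centred at zero

def p_right(x):
    # crude stand-in for the right-key mixture: two peaks around +/- c0
    return 0.5 * norm.pdf(x, c0, sigma_r) + 0.5 * norm.pdf(x, -c0, sigma_r)

def p_wrong(x):
    return norm.pdf(x, 0.0, sigma_w)

def density_threshold_classify(x, threshold=1e5):
    # the older statistic: only how likely x is under the right-key model
    return p_right(x) >= threshold

def likelihood_ratio_classify(x, threshold=1.0):
    # the proposed statistic: ratio of right-key to wrong-key likelihoods
    return p_right(x) / p_wrong(x) >= threshold

# An observation near a peak, one near zero, and one far out in the tail:
for x in (1.0e-6, 0.0, 3.0e-6):
    print(x, density_threshold_classify(x), likelihood_ratio_classify(x))
# The far-out observation (3.0e-6) is rejected by the density-threshold rule but
# accepted as a right key by the likelihood ratio, matching the intuition above.
```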
Okay, so now to the actual attack. We identified two sets of four good linear approximations, and these also include the approximations that Matsui originally used. What I have put here are the dominant trails for these approximations. What we can see, which is interesting, is that this approximation and this approximation, for example, have the same dominant trail, in fact the same dominant key trail. What I mean by that is that the signs of these approximations are determined by the same key bits. So if we look at their distributions here, again blue is the right key and red is the wrong key, then the joint distribution of this gamma one and gamma three is not symmetric. If just one of the correlations is negative, then the other is always positive, and vice versa. And that should give us some extra distinguishing power: this should be easier to distinguish than this case here, and the same down here. By the way, we actually verified these distributions: this is the model distribution, and we verified each of them practically by doing experiments, and they match very well.

So let's see what happens with our likelihood classifier in this case. Well, now we get these lines here: anything in here would be a wrong key and everything out here would be a right key. And we see that we get a boosted advantage. So it is much easier to distinguish in this asymmetric case than in the symmetric case, and that will give us some boost in our attack power.

So let's see what the effect actually is. Here we have data complexity versus advantage. Down here we have the old Bayesian model and up here we have the likelihood model, our new model. We see for both of them that when we move from a symmetric to an asymmetric distribution we get a boost in advantage of one and a half bits or so, and by going from the Bayesian model to the likelihood model we get something like five bits higher advantage. So both of these things can help us out.
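As a very rough numerical sketch of why the asymmetry helps, here is a small simulation; the correlation values, spreads, and the nearest-peak rule are all invented for illustration and are not the attack's actual model or classifier.

```python
# Sketch of the asymmetry exploited above: two approximations whose signs are fixed
# by the same key bits, so for the right key only the (+,-) and (-,+) sign
# combinations of the dominant trails occur, versus a symmetric case where all four
# combinations are possible.  All numbers are invented.
import numpy as np

rng = np.random.default_rng(1)
c = 1.0e-6        # hypothetical dominant-trail correlation contribution
sigma_r = 3.0e-7  # spread of each right-key mixture component
sigma_w = 2.0e-6  # spread of the wrong-key distribution, centred at zero

def right_key_peaks(asymmetric):
    """Centres of the right-key mixture components (the sign combinations that occur)."""
    if asymmetric:
        return c * np.array([[+1.0, -1.0], [-1.0, +1.0]])
    return c * np.array([[+1.0, +1.0], [+1.0, -1.0], [-1.0, +1.0], [-1.0, -1.0]])

def wrong_key_false_positive_rate(asymmetric, n=200_000):
    """Crude proxy for the false-positive rate: the fraction of wrong-key observations
    that land within 4*sigma_r of some right-key peak and so would be mistaken for a
    right key.  A lower rate corresponds to a higher advantage."""
    w = rng.normal(0.0, sigma_w, size=(n, 2))
    d = np.linalg.norm(w[:, None, :] - right_key_peaks(asymmetric)[None, :, :], axis=2)
    return (d.min(axis=1) < 4 * sigma_r).mean()

print("symmetric :", wrong_key_false_positive_rate(False))
print("asymmetric:", wrong_key_false_positive_rate(True))
# With only two of the four sign combinations present, fewer wrong keys look like
# the right key, i.e. the same kind of classifier achieves a higher advantage.
```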
So let's see what that actually translates to in an attack. Here we have data complexity versus computational complexity, and we can compare: we have two different types of attack here, choosing two points on these curves. One, compared to Matsui's original attack, uses slightly less data and has a better time complexity. The other uses much less data, which in some cases can be more useful, at the cost of a slightly higher time complexity.

So the point here is that we should be careful when we look at these linear models: be careful about just making assumptions about what the correlation distribution looks like, especially for very marginal attacks. And you should also try to look for these asymmetric distributions. It would be very interesting to see if these weird distributions can be used to get stronger attacks for other ciphers as well. So thank you for your attention.

This is very reminiscent of what people in machine learning are doing, trying to separate. And it's known that linear separators are not going to capture certain divisions of a multidimensional space, which is why people use kernel methods. Have you looked at kernel methods from machine learning to see what is the best way to distinguish between these various collections of things?

No, not yet. I looked at a bunch of different methods to try to come up with something nice for this case in particular, and here the likelihood ratio seems to be very good. Theoretically, the likelihood ratio should also be optimal if you have an accurate picture of the probability distributions.

Here is one idea that might or might not work. You insist on classifying every measurement as either a right key or a wrong key. But in fact, when you are far away from all those peaks, it is not clear that this is a meaningful measure, because you are very borderline. So you may want to classify your measurements as one of three things: right key, wrong key, or undecided and meaningless. And you may find it better not to force the classification on far-away points.

Yeah, probably. One thing you could say is that even getting an observation out here is of course extremely unlikely to begin with, and I'm not even sure it can actually happen, because this is not really a continuous distribution; it's an approximation of a discrete distribution, right? So yes, it might work. You could also say that this might not entirely make sense either because, just like the point I showed here, it could be either. But that's worth looking into.

Did you try to find any impact on real applications? I know DES is still used in some applications with a very, very low security level, so it might still be in use in some way. Did you try to find a real application, some real usage?

Not for DES in particular. I know that triple DES is still used in many places, and I wanted to look at that, but that seems to still be out of reach even with improvements like this. So I haven't looked at any real uses of DES. But I have seen some other examples, also in differential cryptanalysis, where you get some really extremely weird distributions that might be interesting to look at. So that's also a suggestion for other people, to try to look at this for differential cryptanalysis; that would be really interesting.

Any more questions?