 to the second part, which is entitled Characterisation and Estimation of the Key Rank Distribution in the Context of Side Channel Evaluations, authored by Daniel Martin, Luke Mather, Elisabeth Oswald, and Martijn Stam. Okay, starting from the beginning. So this is joint work with my colleagues from Bristol: Dan, Elisabeth, and Martijn. And the general theme for this talk is how to do side-channel evaluations better. So the claim I want to make is that we're not evaluating the resistance of a device to non-invasive types of side-channel attack, such as DPA, as well as we could be. So hopefully I want to convince you why that is and what we can do about it. But first, just to motivate why this is important: resistance to these attacks is encoded in several evaluation processes already. If you're manufacturing to a certain level within FIPS 140-3 or Common Criteria 3.1, you have to give evidence that you're resistant to these kinds of attacks. There are billions of devices out there that have to, or should be, following these security guidelines. And the reason why an accurate evaluation in particular is important in this case is that SCA is almost always probabilistic in nature. So sometimes the attacker gets lucky. And as evaluators, we need to be able to say with some confidence how lucky that attacker can be. So the underlying motivation for this work was a recent trend in the literature which suggests that we need to change how we view the outcome of our attacks. I'll circle back around to what I mean by that. But the rough plan for this talk is to explain how we do attacks at the moment, how we evaluate at the moment, and then how we need to change our view to include something that we call the rank of an attack, and consequently how we should modify our evaluation strategies to include this notion of a rank. Okay, so a very, very 10,000-foot view of a side-channel attack is this. 
You have an adversary that needs to gather measurements that we call traces. These might be power consumption or EM radiation. And the attacker hopes that those measurements contain some information leakage on the secret key used by the device. Then the attacker has to essentially define their strategy, and that consists of two main things. The first thing is a model for the leakage of the device. That model can be guessed, derived, or estimated, but in general the adversary hopes it matches well what is going on inside the device. And the second thing the adversary needs is what's called a distinguisher, which is essentially just an overloaded term for a statistical method or algorithm that compares the modelled leakage with the real leakage inside the measurements. So when the attack is run, you have a key spat out, which is the attack's best guess for what the key is, which the adversary can check using a known plaintext-ciphertext pair. And in this current model, if this key is incorrect, then that's failure: the attack's failed, and you have to start again. So I quickly want to discuss what affects success for an adversary. Some things are systematic: if the adversary doesn't model the leakage correctly, or picks a distinguisher or a technique that doesn't quite capture the dependencies in the leakage, then the attack's not going to do as well as it could do. But there are also random sources of error: there's environmental noise, there's countermeasure noise, and measurement quality also matters. So the way evaluations happen at the moment is what I'd call an attack-based approach. If you're a manufacturer and you have a device, you'll give an instance of the device with the crypto deployed on it to some people in a laboratory, which can be internal or external. And using their best knowledge and expertise, they'll define a list of attacks, run those attacks on the device, and see what happens. 
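As an aside, a distinguisher of the simple kind mentioned later in the talk, correlation power analysis, can be sketched roughly like this. This is a minimal illustration, not the speaker's actual tooling: the XOR-only toy target and all parameters are assumptions made up for the example.

```python
# Minimal CPA-style distinguisher sketch: correlate a Hamming-weight leakage
# model against measured traces for every key-byte guess. The XOR-only toy
# target and all parameters are illustrative assumptions.
import numpy as np

def hamming_weight(x: int) -> int:
    return bin(x).count("1")

def cpa_distinguisher(traces: np.ndarray, plaintexts: np.ndarray):
    """traces: (n, samples) array; plaintexts: (n,) byte values.
    Returns (best key guess, score per key guess)."""
    scores = np.zeros(256)
    centered_t = traces - traces.mean(axis=0)
    for k in range(256):
        # Model: Hamming weight of the intermediate value under key guess k.
        model = np.array([hamming_weight(int(p) ^ k) for p in plaintexts], float)
        centered_m = model - model.mean()
        num = centered_m @ centered_t
        den = np.sqrt((centered_m ** 2).sum() * (centered_t ** 2).sum(axis=0))
        # Signed Pearson correlation, maximised over sample points.
        scores[k] = (num / den).max()
    return int(np.argmax(scores)), scores
```

A real attack would typically target an S-box output and use the absolute correlation; the signed version is used here only to keep the toy XOR target unambiguous.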
And if they all fail, if they all don't produce the key, then you conclude the device is secure. If one succeeds, then you judge its strength quantitatively. In general there are a few properties assessed, such as the amount of time the adversary took and how much money they had to spend, but the big one is how many traces, how many measurements, the adversary needed, because these are time-consuming and expensive to acquire. Okay, so back in 2012, some researchers at SAC noticed that the adversary doesn't need this attack to be perfect. And they essentially defined a strategy for the adversary in which the adversary does the exact same thing as before: same measurements, same attack configuration. But, and I don't want to go into the detail as such, they essentially described a method for the adversary to make use of some auxiliary information contained in the attack result. The adversary does what we call enumeration work: the adversary is able to assign a score associated with the likelihood of every key candidate being the correct one. And once you've done this, you're able to check the key candidates in order of their score. So essentially, I think a good way of viewing attacks in this model is as an enhanced brute-force search. You assume the adversary is going to check keys, and what the side-channel information gives you is the ability to check keys in a cleverer order than just randomly. Going a little bit more formally, we say the rank of an attack is the number of keys the adversary has to check before it hits the correct one. I have a really simple illustration on the slide here. Assume we're attacking a block cipher with a 128-bit key. The adversary does this enumeration step, generates the keys in order of likelihood, checks them, and keeps going until it finds the correct one. So in this example, the rank of the attack is 2 to the 57. 
And the adversary has to enumerate and verify that many keys. I want to point out this is absolutely not the same as doing 2 to the 57 executions of AES; there's a lot more work involved. A rough idea of what happens is that an attack produces several lists, several subsets of information, that give information on the likelihood of certain portions of the key. And the adversary has to be smart in how it grabs those little bits of information and combines them to give a likelihood for a single candidate key. Okay. So if we circle around to evaluation, and we consider rank now, we have some interesting questions. An example is this: if you have an attack that requires 10,000 measurements and some rank left over afterwards, and an attack that requires five times more measurements but requires less brute-force work afterwards, which is better? And I think the answer is that it really does depend, firstly on how long it takes to get measurements, how difficult that is, and also on how difficult the enumeration is. So at this point, there has been some work done by people on how to make use of rank, how to incorporate rank into evaluations. Where we started with this work was to do the obvious thing and say, well, the rank is a random variable defined over the randomness in a fixed number of measurements. If you pick an attack strategy, pick a fixed number of measurements, and run the attack several times using different, freshly gathered acquisitions, you'll have different ranks come out. And the interesting question is, well, how different will these ranks be? And then you might ask, well, can we analytically compute this distribution? Because that would be nice, nice and straightforward. But the answer is, unfortunately, no. If you're happy to make a lot of assumptions, then you could do this, although no one has yet. 
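To make the combining step concrete, here is a deliberately tiny sketch with just two 8-bit subkeys. Real attacks on AES-128 have sixteen subkeys, so you cannot brute-force the combination like this, and the scores here are made-up stand-ins for real distinguisher output.

```python
# Toy illustration of combining per-subkey scores into a full-key rank.
# A candidate key's score is the sum of its subkey (log-likelihood) scores;
# the rank counts candidates that score strictly higher than the true key.
import numpy as np

def toy_rank(scores0, scores1, true_k0, true_k1):
    true_score = scores0[true_k0] + scores1[true_k1]
    combined = scores0[:, None] + scores1[None, :]  # all 2^16 candidate keys
    return int((combined > true_score).sum())       # keys checked before the right one

# With a strong bias toward the true subkeys, the rank collapses to zero.
s0, s1 = np.zeros(256), np.zeros(256)
s0[42], s1[7] = 5.0, 5.0
print(toy_rank(s0, s1, 42, 7))  # → 0
```

The point of the clever enumeration algorithms is to walk this combined space in score order without ever materialising it.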
But in practice, if I gave someone a device tomorrow and said, compute this distribution, let's see what it looks like, they wouldn't be able to do it. And a second interesting question is this: in the side-channel literature, we normally care about averages, about expectations. So success is normally defined as an average: how many times will I succeed on average? And so a good question is whether looking at the expectation of this rank is a good idea. Okay. So if no analytical approach is available, this essentially collapses to a pure statistics problem. The only option you have left is to estimate the distribution via repeated sampling. What this means in practice is you have to fix your strategy, fix the number of measurements, gather a fresh set of measurements each time, run the attack, and estimate the rank. And so the questions we wanted to answer were, first, what does the distribution of the rank actually look like? And secondly, in side-channel analysis, I don't know if you've ever been to a CHES conference, but you'll see hundreds of different ways of constructing attacks, ranging from very, very complicated machine-learning techniques that only apply in niche scenarios, to very, very simple, general attacks that require as little as subtracting the means of two data sets. And so we wanted to ask, well, does the distribution of the rank look the same for attacks that follow wildly different strategies? Okay. But before I go into results, actually the first thing we want as people doing evaluations is a way to... well, essentially a way to not have to check every single key in turn. As evaluators, you don't want to be doing this every single time you want to work out the rank of an attack. But because we know the key, we can use that information to get a fast approximation for what the rank actually is. So starting from, I guess, Eurocrypt 2013, people have tried to define methods for doing this. The majority provided an interval estimate. 
And at Asiacrypt last year, one of my colleagues, Dan, described an algorithm that provides a point estimate. And this is the one we like; we think it's fast and accurate. As part of this work, we spent a bit of time trying to make it better. To quickly describe what we did: we made several observations that reduced its run time, and this allowed us to achieve quite a few orders of magnitude more precision at no additional run time. And actually the interesting thing here is that the observations we made didn't reduce the algorithmic complexity asymptotically, so it kind of shows that constants are important. What this means in practice is that you can get a very accurate point estimate in at most a few seconds on a reasonably powerful workstation CPU. Okay, so the first thing we did was essentially run hundreds of thousands, if not millions, of attacks on our university computing cluster. And we varied everything: noise levels, attack strategies, numbers of measurements, and so on. And the interesting thing we found was that the distribution of this rank doesn't really change; it's independent of all these things. Essentially, if you take an attack strategy A, which uses a small number of traces, and an attack strategy B, which is less clever but uses more traces, those attacks will produce a very similar-looking distribution if they do equally well. So good attacks that produce low ranks produce distributions that look the same, and bad attacks that have high ranks also have distributions that look similar. We don't know why this is; this is where the analytical route might be useful. So in this work we stuck to trying to improve the evaluation process, but that is definitely something to look into. So this is an example of the kind of evaluation that you can do. 
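For intuition, one family of rank estimation algorithms from the literature works by binning subkey scores into histograms and convolving them. The sketch below follows that idea; it is not necessarily the exact point-estimate algorithm from the talk, and the bin count is an arbitrary choice.

```python
# Histogram-convolution sketch of rank estimation. Subkey scores are binned,
# the histograms are convolved to approximate the distribution of full-key
# scores, and the estimated rank is the mass strictly above the true key's
# combined bin. Not the exact algorithm described in the talk.
import numpy as np

def estimate_rank(subkey_scores, true_indices, n_bins=2048):
    lo = min(s.min() for s in subkey_scores)
    hi = max(s.max() for s in subkey_scores)
    edges = np.linspace(lo, hi, n_bins + 1)
    hist, true_bin = None, 0
    for s, t in zip(subkey_scores, true_indices):
        h, _ = np.histogram(s, bins=edges)
        # Bin index of the true subkey's score (clamped to the last bin).
        true_bin += min(np.searchsorted(edges, s[t], side="right") - 1, n_bins - 1)
        hist = h if hist is None else np.convolve(hist, h)
    return float(hist[true_bin + 1:].sum())
```

The convolution works because the bin index of a summed score is approximately the sum of the subkey bin indices, so one pass over sixteen subkey histograms replaces enumerating 2^128 candidates.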
So we took a data set produced by one of my colleagues at Bristol, which was published at CHES 2015, which is a DPA attack on a fairly difficult target: a device implementing AES in hardware. The attack strategy, for those that are interested, was just a straightforward correlation power analysis using EM measurements. We had, I think, on the order of a few million traces. We took small subsets of those traces randomly, ran the exact same attack that these guys did, estimated the rank, and that's what you can see plotted on this graph. So if I put on the hat of an evaluator under the current way of doing things, the interesting part is down here, which is where the attacker starts winning perfectly. On average, this is around 80,000 traces, but the adversary starts getting a lucky, perfect attack at around 50,000 traces. And in fact, this is what my colleague Jake reports in his paper. Now if you take rank into account, and let's assume that we want 80 bits of security in some sense, which is this horizontal line here, the adversary actually starts beating that, starts winning on average, here, and I kind of wish I'd drawn on lines now, but that's at about 20,000 traces. And actually 10% of the time the adversary wins somewhere between 10,000 and 15,000 traces. So there are quite big gaps in the numbers of measurements, depending on how you define success. Another interesting thing to look at is the distribution side-on. So this is a histogram of repeated attacks where the average rank after the attack was 2 to the 16. Obviously that's good for the adversary; I mean, that costs almost nothing to enumerate. And you can see that the distribution looks kind of normal. You have a little bit of censoring going on on the left-hand side, because obviously you can't do better than rank zero, or rank one, depending on how you define it. 
But you can see there is still a fair bit of variance going on. On average, 2 to the 16 is really easy; a lot of the time I win perfectly, but sometimes I have to do more than 2 to the 40 work afterwards. And if you make these attacks slightly worse, you can see the distribution starts to spread a bit more. I think this is the most interesting one, where on average the attacker has to do 2 to the 64 enumeration work, which is a lot. I think the most we've ever done in our lab is 2 to the 50, and that took 700 cores for one day. But some of the time, I don't know what the percentages are, but maybe 10% of the time, it's actually quite easy, maybe even 2 to the 45. And, I don't know, 25 to 30% of the time, the adversary is totally screwed: it has to do more than 2 to the 80. So this suggests, well, it seems pretty obvious, that you perhaps shouldn't look at averages. Essentially, the best thing to do is to define a threshold and look at when your distribution tail crosses that threshold. And if I keep going, you can see how this normal distribution shape continues. Okay, so our proposal for how evaluations should be done now, if you care about rank, is essentially to collapse it to a statistics problem. It's the only way to do things, and it requires repeated sampling from the rank distribution for an attack. In terms of statistics, this large variance occurs, and I guess I didn't mention this earlier, in exactly the unfortunate regions in which you're most interested. As a manufacturer, you're probably going to be interested in when ranks are around 2 to the 60, 2 to the 70, maybe even 2 to the 80, and this is where the distribution has the most variance. So consequently, we suggest that nonparametric order statistics, such as percentiles, let you make a lot more actionable conclusions, such as the one I've said here. 
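In code, the proposed reporting style amounts to something as simple as this. The sampled log2 ranks below are made-up numbers, purely to illustrate the idea.

```python
# Report a nonparametric order statistic of the sampled rank distribution
# instead of its average: e.g. "10% of the time the adversary needs at most
# this much enumeration work". Sample values are illustrative only.
import numpy as np

def security_percentile(log2_ranks, pct=10):
    return float(np.percentile(log2_ranks, pct))

sampled_log2_ranks = [64.1, 58.3, 71.0, 45.2, 80.5, 62.7, 66.4, 52.9, 77.1, 60.0]
print(security_percentile(sampled_log2_ranks, 10))  # the lucky-adversary tail
```

The low percentiles capture exactly the lucky adversary the speaker is worried about, which an average hides.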
One thing I haven't discussed here, but which is in the paper, is how many repeat experiments you should do. We kind of heuristically arrived at 30, ideally more. This isn't great, because it does require evaluators to gather a lot more measurements. You might be able to do something slightly clever with reusing some of your measurements, but you also need to be careful. Unfortunately, this is currently the only option we have. I'd just like to point out that we implemented this estimation algorithm and the improvements we described, and we also have an implementation of the enumeration step for an adversary, which is designed to run on a lot of cores. That's available at that URL. It's written in C++. It's a header-only library, so you can integrate it fairly easily. It has tests, comments, examples. And it's free to use, so please do. Any questions about it, please talk to me. And so with that, I'll conclude. Thanks for listening. Thank you, speaker. Are there any questions or comments? In many cases, you can attack both the first subkey and the last subkey by doing encryptions and decryptions in side-channel attacks. And you get two different rankings. Now, assuming that they are related, the first and the last key, for example, that permutations of bit positions are the only difference between the first and the last key: could you devise a better algorithm for doing ranking if you have a ranking over the key bits in one order and a ranking over the key bits in another order? You see what I mean? So essentially, the structure that you can exploit given the algorithm you're targeting. For example, in DES, the first subkey and the last subkey are just permutations of each other. Yeah, that would be interesting, but I don't have a ready idea. Any other questions? To follow up on that question: I wonder to what extent your analysis and recommendations would apply to other ciphers? 
The implementation of the analysis was done for AES, right? So how do you think it would apply to other types of ciphers, to lightweight types of ciphers? So off the top of my head, I think everything we tried was AES. Given existing non-rank-based research into other ciphers, I would not expect any difference. I would still expect this kind of shape, just shrunk, depending on the key size. And that would literally be all I would expect to see. Let's thank the speaker again. Let's move to the fourth talk, which is entitled...