Hi, welcome to my talk. My name is Si Gao. Today I'm going to talk about how to construct complete leakage models, and how that applies to side-channel attacks and responsibly engineered simulators. This is joint work with Elisabeth. I now work at Huawei, but this work was done entirely last year while I was at the University of Klagenfurt, funded by an ERC grant called SEAL.

I guess most of the audience today is already familiar with the concept of side-channel analysis. Side-channel analysis takes advantage of some information leakage, such as timing or power consumption, and can potentially recover the secret key within a very short time, at the cost of some physical observations — typically power traces on an oscilloscope.

So let's take a moment and think about what a realistic side-channel attacker has to do. If we take the common correlation-based attack as an example, the first thing I need to assume as an attacker is my target intermediate state. Usually we say this is the first S-box output — so we are attacking the first S-box output, that is, the x here. Then I also have to assume what my leakage looks like. Perhaps I assume my leakage approximates the Hamming weight of the S-box output; so overall, my assumption is that the leakage approximates the Hamming weight of the S-box output, plus perhaps some additional Gaussian noise. What happens next is that I obtain some observed leakage from the oscilloscope and compare it with my assumed leakage using the Pearson correlation. If my key guess is correct, I get a clearly non-zero correlation, which suggests this is probably the correct key; otherwise, I get a correlation close to zero, which suggests it is not.
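To make the correlation-based attack concrete, here is a minimal sketch. Note the S-box is a stand-in byte permutation rather than the real AES S-box, and the key, trace count and noise level are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in S-box: any fixed byte permutation works for the illustration;
# in a real attack this would be the AES S-box.
SBOX = np.array([(45 * i + 7) % 256 for i in range(256)], dtype=np.uint8)

def hw(x):
    """Hamming weight of each byte in a uint8 array."""
    return np.unpackbits(x.reshape(-1, 1), axis=1).sum(axis=1)

# --- Simulated device: one sample leaking HW(Sbox(p ^ k)) plus noise ---
true_key = 0x3C
plaintexts = rng.integers(0, 256, size=2000, dtype=np.uint8)
leakage = hw(SBOX[plaintexts ^ true_key]) + rng.normal(0, 1.0, size=2000)

def pearson(a, b):
    a = a - a.mean()
    b = b - b.mean()
    return (a @ b) / np.sqrt((a @ a) * (b @ b))

# --- CPA: enumerate key guesses, correlate assumed model vs observations ---
corrs = np.array([pearson(hw(SBOX[plaintexts ^ k]).astype(float), leakage)
                  for k in range(256)])
recovered = int(np.argmax(corrs))
```

The correct guess produces a clearly non-zero correlation, while wrong guesses hover near zero, which is exactly the distinguishing step described above.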
By enumerating the key guesses, I can tell which one is correct and which ones probably are not. But bear in mind: with what we actually capture on the oscilloscope, I don't really know what this leakage is. If I guessed the key correctly, I know the S-box output does contribute to the leakage I'm observing, but there may well be something more in it. For example, taking a sort of "god view", suppose this leakage contains not only the current S-box output but also the transition from the previous S-box output to the current one. This is actually quite common on software platforms: on an ARM core, the memory bus or the micro-architectural registers quite commonly produce leakage like this. However, unless you really understand all the micro-architectural features, finding this out — or knowing it beforehand — is a big problem.

What does this really mean? If we take the attacker's perspective: if you tell the attacker, "come on, you are only exploiting part of the leakage", that is probably not an issue, because the attacker's final goal is recovering the secret key. Once I have the secret key, as an attacker, my job is done. Finding this extra term might take a lot of effort, and that effort in the end counts towards the attacker's total effort to find the key, which might not be worthwhile at all. So as an attacker, I don't really care whether I can find it. But if we step into the evaluator's or certification lab's shoes, the situation is completely different, because in that case we would like to verify that our countermeasure is secure with respect to this leakage.
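The "god view" situation can be written down as a tiny hypothetical leakage function; the structure (value term plus transition term) is the point here, not the exact coefficients, which real micro-architectures would determine:

```python
def hw(x):
    """Hamming weight of a byte."""
    return bin(x & 0xFF).count("1")

def god_view_leakage(prev_sbox_out, cur_sbox_out, noise=0.0):
    """Hypothetical 'god view' sample: it depends not only on the current
    S-box output (what the attacker assumes) but also on the register/bus
    transition from the previous S-box output (what is easily missed)."""
    value_term = hw(cur_sbox_out)                       # assumed leakage
    transition_term = hw(prev_sbox_out ^ cur_sbox_out)  # often-missed term
    return value_term + transition_term + noise
```

An attacker modelling only `value_term` can still succeed, but an evaluator who certifies a masking scheme against `value_term` alone may miss attacks enabled by `transition_term`.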
If you take the attacker's assumption — that this leakage is solely about the S-box output — then your masking scheme might be secure under the leakage of the S-box output, but not secure once the transition term is included, which can be a big issue.

So what we are trying to propose and clarify in this paper is that when we talk about leakage models, we should consider both the intermediate states X and the leakage function L. We already have quite some experience with the leakage function L: if you don't know what the leakage looks like, you can use a profiling stage, or more advanced statistics such as generic distinguishers. But if you miss something from the intermediate states, that's the end of the story. So we are going to emphasize that, and propose a test that helps you check whether you have found a set of states containing all the relevant states for a certain sample on your power trace. Specifically, we propose using an F-test to verify whether a selected set of intermediate states is complete with respect to the observations you have. If it is not complete, it means that for this specific sample on the trace, you are missing something.

A complete leakage model helps with a few things. For attacks, it might reveal new, unexpected attack vectors. For leakage simulators, it means you can find leaks that would otherwise be missed by overly simplified models such as the Hamming weight model.

Okay, let's start our journey and see how we can find the complete set of intermediate states. First of all, we need to construct a full model that captures all the data-dependent leakage, and then estimate this model from realistic traces.
Then, taking your assumption — say, the attacker's assumption that the leakage is solely about the S-box output — we estimate a second model from the same realistic observations. In the next step, we compare these two models and figure out whether there is a significant difference between them. If there is, then the assumed model is not as good as the full one, which means you are actually missing some factors; in other words, it is not complete. Otherwise, we say the model is complete with respect to the number of traces you provided.

Okay. So first of all, how can we define a model that captures all the data-dependent leakage? This might sound like mission impossible at first glance, but think about it: suppose we take an unmasked AES with a fixed key. Then every single intermediate state, whatever it is and wherever it lies, is always a deterministic function of the input plaintext. As a consequence, all the data-dependent leakage is also a deterministic function of the input. That means if we build a model over all possible inputs, it automatically captures all the possible leakage. But this is obviously not something we can work with, because it would require far more than 2^128 traces, which is clearly infeasible. Our trick is to bound the input to a much smaller space: for AES-128, we let each input byte collapse to a one-bit random state, i.e. each input byte is either all ones (0xFF) or all zeros (0x00). With this trick, the input space of AES-128 becomes 2^16, which is much easier to work with.

The second step is how to compare two models.
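The input-collapsing trick above can be sketched as follows (the helper names are mine, not from the paper's code):

```python
# Restrict each of the 16 AES-128 input bytes to {0x00, 0xFF}: one random
# bit per byte. The input space collapses from 2^128 to 2^16 classes, so a
# "full model" (e.g. one mean leakage value per class) becomes estimable
# from a realistic number of traces.

NUM_BYTES = 16

def class_to_plaintext(class_index):
    """Map a class index in [0, 2^16) to a 16-byte collapsed AES input."""
    assert 0 <= class_index < (1 << NUM_BYTES)
    return bytes(0xFF if (class_index >> i) & 1 else 0x00
                 for i in range(NUM_BYTES))

def plaintext_to_class(pt):
    """Inverse mapping: recover the class index from a collapsed input."""
    assert len(pt) == NUM_BYTES and all(b in (0x00, 0xFF) for b in pt)
    return sum(1 << i for i, b in enumerate(pt) if b == 0xFF)
```

Every intermediate state of a fixed-key AES run is then a deterministic function of the 16-bit class index, so one estimated value per class captures all data-dependent leakage in this restricted space.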
We compare the full model with the model we believe is correct — say, that the leakage is solely about the S-box output. Fortunately, for this specific purpose there is a well-established technique in statistics: the F-test from the analysis of variance. If we want to check whether the latter model is good enough, or whether it is missing something, we can use the F-test. If the F statistic is larger than some threshold, we say the latter model is missing some factors — you don't know which factor, but it is missing something. Otherwise, we say it is complete, up to the statistical power defined by your number of traces.

Putting it together: to verify whether your assumption is correct, construct the full model and the assumed model, then compare them with the F-test. If the F statistic is above the threshold, your model is not complete. The trivial example I mentioned earlier would of course be rejected, because that model misses the Hamming distance term, which is clearly not ideal.

Okay, now let's move on to some slightly more complicated applications. First, how this works in attacks. Before getting into any technical detail, I'd like to stress that although this is about the impact on attacks, it isn't necessarily for the attacker. If you think about this specific term — the S-box output transition — it actually takes the attacker quite some effort to find, and as I said, all that effort counts towards the effort to find the key, which might not be worthwhile. Also, this term involves two relevant key bytes, so even if you know the term, you might not want to include it in your attack.
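The nested-model F-test at the heart of this comparison can be sketched like this; the data is synthetic, and the factor names in the comments are illustrative rather than the paper's exact predictors:

```python
import numpy as np

rng = np.random.default_rng(1)

def f_statistic(y, X_restricted, X_full):
    """F-test for nested linear models: does the full model explain
    significantly more variance than the restricted (assumed) one?"""
    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return resid @ resid, X.shape[1]
    rss_r, p_r = rss(X_restricted)
    rss_f, p_f = rss(X_full)
    n = len(y)
    return ((rss_r - rss_f) / (p_f - p_r)) / (rss_f / (n - p_f))

# Toy traces: the true leakage depends on two factors, but the assumed
# model only contains the first one (e.g. HW of the S-box output).
n = 500
x1 = rng.normal(size=n)            # assumed factor
x2 = rng.normal(size=n)            # missed factor (e.g. a transition term)
y = 2.0 * x1 + 1.5 * x2 + rng.normal(0, 0.5, size=n)

ones = np.ones((n, 1))
X_restricted = np.hstack([ones, x1[:, None]])
X_full = np.hstack([ones, x1[:, None], x2[:, None]])

F = f_statistic(y, X_restricted, X_full)
# F lands far above the ~3.9 critical value of F(1, n-3) at the 5% level,
# so the restricted model is rejected: it is not complete.
```

If the missed factor were absent from `y`, the F statistic would instead stay near the null distribution and the assumed model would pass, up to the statistical power given by `n`.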
One last point: what this work can really contribute is revealing unexpected micro-architectural features, which can give us a better understanding of the micro-architecture and of our implementations of masking schemes, and later help us build better masking schemes and better implementations. Overall, this is basically a pre-attack analysis step, close to what we usually do in a profiling setting.

Okay, so the target implementation we test is the affine-masked AES from ANSSI. Affine masking encodes any unmasked byte state x into this form with a multiplicative mask r_m and an additive mask r_a. The masked S-box is precomputed and stored in a table, which looks like this, with an input mask r_in and an output mask r_out. The additive masks differ from byte to byte, but the multiplicative mask and the S-box input and output masks are shared within one encryption.

Now we would like to verify the traces for computing the first S-box. Our trivial assumption is that computing the first S-box leaks only things related to the first S-box. So if we are computing the first S-box lookup, the relevant terms are x_0 and, accordingly, r_m, r_a,0, r_in and r_out. We test whether this model is good enough, or whether we have to include everything. In the results here, if a curve is above the dashed line, you fail the test: you are missing something. And we can see from the blue line that this x_0-only model is missing a lot of the information contained in the observed leakage.

Why is that? Why does computing x_0 leak much more than x_0? Well, the Cortex-M3 is a 32-bit core, and its memory buses are most likely 32 bits wide.
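The byte-load versus word-load effect can be illustrated with a tiny model — purely illustrative, not a claim about the exact Cortex-M3 data path:

```python
def hw(x):
    """Hamming weight."""
    return bin(x).count("1")

def byte_load(memory_word, byte_index):
    """Toy model of a 32-bit memory bus: even a single-byte load drives the
    whole aligned word onto the bus, so the trace sample is word-wise."""
    loaded_byte = (memory_word >> (8 * byte_index)) & 0xFF
    byte_model = hw(loaded_byte)   # what an x0-only model would predict
    word_model = hw(memory_word)   # what the bus actually leaks
    return byte_model, word_model
```

Whenever the other three bytes of the word are non-zero, `word_model` diverges from `byte_model`, which is exactly the gap the completeness test flags.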
That means even if you are loading just one byte, what probably happens in the core is that the memory bus still loads a full word, while the CPU discards the unneeded bytes. So your leakage will still be word-wise instead of byte-wise. If we add all the word-wise leakage into our consideration, we get the line here, which tells you the model is more or less complete. Let's further verify that this is actually the case by testing all four concurrent bytes within one word: although we only compute the first S-box lookup — the lookup for x_0 — we simultaneously see leakage from all four bytes in that word, which confirms what the F-test just told us.

Does this really have an impact on attacks? Well, previously, if you were looking for one leakage of x_i and one leakage of x_j, you had to find two different samples on your power trace. But in this case, if x_i and x_j lie in exactly one word, you can just pick this one point: it automatically gives you both x_i and x_j, say x_0 and x_1. That means for some second-order attacks, you might be able to go from bivariate to univariate. More technical details can be found in our paper; I won't go through all of them here. One last thing I would like to mention: this brings new insight into the architecture and how the leakage might behave, but unless you are really forced into a corner with no other working attack, this is probably not the most optimal attack option. Again, the goal of this analysis is not finding the most optimal attack, but gaining a better understanding of the architecture.

Okay, now let's take a look at a more complicated application: leakage simulators.
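The bivariate-to-univariate observation boils down to the fact that a single word-wise sample already depends jointly on both bytes. A toy illustration, with made-up byte values:

```python
def hw(x):
    """Hamming weight."""
    return bin(x).count("1")

# Two intermediate bytes that happen to sit in the same aligned word:
x0, x1 = 0xA3, 0x5C
word = x0 | (x1 << 8)

# One word-wise sample covers both bytes at once, so a second-order
# combination that used to require two trace samples (bivariate) can be
# formed from this single sample (univariate).
sample = hw(word)
assert sample == hw(x0) + hw(x1)
```

An attacker who previously had to locate and combine two separate points of interest can therefore work from one point, at the cost of first discovering that the two bytes share a word.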
Usually, when you develop some masking code, you take the masking scheme, implement it, deploy the code on a certain device, and send the product to a certification lab. They will either say it is okay and you can enter the market, or that it is not okay and you need to fix it. The problem with this is, first of all, you probably can't test it yourself: as a crypto engineer, you may not have access to the device or to an oscilloscope. Second, the feedback comes really late. Your development cycle might have finished months earlier, you might already have forgotten what the code was about, and the colleague who developed it might even have left their job. All this makes the process much more difficult than it sounds.

A better way is to use a leakage simulator, which provides early feedback right after you have your code. It will tell you not only whether the code is okay, but also why exactly it is not okay — which instruction is problematic and where — so you can develop much more targeted security patches. Of course, within this workflow the quality of the leakage simulator plays a vital role. If it captures only, say, 5% of the real leakage, then most of the issues will still be there, which means you didn't really gain much. Our completeness test can verify whether a leakage simulator captures all the relevant states, which can serve as a kind of quality metric for leakage simulators.
We tried this with various existing tools, but unfortunately not all of them are applicable here, mainly because many of the tools work at the arithmetic level, which means they do not correspond to any executable code. As a consequence, they cannot be matched against any realistically measured traces either, so there is no way to compare them. In our case, the platform is a Cortex-M3, so we only pick existing tools that work at the binary level — so we can match them with measurements — and that support the Cortex-M3. That leaves the ELMO family and MAPS: the ELMO family is built from models profiled on real measurements, and MAPS is built from the Cortex-M3 RTL code.

The gadget we test is this very simple bitwise ISW multiplication; you can see there are only about 10 instructions. The model we use is what I call an extended linear model, which is a bit more powerful than both the ELMO and MAPS models — a superset of both. But when we verify it with the completeness test, we find it fails in almost every single cycle; that is to say, in every cycle you are missing some leakage that cannot be explained by ELMO or MAPS. We can iteratively add in the missing factors — a manual effort of finding a missing factor and adding it to the extended linear model — until we get the best model we can currently obtain. This model is of course much better than the extended linear model — you can see the flat lines here — but we are still missing a little bit here and there, and we don't really know what causes that.
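For intuition, one cycle's design matrix in the spirit of such an extended linear model might contain bit-level and transition predictors like this; the exact ELMO and MAPS feature sets differ, so this is only a sketch:

```python
def bits(x, width=32):
    """Little-endian bit decomposition of a register value."""
    return [(x >> i) & 1 for i in range(width)]

def extended_linear_row(op_cur, op_prev):
    """One row of a hypothetical per-cycle design matrix: bit-level terms
    for the current operand plus bit-level transition terms. Such a model
    would be fitted per cycle and checked against the full model with the
    F-test to see whether any leakage remains unexplained."""
    return bits(op_cur) + bits(op_cur ^ op_prev)
```

When the F-test rejects a cycle, a new candidate factor (another operand, a pipeline register, a bus value) is appended as extra columns, and the test is run again — which is the manual, iterative refinement described above.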
It turns out this incomplete leakage model actually affects the subsequent leakage detection. On a real M3, we find this piece of code exhibits five cycles of leakage, and ELMO identifies only two of them. They are true positives, but only two. With the best model we obtained, we can basically recover all five leaky cycles — but since we still don't fully know what we might be missing, this could still be a problem someday.

Okay, the last thing I would like to mention is an ethical consideration. The ELMO family belongs to a class of simulators I would call proportional simulators: their goal is for the simulator output to be as close as possible to the realistic measurement. The good side of this is that it is useful for attack estimation, and also if you want to do energy-consumption estimation. The dark side is this: we generally believe profiled attacks are perhaps the most powerful attacks a side-channel attacker can mount, and their only restriction is that the attacker needs an identical device on which they can observe everything, including the masks, so they can build templates and train without any limitation. Getting access to such a profiling device is the restriction on a profiled attacker — but with ELMO you get this essentially for free. An attacker can just take ELMO and use the templates built from it as free templates, mounting profiled attacks without ever having access to a profiling device. With the modelling we propose here, we basically find a complete set of states, but we do not estimate the leakage function L.
The good thing about this is that there is no way to use it as free templates. For most of our goals, like leakage detection, that is fine, because leakage detection — whether chi-squared or t-test based — is qualitative only, so you don't really care about the value of L. But if you want attack estimation, you obviously need the value of L, because you are looking for a numerical meaning.

That concludes my talk. If you have any questions, please ask me during the live session. Thank you.