Okay, thanks again. So this is my second talk. In the previous talk, I talked about how you can construct a complete leakage model, how it applies to several leakage simulators, and where they fall short. So that was leakage testing. In this talk, I'm going to talk about how you can build a better leakage simulator by reverse engineering micro-architectural features. This is again joint work with Elisabeth, but this time also with our previous colleague from Bristol, Dan Page, who guided us through all the micro-architecture mysteries.

Okay, so I already talked about this in my last talk. This is what you do in your masking scheme deployment, and it's beneficial to have a leakage simulator here: it gives you early feedback and it tells you why it's leaking.

So there are two routes among the existing simulators. One is the gray-box route. As a representative, the ELMO family targets the Cortex-M0. It is built on an instruction simulator called Thumbulator, and it trains its leakage model from profiling traces. The traces are actually measured from the Cortex-M0 being used. The leakage model focuses on the ALU. The target is actually an STM32F0. There are a few existing extensions: extensions to the memory bus, extensions to a non-ST M0 manufactured by NXP, and extensions to the Cortex-M3.

Or you can choose the white-box route. As a representative, MAPS takes the RTL code from ARM, licensed through an academic license. In this case, you actually get to see all the micro-architectural features. You know what's happening in your micro-architecture, and you don't really need any measurements, because they also decided to take the leakage as the Hamming distance on the registers.
So you don't really need any measurements here. As a brief recap from my last talk, where I didn't really talk about this in detail: in general, if we take a subset of both MAPS's and ELMO's leakage models and verify them with our completeness test, you may notice that in almost every single cycle we fail the test, which means that in almost every single cycle there is something missing in your leakage model.

And why is that? Mainly because our leakage models are always relatively simple. If you talk about ELMO, ELMO only focuses on the ALU, whereas the Cortex-M0 is actually a three-stage pipeline. You are focusing on only one of the pipeline stages, the execute stage. Moreover, ELMO's leakage model is built on the two lines here, the ALU input buses. Both of them lie in the micro-architecture. So for example, if we talk about this add instruction here, do we actually know which of R0 and R1 goes through bus A and which one goes through bus B? We don't, because that lies in the micro-architecture. So ELMO's model actually represents the authors' guess.

For MAPS, the situation is different. MAPS gets the RTL source code from ARM, but the question is whether this is the same as the product on the market: when a manufacturer gets this from ARM, do they make their own revisions or not? The other issue, as the MAPS paper already states, is that they only take care of the register transition leakage, basically all the red circles here. So if your leakage is actually coming from the ALU or all the muxes here, it is not necessarily covered by MAPS, okay?

So this is another piece of ISW bit-wise multiplication, not the same as in my last talk, and we are going to evaluate the ELMO family and MAPS against realistic traces. What we observe with the realistic traces on the Cortex-M3 is two leaky cycles: one is cycle 9 and one is cycle 15.
And in ELMO, you miss both of them. In ELMO*, an extension of ELMO, you not only miss both of them, you also produce a false positive. And in MAPS, you find cycle 15, but you miss cycle 9. So as I said, you are missing these mainly because your leakage models are simplified, and there is a lot of micro-architecture that exists in your circuit but not in your model. This motivates reverse engineering the micro-architectural features from your leakage and adding them to your leakage model, to create a micro-architecture-enhanced leakage simulator, okay?

Our starting point is always something public from ARM. The Cortex-M3 is specified as a three-stage pipeline: fetch, decode, execute. The only interesting thing here in the graph is that the decode stage says there is a register read, which means you not only do instruction decoding in the decode stage, you also pre-fetch your operands in this stage. So perhaps you need some pipeline registers here to temporarily store your results, and perhaps you also need two read ports in your register file, okay?

The fetch stage fetches the instruction from your instruction memory into your instruction registers. Everything is driven by the PC, everything is clear, and there's no ambiguity. We also know that all of this is not data dependent, so we can completely ignore it.

In the decode stage, we decode the instruction and create all the control signals. Here everything before the register file is not really data dependent. It is perhaps instruction dependent, but that's not what we are looking for in our leakage analysis. Everything after the register file, we do care about. In particular, since there are multiple read ports, we care about which operand enters which read port for each instruction.
And we're going to test it with some customized code. With this code, I send A and B through this XOR, followed by the target instruction with C and D. What I'm testing is whether I can observe an interaction between A and C. If so, A and C share the same read port. Otherwise, maybe I can observe B and C.

So briefly, about the results. For two-operand instructions, such as add or multiplication, you always see A, C (the blue line) and B, D (the red line). For one-operand instructions, you only see A, C, but no B, D. And for these three-register additions, you see all three operands, but you only see B, D here. So we assume A goes to E and C goes to the third port. This might be wrong, and it might be due to glitches, but this is the best we can get. There are also other instructions that don't really load anything.

Okay, so from decode to execute, we already know what's happening here, but we also want to know which operand enters RS1 and which one enters RS2, and also whether they will be updated or not. Because they are registers, they don't have to accept a new value in each cycle. Here I will skip all the technical details and directly give you this table. It shows which port and which register each operand goes through; a slash means the register will not be updated.

Okay, so the memory part is a bit of a mess. It's often ignored by most existing tools for a good reason: the memory actually lies a bit far away from the core itself. To make it worse, the memory is usually self-timed, which means it has its own timing. If you ask a memory to fetch you something 10 times, you may get different timings, because it can say, please wait for me. But in this case, we cannot really align our measured trace with the instruction you are executing now.
So there's no way you can do this completeness test anymore, and we have to go back to what previous existing tools are doing. Relying on existing knowledge, we assume everything is word-wise, and then we follow a few specifications from ARM: a shared data bus, an address bus, and a write buffer. This is of course not ideal.

Okay, now we know what's happening for each instruction on each micro-architectural wire, so let's try to do leakage modeling. The general idea is this: for each wire or register, the previous state is A, and now we flip to the new state, A prime. We often assume they leak the Hamming weight or Hamming distance. Here we are a bit more conservative: we assume A and A prime are jointly leaking. If you have combinatorial logic like here, then we assume it can be affected by glitches, so both inputs are taken into consideration: A, A prime, B, B prime are jointly leaking.

Okay, so fetch, as I said, is not data dependent; ignore it. For decode, we only care about B.5 to B.7; the others are not data dependent. We know what's happening on those wires or buses, so we just follow the bus rule: the previous value times the current value are leaking. For the execute stage, we have the ALU, which is combinatorial, so we assume it's leaking the previous values in these registers times the current values of these registers. For memory, those are buses or registers, so we apply our rules accordingly. Then, overall, we add them all together.

Okay, so adding them all together, we have our overall leakage model, and then we're going to test the quality of the model with our completeness test. Of all six instructions, most seem to be okay. For one of them, you see something above this dashed line, which means you are still missing something. So I will directly tell you: this is what I call the glitchy register access.
It shouldn't really access this register, but there are some glitches in your decoding stage. If you add that into your consideration, then this will be below the threshold.

Okay, let's go back to our original example from the beginning. Our reverse-engineered information helps to explain what is leaking here and why it is captured or not captured in ELMO or MAPS. In cycle 9, this is the Hamming distance on the ALU output bus. This is not present in ELMO, because ELMO takes the ALU inputs, and not in MAPS, because this is not a register. Cycle 15 is a pipeline register. So MAPS got it, and ELMO didn't get it because ELMO gets it wrong.

Okay, so let's briefly summarize our achievements. We have successfully reverse engineered, leakage-wise, the micro-architecture of our target M3 core. This is of course a leakage-wise reverse engineering: it is not even close to the binary code level, and we didn't really reproduce the M3 core such that it could run on any device. We are building a micro-architecture-enhanced leakage model, and we have shown its impact on various masking implementations. I only presented one of the implementations here; if you're interested, please read our paper.

For future work, as I mentioned, we don't really have a cycle-accurate memory simulator, which can be a bit of a problem for memory simulation. We are using our own information to explore more subtle micro-architectural leaks; this is an ongoing project for one of the PhD students. Everything I've done in this talk is first-order. I've done some higher-order testing, but it's far from thorough. We are also working on a flexible framework that can work for other architectures, for example RISC-V. And last but not least, the leakage model presented here can also be used for formal verification, so if you're interested in that, that's also a future direction. That's the end of this talk.
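A minimal sketch of the modeling rule described in the talk, not the authors' tooling: a wire or register that changes from A to A prime is treated as leaking A and A prime jointly, rather than only their Hamming distance, and combinatorial logic with inputs A and B is treated as leaking A, A prime, B and B prime jointly. The helper names (`joint_terms`, `explained_by`) and the toy leakage function are assumptions made purely for illustration. The check enumerates all 2-bit transitions and asks whether a candidate model fully determines a given leakage.

```python
from itertools import product


def hamming_weight(x: int) -> int:
    """Number of set bits in x."""
    return bin(x).count("1")


def hamming_distance(prev: int, curr: int) -> int:
    """Classic transition model: number of bits that flip."""
    return hamming_weight(prev ^ curr)


def joint_terms(*names: str) -> tuple:
    """Variables a conservative model treats as jointly leaking:
    the previous and the current value of every named wire."""
    terms = []
    for name in names:
        terms += [name + "_prev", name + "_curr"]
    return tuple(terms)


def explained_by(model, leakage, bits: int = 2) -> bool:
    """True if leakage(prev, curr) is a function of model(prev, curr)
    over every possible transition of a `bits`-wide wire."""
    seen = {}
    for prev, curr in product(range(1 << bits), repeat=2):
        key = model(prev, curr)
        value = leakage(prev, curr)
        if seen.setdefault(key, value) != value:
            return False  # same model value, different leakage
    return True


def toy_leakage(prev: int, curr: int) -> int:
    """Hypothetical leakage that depends on prev and curr jointly."""
    return hamming_weight(prev & curr)


# Does the Hamming-distance model explain the toy leakage?
hd_is_enough = explained_by(hamming_distance, toy_leakage)
# The joint model keeps (prev, curr) as-is, so it explains any function of them.
joint_is_enough = explained_by(lambda p, c: (p, c), toy_leakage)

print(joint_terms("A"))        # wire or register
print(joint_terms("A", "B"))   # combinatorial logic with two inputs
print(hd_is_enough, joint_is_enough)
```

In this toy setting the Hamming-distance model is not sufficient, because the transitions 0 to 0 and 3 to 3 have the same Hamming distance but different leakage, while the joint model distinguishes them by construction. The price of the conservative choice is a larger model, which is why the talk restricts it to the wires that are actually data dependent.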
And if you have any questions, I would be happy to answer them. Thank you.

Any questions on this or the previous talk? No questions online? You have a question or not? Okay, so no questions. Thank you very much for the presentations. And before we leave: a ring was found in the women's bathroom. If anybody has lost it, I will probably leave it at the reception, so please pass that on if you hear that anybody lost one. Okay, thank you very much. This closes the session, and we will continue after lunch.
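The read-port test from the decode-stage part of the talk can be illustrated with a small simulation. This is only a sketch under stated assumptions: the two-port register file, the noise-free Hamming-distance leakage on each port, and the function names are all hypothetical, and a plain Pearson correlation stands in for the statistical test used in the talk. A first instruction puts A and B on the two ports, the target instruction then puts C and D on them, and an interaction between A and C shows up only when they share a port.

```python
import random


def hw(x: int) -> int:
    """Hamming weight of x."""
    return bin(x).count("1")


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_leakage(a, b, c, d, c_on_port1: bool) -> int:
    """Assumed leakage: bit flips on each read port when the target
    instruction replaces (A, B) with its own operands."""
    p1, p2 = (c, d) if c_on_port1 else (d, c)
    return hw(a ^ p1) + hw(b ^ p2)


def port_test(c_on_port1: bool, n: int = 2000, seed: int = 1):
    """Correlate the simulated leakage with the A/C and B/C interactions."""
    rng = random.Random(seed)
    leak, ac, bc = [], [], []
    for _ in range(n):
        a, b, c, d = (rng.randrange(256) for _ in range(4))
        leak.append(simulate_leakage(a, b, c, d, c_on_port1))
        ac.append(hw(a ^ c))
        bc.append(hw(b ^ c))
    return pearson(leak, ac), pearson(leak, bc)


# Hidden mapping 1: C enters the port previously holding A.
rho_ac, rho_bc = port_test(c_on_port1=True)
# Hidden mapping 2: C enters the port previously holding B.
rho_ac_swapped, rho_bc_swapped = port_test(c_on_port1=False)

print(rho_ac, rho_bc)
print(rho_ac_swapped, rho_bc_swapped)
```

With the first mapping the A/C correlation is clearly positive and the B/C correlation is near zero; with the second mapping the roles swap. On a real device the leakage would come from measured traces rather than from `simulate_leakage`, and glitches can blur this picture, as the talk notes for the three-register additions.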