Hi there, I'm Dan Page, and this is a presentation for our CHES 2020 paper titled "FENL: an ISE to Mitigate Analogue Micro-Architectural Leakage". This is joint work with Si Gao, Ben Marshall and Thinh Pham, and at the time we wrote the paper we were all at the University of Bristol. Given the time constraints, I'm going to try to pitch the presentation at a fairly high level, focusing on the concepts rather than the technical detail, which of course you can find in the paper itself; I hope that makes the best use of the time.

As the title suggests, we're interested in information leakage, which is the topic that underpins things like side-channel attacks. We're not so interested in the attacks themselves, but rather in the underlying source of information leakage, and in mitigating or preventing it in some way. I think it's fair to say this is a well-established topic by now, and one that's probably familiar to the majority of people. The basic idea is that if some computation is being performed by a target device, then an attacker who monitors that computation can collect information that goes beyond the by-design API for that computation, i.e., beyond the defined input and output. By monitoring the computation you might, for example, be able to capture the execution time, the power consumption, or the electromagnetic emanation while the computation occurs. The problem, of course, is that if the computation is security-critical in any way, or at least involves security-critical information, then as an attacker we potentially gain information that we shouldn't be able to. In the example on the slide, the computation is an AES encryption that involves some key material: by monitoring, say, the power consumption of the target device, we might learn something about the key material k, whereas the definition of the computation, i.e., the API, says we never should.

Information leakage itself can be characterised in all sorts of ways, depending on the form of leakage, but at the bottom of the slide I've tried to write down some axes along which that characterisation might be performed. For example, the leakage might present as a scalar or a vector quantity; it might be discrete or analogue in nature; it might be collectable via local or remote means; and collection might be possible in a standalone way, or might require additional equipment: capturing the power consumption of a target device, for example, would typically require an oscilloscope. We're particularly interested in analogue information leakage, which puts us towards the right-hand side of my diagram, into forms of leakage that include power consumption and electromagnetic emanation. The question, from our perspective, is: if we want to prevent this leakage from occurring, what is its source in the first place? The answer is problematic insofar as it's fair to describe information leakage, up to a point, as stemming from more or less everywhere. To illustrate that fact, this diagram shows the typical layers of abstraction you might find in a computer system; it's something you might show to a first-year computer architecture student, for example.
Towards the top of the diagram, it's fair to say that information leakage stems at least in part from the choice of algorithms we make in the first place. Reading from top to bottom, it also stems from the software we use to implement those algorithms and, towards the bottom of the diagram, from the hardware platforms used to execute that software. From the perspective of this paper, the important layer within the diagram is in the centre: the unlabelled instruction set architecture, or ISA. The ISA is a fundamentally important computer-systems interface that separates hardware from software. It includes a definition of the hardware resources that software can make use of, and also the set of instructions that software is actually comprised of. One way to view the ISA is as a form of contract: as long as software is written correctly against the interface, the hardware guarantees that the software will be executed in some well-defined way. As an interface, however, the ISA should be opaque, in the sense that although it includes information about instruction execution semantics, i.e., the meaning of instructions, it doesn't include any information about how those semantics are actually realised concretely.

This decoupling of hardware and software is so important that the paper referenced on the slide, by Dunham and Beard, really makes the argument that it's a requirement rather than just a nice property to have. The reason they make this argument is basically that it allows diversity in the underlying micro-architecture: by having this opaque interface, we're able to implement different micro-architectures to satisfy different market pressures. We might have one micro-architecture for a performance-oriented market and a different micro-architecture for an embedded market, but they can execute the same software because they both comply with the same ISA. Although this is a compelling argument, up to a point, when you look at concrete instances you can find examples where the ISA is less opaque than you might first imagine: concepts such as branch or memory-access delay slots, or fences, already imply some exposure of the underlying micro-architectural implementation to the software being executed by that micro-architecture. Once you go down this route, you might wonder whether making exactly the opposite argument could also make sense, that is, making the ISA less opaque and more transparent still. Certainly in some situations it makes perfect sense to do so, and one example is where you're trying to defend against, or mitigate, so-called micro-architectural side-channel attacks. These sorts of attack are now twenty or so years old, but for modern examples think of Meltdown and Spectre. Ge, Yarom and Heiser argue that in order to produce robust mitigations against these sorts of side-channel attack, one needs detailed knowledge of, and control over, the underlying micro-architectural implementation, and so they present a concept called the aISA, or augmented ISA, which selectively exposes features of the underlying micro-architecture to the software executing on it.
On the one hand this means the ISA is now less opaque, or more transparent, than it was before, and from a traditional computer architecture point of view that's less than ideal; but they would probably argue that only by doing so can you produce robustly secure software, and so the disadvantages are outweighed by the advantages you gain. We're going to take an approach that follows the same sort of argument, and therefore the same sort of concept, as the aISA, but pitched in the context of analogue micro-architectural leakage.

Following on from that, I want to give you some more detail and an example of the specific type of problem we're trying to solve here. Think about trying to mitigate information leakage in a general setting. We're lucky in that setting, in the sense that we have a number of techniques available to us that are increasingly well understood, one example of which is masking. To apply a given masking scheme to a given algorithm, the first thing we need to do is change the representation of variables within that algorithm. In a simple first-order Boolean masking scheme, the idea is to take each variable x and split it into two shares, x0 and x1, such that x = x0 XOR x1. Whereas in the original algorithm an attacker might be able to recover x directly, they're now tasked with recovering both x0 and x1 and then recombining them in order to recover the underlying x; they can't learn anything about x from knowledge of x0 or x1 alone. Of course, we have to make a corresponding change to the computation involved, so that it can be applied to variables in this shared representation. The example on the slide shows a secure AND operation, and it illustrates both a disadvantage and an advantage. The disadvantage is the computational overhead involved, because what was once a single operation is now up to eight operations in the secure alternative. The advantage, however, is that we can reason in a fairly robust way about the security of that secure alternative. For example, we can reason about the non-interaction of the two shares of x, i.e., x0 and x1: the fact that they never interact with each other within the implementation of our secure AND means that no information should be leaked about the underlying x. Within specific security models, and based on specific security assumptions, we can then construct security proofs about our masked algorithm that would be impossible otherwise. Clearly this is a big advantage. The problem, or challenge, is that a gap exists between that theory and what we see in practice. To put it a different way: as soon as we take our masked algorithm, implement it concretely in software, and execute it on some concrete hardware platform, some of the assumptions we made originally, and rely on in our security proof, may not pan out. What I want to show you is an example constructed by Le Corre, Großschädl and Dinu in their paper, which relates to the execution of masked implementations on an ARM Cortex-M3 core. This is a processor core that implements the ARMv7-M ISA, and it does so using a micro-architecture with a three-stage pipeline. So this is a block diagram of an ARM Cortex-M3 core, which we're going to use as an example.
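Before we walk through the block diagram, let me make that secure AND concrete. Here's a minimal C sketch of one well-known first-order secure AND gadget, a Trichina-style construction; I'm not claiming it's the exact gadget on the slide, but it illustrates the same trade-off: one AND becomes eight operations, and (in the idealised value-based leakage model) every intermediate value stays masked by the fresh randomness r.

    #include <stdint.h>

    /* Inputs are shared as x = x0 ^ x1 and y = y0 ^ y1; r is a fresh
       random mask supplied by the caller. The outputs satisfy
       z0 ^ z1 = x & y. XOR-ing r in first matters: it keeps every
       intermediate value of t masked. */
    void secure_and(uint32_t x0, uint32_t x1,
                    uint32_t y0, uint32_t y1, uint32_t r,
                    uint32_t *z0, uint32_t *z1)
    {
      uint32_t t = r;
      t ^= x0 & y0; /* 1 AND + 1 XOR */
      t ^= x0 & y1; /* 2 AND + 2 XOR */
      t ^= x1 & y0; /* 3 AND + 3 XOR */
      t ^= x1 & y1; /* 4 AND + 4 XOR: eight operations in total */
      *z0 = t;
      *z1 = r;
    }

With that in mind, back to the Cortex-M3 block diagram.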
On the left-hand side you can see a set of general-purpose registers, which we're going to fill with some symbolic variables. One of those variables, x, is security-critical, so we've used a shared representation: we have x0 and x1 in registers r2 and r3 respectively. On the right-hand side we have the sort of data path you might expect to see within the micro-architecture, albeit in a fairly simplistic form. What we're going to do is feed instructions into the pipeline and try to reason, step by step, about how they're executed by the micro-architecture. The first instruction is an AND instruction: this is fetched from memory, and it ANDs together registers r4 and r2 and stores the result in register r6. The instruction progresses along the pipeline, through the decode stage, and finally into the execute stage, and this is where the first problem starts to crop up. The problem stems from the fact that the semantics of an ARM data-processing instruction, this AND included, are such that a barrel shifter is applied to the second operand before the operation itself is applied. That second operand in this case is x0, the 0th share of our security-critical x. Now, the way a typical barrel shifter is implemented in hardware means it's plausible we see some interaction between the i-th and j-th bits of that second operand, i.e., the i-th and j-th bits of x0, and that has three implications. The first is that we're going to observe some information leakage, and that leakage is going to be some function of the i-th and j-th bits of x0. This is bad, of course, because x0 is security-critical, and we might have expected our masking scheme to prevent that interaction and therefore the associated information leakage. The second is that one could argue this leakage is an artifact of our micro-architectural design decisions, not our software implementation: it might be plausible to reorganise the data path and thereby prevent the information leakage in the first place; the software would remain the same, but the leakage may or may not exist in either case. Finally, we've got an example here of the gap between theory and practice, because it's unlikely a feature like this would have been modelled in our security proof; so now we've got some information that's potentially useful to an attacker, and that invalidates our security proof in some sense. The problems don't end there, though, because the effect of pipelining is such that we've already fetched and decoded a second instruction, an OR, that's going to OR together the contents of registers r5 and r3 and store the result in register r7. Notice that at this point the pipeline register rb holds x0, the 0th share of our security-critical x. When we advance the pipeline, the result is that we overwrite the contents of pipeline register rb with a new value, namely the second operand of our OR instruction. In this case that second operand is x1, the 1st share of our security-critical x, and the overwriting operation has the effect of causing information leakage that's the Hamming distance between x0 and x1. As before, this information leakage is really an artifact of our micro-architectural design: the fact that we have a pipelined micro-architecture necessitates this pipeline register. And this is catastrophic from a security perspective, because the direct interaction between x0 and x1 means the attacker can likely recover x in a fairly direct way.
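To see why this pipeline-register effect defeats the masking, here's a toy C model of the standard Hamming-distance leakage assumption: the power consumed when a register switches from value a to value b is modelled as HD(a, b) = HW(a XOR b). This is my own illustration, not the simulator from the Le Corre, Großschädl and Dinu paper; it assumes GCC/Clang for __builtin_popcount, with rand() standing in for proper randomness.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hamming weight, i.e., the number of set bits. */
    static int hw(uint32_t v) { return __builtin_popcount(v); }

    int main(void)
    {
      uint32_t x  = 0xDEADBEEF;  /* security-critical value */
      uint32_t x0 = rand();      /* share 0                 */
      uint32_t x1 = x ^ x0;      /* share 1, so x = x0 ^ x1 */

      /* Pipeline register rb holds x0 (from the AND) and is then
         overwritten with x1 (operand of the OR): the modelled leakage
         HD(x0, x1) = HW(x0 ^ x1) = HW(x), i.e., the mask cancels and
         the attacker observes a function of x directly. */
      printf("HD(x0, x1) = %d, HW(x) = %d\n", hw(x0 ^ x1), hw(x));
      return 0;
    }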
Keep in mind that it's difficult for software to mitigate this problem, given that it stems from a micro-architectural design decision; so the argument is that we need a different approach. The approach we've investigated is called FENL, and the easiest way to explain what FENL is and what it does is, I think, by analogy. Consider the sequence of instructions shown on the slide. We're going to divide that sequence into two halves: a green half on the left-hand side, containing the instructions before instruction i, and a blue half on the right-hand side, containing the instructions after instruction i. Instruction i is a fence, or barrier, instruction, and taking the most general definition possible, the idea of an instruction of this type is that it controls interaction between the left-hand side and the right-hand side, the green half and the blue half, of our instruction sequence. Instructions of this type can be identified within existing ISAs; for example, they're commonly used to control memory accesses in some way. For instance, you might want to ensure that any memory access before the fence instruction has completed execution before any memory access after the fence instruction starts execution, thereby synchronising, or ensuring some form of consistency with respect to, the memory content. Phrased this way, FENL is basically a fence for leakage: what we want is to ensure that there's no interaction between instructions before the fence and those after the fence in terms of their information-leakage properties. We can use fences of this type to solve the problems we saw previously.

The way we realise this concept concretely is to follow the recommendation of the aISA by selectively exposing resources in the micro-architecture to the software executing on it. More specifically, we add three elements to a baseline ISA. The first element is a configuration register, the i-th bit of which maps to a micro-architectural resource, or a logical grouping thereof; this could be an individual pipeline register, for example, or a group of pipeline registers within a particular stage. The second element is a set of access instructions that allow the transfer of data between the configuration register and general-purpose registers, e.g., to set the value of the configuration register in the first place. The final element is the fence instruction itself, whose semantics are such that when an instance of the fence instruction reaches execution stage j, each i-th resource that exists, or is used, in stage j is flushed if and only if the i-th bit of the configuration register is equal to one. So, basically, when the fence instruction reaches execution stage j, the configuration register decides whether or not a particular micro-architectural resource is flushed. Notice that we're a little particular about our terminology here. We use "execution stage" carefully, because its meaning can depend on the micro-architecture itself, e.g., differ between a pipelined and a non-pipelined micro-architecture. Likewise, we use the term "flushed" rather than "reset", because reset often has a particular meaning within digital logic, namely setting the value to zero; we might prefer, for instance, to set the value of a micro-architectural resource to some random value rather than to zero. Let's re-examine our motivating example, with the hypothetical processor core now equipped with an implementation of FENL.
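Before re-examining the example, here's a minimal C sketch of the fence semantics just described. The resource array, NUM_RESOURCES, and the rho() helper are hypothetical names of my own, not identifiers from the paper; in hardware this would of course be combinational logic on the pipeline registers rather than a loop.

    #include <stdint.h>
    #include <stdlib.h>

    #define NUM_RESOURCES 32

    /* resource[i] models the i-th micro-architectural resource (e.g.,
       a pipeline register) used in some execution stage j; cfg models
       the FENL configuration register, written via the access
       instructions. */
    static uint32_t resource[NUM_RESOURCES];
    static uint32_t cfg;

    /* The flush value: fresh (toy) randomness here, reflecting the
       "flush" (vs. "reset to zero") semantics discussed above. */
    static uint32_t rho(void) { return rand(); }

    /* Invoked when a fence instruction reaches execution stage j:
       flush the i-th resource iff bit i of the configuration register
       is set. */
    void fenl_fence(void)
    {
      for (int i = 0; i < NUM_RESOURCES; i++) {
        if ((cfg >> i) & 1) {
          resource[i] = rho();
        }
      }
    }

Now, back to the example.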
You can see that the block diagram for our processor core is the same as before except for two details. On the left-hand side we've included the FENL configuration register, and on the right-hand side, in our data path, we now have an additional input to each of the multiplexers involved. This additional input, labelled ρ (rho), is the value with which we flush each of the pipeline registers, ra and rb in this case. If we start by initialising our general-purpose registers as we did before, we can then move on to fetching, decoding and executing instructions. The first instruction that's fetched, decoded and executed is a write into the FENL configuration register; this models setting the bit within the configuration register associated with pipeline register rb. The next instruction to be executed is the AND instruction, and versus the previous example there's no real difference here: the end result is that we write the value x0 into the pipeline register rb. However, you can see that between the AND and the OR instruction we've placed a fence instruction. This has been placed there intentionally, in order to disallow interaction between those instructions with respect to their information leakage. You can see that, having set the FENL configuration register appropriately, when the fence instruction reaches the execute stage, pipeline register rb is flushed using the value ρ. So the previous value of the pipeline register was x0, the new value is ρ, and we observe information leakage that's basically the Hamming distance between x0 and ρ. If the flush semantics we use mean that the value of ρ is random, the attacker gains no information from this. Crucially, when we execute the OR instruction, basically the same argument applies: the previous value of the pipeline register rb is ρ, the new value is x1, so we observe some information leakage that's basically the Hamming distance between ρ and x1. Again, the attacker learns no information from this, whereas previously they would have learnt the Hamming distance between x0 and x1, which we argued was catastrophic from a security point of view. So although this is a very specific, and somewhat contrived, example, what you can see is that we've used this fence instruction, properly configured, to control the interaction between the execution of instructions with respect to their information leakage; specifically, we've prevented information leakage that would have been evident otherwise.

We developed prototype implementations of FENL in two different RISC-V-compliant micro-architectures, and explored its use in a range of different software workloads; you can find the details in the paper. Overall, I think it's fair to say our results are positive, in the sense that FENL as a concept is relatively general-purpose: we were able to apply it in different micro-architectures, with different sets of micro-architectural resources, for instance. Likewise, the overheads, both in terms of hardware area and execution latency in software, were relatively low. We were able to use FENL in various different ways: for example, to localise, or find, leakage within an implementation, or to reduce or control leakage and so amplify the quality of existing countermeasures like masking. Having said that, I think it's sensible to view what we've done as a first step rather than a complete solution, because there are plenty of important or interesting next steps we could take.
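Before getting to those next steps, let me round off the worked example in code. Extending the earlier toy model (again my own illustration, with rand() standing in for a proper randomness source), the fenced behaviour means rb transitions x0 -> ρ -> x1 rather than x0 -> x1 directly, so each observed Hamming distance is masked by the fresh random ρ and, taken on its own, reveals nothing about x.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    static int hw(uint32_t v) { return __builtin_popcount(v); }

    int main(void)
    {
      uint32_t x   = 0xDEADBEEF;
      uint32_t x0  = rand(), x1 = x ^ x0; /* shares of x        */
      uint32_t rho = rand();              /* random flush value */

      /* Unfenced: rb goes x0 -> x1, leaking HW(x).
         Fenced:   rb goes x0 -> rho -> x1, each transition masked. */
      printf("unfenced: HD(x0, x1)  = %d (= HW(x) = %d)\n",
             hw(x0 ^ x1), hw(x));
      printf("fenced:   HD(x0, rho) = %d, HD(rho, x1) = %d\n",
             hw(x0 ^ rho), hw(rho ^ x1));
      return 0;
    }

With that, back to the next steps I mentioned.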
One example is the study of more complex micro-architectural designs, and therefore more complex micro-architectural resources: the focus of the paper is on micro-controller-class cores, but when you consider, for example, out-of-order cores, various questions naturally arise. Likewise, we'd like to investigate how to automate, or at least semi-automate, the placement of fence and configuration instructions, in order to reduce the manual effort involved. Okay, so that's it. Obviously I'd encourage you to read the paper for the full technical details, but hopefully this presentation was useful in getting across the motivation and the concepts involved. I look forward to answering any questions you've got at the live session at CHES 2020, but if you can't attend, drop one of us an email using the addresses on the first slide.