My name is Alessor Nioti, I'm a security researcher at IBM Research, and today I'll present joint work with EPFL and Northeastern University on speculative execution attacks. Before I give you an outline of the talk, let's first introduce the main character of the presentation: speculative execution. The starting point for this family of attacks is when the CPU pipeline encounters a branch instruction whose target is not immediately known. At this point the pipeline has two options. It could wait for the target to become known, but that would be bad from a performance standpoint because the pipeline would have to stall. So what happens instead is that the CPU makes an educated guess about the target, aided by a set of micro-architectural predictor units. Execution restarts immediately at the predicted target, and side effects from the speculated instructions become visible: for instance, in yellow you can see the side effects caused by load instructions. More interesting still is what happens when the prediction turns out to be wrong. In that case the CPU has to undo the side effects of all the instructions that must be rolled back, and from a program-correctness perspective this rollback is perfect; otherwise you'd have a CPU bug. From a micro-architectural standpoint, however, there are certain traces that the CPU does not erase. For instance, data that was brought into the cache remains in the cache after the rollback; the CPU does not go back and evict it. So the central point of speculative execution attacks is this: if the information cached during transient execution depends on a secret of a victim process, the attacker has a chance to extract that secret later. With this in mind, we can take a look at the breakdown of most speculative execution attacks.
There's a first phase during which the attacker trains the victim. Training maximizes the success rate and ensures that the victim will leak interesting data, secrets and the like. In the second phase the attacker causes the control flow of the victim to be hijacked speculatively to an interesting location, where, in the third phase, something useful for the attacker must take place: typically the send operation of a side channel that extracts data from the victim, so that in the fourth phase the attacker can execute the receive end of the side channel and retrieve the secret. Now that we have introduced the main character, I can walk you through our results. Due to time constraints, this presentation is just a teaser of the papers referenced below, so if you're interested, go check them out. First we'll discuss methodology and tooling. It might sound boring, but it's actually a very interesting question: how can we find out exactly what the CPU is doing while speculating, when by definition the CPU undoes most of it whenever it has to roll back? The methodology and tool that we describe are instrumental for the subsequent findings. Next, we describe new speculative execution triggers, that is, new ways in which an attacker can influence the speculative control flow of a victim process and get it to speculate interesting sequences of instructions. And finally we present new side channels, that is, new ways in which the attacker can extract interesting information from the victim. Good, let's start with the methodology and tooling. The question here is: how can we study this type of attack? Take the case of memory corruption bugs and related vulnerabilities.
There's really a ton of tools. Take GDB, for example: you feed it a crash that is related to memory corruption and you take it from there; you figure out what's going on, whether you can exploit it, and so forth. Unsurprisingly, there is no such thing for speculative execution attacks; there's no GDB. In fact, there's not even an efficient way to observe what the CPU does, since all you're supposed to see at the architectural level, by definition, is what the programmer wrote in the source code. One obvious way to observe this would be to mount the full attack and then use the side channel that I described earlier to see whether the attack was successful, how often it succeeds, and so forth. This of course yields correct results, but it's very slow and very noisy: great from an attack perspective, because it works, but not great for developing the attack. What we propose is slightly different: use performance counters. Performance counters are available on most modern CPUs to count, or to get timing information for, certain micro-architectural events, and they're designed for developers to profile and optimize any code snippet that executes. What we found is that if used correctly, or rather I should say misused correctly, since they're designed for something else, they actually represent a much faster and nearly noise-free way of detecting speculative execution. We do this by using what we call speculative execution markers. A speculative execution marker is an instruction, or a sequence of instructions, that is detectable by performance counters even when the instruction does not retire. So here's the central idea.
We introduce these markers into a snippet that we want to analyze, execute the snippet with the appropriate markers inserted, and then read the performance counters after execution. From those readings we can tell things like which target was speculated, how deep speculation reached before it was stopped, how often a branch instruction was mispredicted, how effective a training phase was, and so on. Very powerful and useful information. Around this methodology we've built a tool which we call Speculator. We've released it as open source on GitHub, so go check it out. There's plenty of documentation, so I won't go into too much detail, but the big picture is that you supply it with a piece of assembly or C code, and the tool adds the necessary instrumentation, executes it however many times are necessary, and then produces results that give all this information back to the developer. Good, now let's put Speculator to good use. First I'll show you new speculative execution triggers, that is, new ways in which an attacker can influence the speculative control flow to get the victim to execute interesting gadgets. Traditionally, the speculative control flow is hijacked when the attacker meddles with some micro-architectural components, the predictors that I mentioned earlier. For example, the attacker can hijack a backward edge, a return, by tampering with the return stack buffer, which is what takes place in Spectre-RSB, or it can hijack a forward edge by tampering with the BTB, which is what takes place in Spectre v2. The new class of triggers that we present instead leverages overwrites of metadata that influences the control flow at the architectural level: metadata such as the return address that is saved on the stack for backward edges, or a function pointer for a forward edge.
Such metadata can be overwritten both architecturally and speculatively, which gives the four variants that we describe. So let's look at some examples. The case of an architectural overwrite of a backward edge arises when you have a classic buffer overflow in a program compiled with stack-smashing protection (SSP). By definition the buffer overflow will not be exploitable in the classical memory-safety sense; however, let's see what happens speculatively. You can see the SSP epilogue from LLVM on the slide: there's a conditional branch that leads the program to an orderly crash if an overflow took place and was detected, or otherwise to the classic return. However, if the application has already executed a few times, the conditional branch will be trained to assume that no overflow takes place, so when an overflow does occur, the victim will speculatively bypass the SSP check and resume execution at the malicious return address, and if there's a gadget there, the attacker has a chance to extract interesting data. Let's look at another case. Here we have a speculative overwrite of a forward edge, and the example that we looked into is a snippet of Go code consisting of a store into an array followed closely by an interface call. Go, as we know, is a memory-safe language, so it introduces a bounds check on that array operation. However, even in the presence of an out-of-bounds index, the bounds check can be trained to be speculatively bypassed, that is, to assume that the index is in bounds; this way the out-of-bounds store takes place speculatively, and what the attacker gets is a write-what-where primitive that can be used to transiently overwrite, for instance in this case, the function pointer for the following interface call. And then again you have a speculative control-flow hijack.
Now that we have a few ways in which the attacker can control speculative execution, let's see what we can do with them, and in particular how we can use them to actually leak some secrets. Traditionally, Spectre and related vulnerabilities use a gadget like the one on the slide, basically a double array dereference. When this is executed speculatively, if the attacker controls x, and x can be out of bounds, this gadget will leak into the cache, under certain assumptions, some bits of array1[x]. This is great, but the problem with this gadget is that it turns out to be relatively infrequent in programs. In most PoCs in the literature it is assumed that the attacker can inject this or a similar gadget, for example through eBPF or JavaScript. So we investigated a very different side channel here, one that is based on port contention, so let me introduce the concept first. Execution ports are used by the CPU to schedule instructions to execution units, and different instructions use specific ports: here, for instance, you have shift-left using port 0 and popcnt using port 1. Here's the idea. The attacker takes timing measurements while executing popcnt instructions in a tight loop. If the measurements are slow, based on some profiling that the attacker can perform beforehand, it means the attacker is picking up on port contention caused by a co-located victim that is, say, also executing popcnts. Otherwise, if execution is fast, it means the victim is executing something else, for instance shift-left on a different port. That is our side channel for leaking information from a victim, and we call it SMoTher, from SMT, Intel's simultaneous multithreading (Hyper-Threading) technology. So how can we use SMoTher in the context of speculative execution attacks, as a Spectre side channel?
The solution is relatively simple. The idea is that we find in the victim code a conditional branch that depends on a secret, where the target and the fall-through of this conditional branch generate measurably different contention on some execution port. Going back to the port 0 / port 1 example from before, you either get contention, with both victim and attacker executing popcnts, or no contention, and thus fast timings on the attacker side, if one is executing shift-left and the other popcnts. So this is the full SMoTherSpectre side channel in operation. You have a victim that holds a secret. First, the speculative control flow is hijacked using whichever technique from the literature is suitable. The hijack leads the victim to execute the secret-dependent conditional branch that I discussed before, and based on the secret, different port contention is generated. The attacker, on its side, just sits in a loop gathering timing information. As we discussed, the timings form two nicely disjoint probability distributions of how long the timing sequence takes to execute: say, if the secret is one it takes longer than if the secret is zero. The information from these distributions can be used to determine a timing threshold from which we can finally extract the value of the secret. For instance, if we measure 101 clock cycles we can assume the secret bit was one; if we measure 82, we can assume it was zero, and so on. And that is how we extract secrets from the victim. Thanks very much, and enjoy the rest of the conference.