Okay, guys, we're going to get started. Our last talk for today is "Exploiting Modern Microarchitectures" by Jon Masters, so please help me welcome him with a warm round of applause.

Well, thank you very much. We'll see if you're still clapping at the end. My name is Jon Masters. I'm a computer architect at Red Hat. Some of you know me from the Linux Kernel Podcast and from some work I do with the Arm architecture, but I'm actually here today to talk about something completely different. Put your hand up if you have never heard the words Meltdown or Spectre. Okay. So I'm here today to talk a bit about a little project that we were running inside Red Hat. I was running a mitigation team for the past few months dealing with this event, and a bunch of us didn't really get a holiday as a result. Over that time we learned a lot about these particular exploits, and we also had time to reflect on these new classes of attack and what they mean. So the purpose today is both to introduce these two exploits that you've heard about in the news, or the three variations of them, and also to walk you through some concepts in computer architecture that they exploit, spend some time thinking about where we're going as an industry, and then we'll take questions toward the end.

Now, I did a preview version of this talk a few days ago at Stanford, over in California, and it took two and a half hours to get through the slides. The good news is that I have only 45 minutes here, and I've added five slides since. So what you're going to see is, at various points, me skipping over slides that contain explanations for diagrams that I've shown you. The idea is that if you want the long-form version, in particular if English is not your first language, or this is the first time you've seen a concept, you can go and read the long-form explanation. You can download the slides right now: if you follow the Twitter handle right there, you'll see a link, and we're
going to do a load stress test on people.redhat.com. We're going to see how well it stands up to everyone trying to download these slides at the same time. So feel free to pull them down and look ahead, and speculate about what's coming in the future. Thank you.

Okay, so I would just like to ask a couple of quick questions. Hands up if you have studied or looked at computer architecture in the past. Okay, good. So the first half of this is going to go really quickly. Hands up if you have read the paper or papers relating to the exploits I'm talking about today. That's a good show of hands. Hands up if you think you understood it. Okay, that's still a good set of hands. And hands up if this is all fairly new to you, and you would really benefit from me walking through it in some level of detail. Okay, good. So about 50% of you think you know everything I'm going to say, and about the other 50% of you I'm not sure, so that's a good mix.

So, today's class (I did take this from the deck I showed at Stanford, so it was a lecture at that time): today I'm going to cover the difference between architecture and microarchitecture, and some variations of microarchitecture.
So, what that means in the real world: I'm going to talk about caches, virtual memory, and branch predictors, which are some of the pieces that you need to exploit to make these kinds of attacks successful. I'll talk about side-channel analysis, which is an interesting topic in itself. Then I'll look at the actual vulnerabilities that you've seen and the mitigations we have for them, and finally some related research into hardware exploits that maybe you have not seen before.

So, architecture versus microarchitecture. For those of you who were at the RISC-V sessions, you probably saw similar content already. When we talk about computers, we have this concept of architecture, and an architecture describes, at an abstract level, how a machine operates and behaves. It describes the kinds of primitive instructions that the machine actually executes, the ones and zeros: when I see this particular sequence, do this, right? It describes how to load and store values from memory; it describes the registers, the machine state that I have; and it describes various modes of operation, privileged and unprivileged, which we'll talk a bit more about in a moment. And then there's more detail as well: it describes memory models and other things that you might find very interesting if you pursue architecture.

We also have some software concepts we care about. If we are Linux programmers or BSD programmers, we care about running programs, known as processes or tasks when they're running, and we care about the fact that they execute at different privilege levels, right? Applications are less privileged than the operating system; applications run in something we call user mode, and we have various abstractions that protect them from corrupting one another and from corrupting the kernel, right?
So we have this virtual memory environment we define, so that our application sees a nice flat view of memory. It thinks it's the only thing running on the machine unless we tell it otherwise, and it requests services from the OS kernel when it wants something performed. It doesn't explicitly have knowledge of other programs running on the machine. It could find that out, it could ask the kernel, but as far as it's concerned it has a view of memory that's all to itself. The OS, on the other hand, has a privileged set of architectural instructions, and it uses these to manage the state, to manage the context of the running programs, and to switch between them. So when you get what's called a hardware interrupt, when you get some event coming in or otherwise need to switch from one program to another, to give the user the illusion that lots of things are running at the same time, that's what the kernel is doing: it's using these interfaces to save and restore the context of programs.

Examples of computer architectures: obviously I have to mention x86 first, right? I think everyone here knows what x86 is. I could have put RISC-V as the second one; I decided not to, and I think you've heard of that too. These are two examples of architecture: x86, which is a bit older, and the 64-bit Arm architecture. They both have instructions; one of them takes complex instructions, one of them has simpler instructions. They both operate on registers, and they both have a 64-bit memory model: they can both use large amounts of memory and provide that to applications. There are some differences, but at a high level we can compare the two.

Let's talk about microarchitecture. A little detour: I'm a very bad graphic designer, right?
So, you know, don't ever come to me if you want something pretty. Here's a picture of a chip. The thing I'm trying to represent here is that a modern processor, as we think of it, is not just one little CPU doing something; that's not how we used to think of it years ago. A typical chip, even in your laptop or your phone, will have many cores. We used to think of these as processors back in the day, but there are many different cores, and each one might be running a process, or several of them together might be running threads in a process. They're all connected together on the chip, and they have these high-performance interconnects. So on here you can see I've got one, two, three, four, five, six, seven, eight cores, and they have shared access to some of the resources. There's a memory interface on each side of my chip, and whenever I want to load something from memory into one of my cores to do some processing, it's going to come in through the external memory interface and work its way up through levels of cache. We're going to talk more about that in a minute, but just keep this picture in your head.

So programmers think of processors, but they really mean cores. We have these systems formed from many different pieces, and they all have to work together. Microarchitecture refers to an implementation of an architecture. Right, so we've defined a high-level x86 or Arm instruction set architecture.
We've said this is what an x86 machine has to comply with, and a specific x86 machine might be implemented differently from another one; as long as they can run the same instructions, the implementation can differ. Some example differences include what we'll talk about in a moment: in-order machines and out-of-order machines. Lots of differences can exist at the microarchitecture level. For example, simpler processors are often described as in-order machines, and if you're not familiar with computer microarchitecture, this might be how you think of a processor running your code. When your program is running, for every single instruction, every single operation in your program, what it will do, one after the other, is fetch the instruction, decode it to figure out what it does, and execute it, and then rinse and repeat, one after the other. This kind of example is your classic RISC machine: if you were to take a simple RISC-V machine, for example, it will probably start out (there are some more performant implementations, but it will probably start out) as a simple in-order machine.

What we might do is add some features like pipelining. Instead of running the different stages I mentioned before one at a time, we might overlap them a bit. We might say: I fetch one instruction, I start figuring out what it does, and I'm already fetching the next one, and I get a little bit of parallelism here. You can see here I might have five instructions working their way through the machine at different stages, right? That's called pipelining: I split how my machine executes instructions into smaller steps. It still does them one after the other, though.

In-order machines are easier to implement, and they're much more efficient in some ways from a power perspective, right?
So I'm not going to get very high performance, but I'm potentially going to need less power. That's why you tend to get those in little widgets, and they'll also use less area: they're physically smaller to build. But they're susceptible to things like pipeline stalls. If I'm working my way through the different stages of running instructions and I'm trying to load something from memory, I might have to stall my machine while I wait for some data to become available. They've got a limited capacity to hide the latency of instructions as a result.

Now, what I can also have is an out-of-order machine. This is very different from what you as a programmer are thinking of. When you write a program, you think: I do this, and then I do this, and then I do this. But what the industry has done is spend the last few decades working out how to take your program, which has a defined sequence of operations, and automatically work out the dependencies inside it. So here's a simple program: I load two values and add them together, and then I load two values and add them together. Those of you who are familiar with assembly language will see that I'm using registers inside my machine: I'm loading register one and register two, and I'm storing the result of adding them together in register three; and here I'm doing the same, load register one, load register two, add them together. But these two sections are actually independent. I could renumber these and use different registers; there's no reason I have to have the same register numbers here that you see above. A simple machine might run through that program and execute it exactly as you see here. In a more complex machine, what it might do is work out:
well, actually, these two sections are completely independent, and the only thing they share is that they're using the same registers, but I could change that. So behind the scenes, what an out-of-order machine will do is reorder all of these instructions. It's called dynamic execution. What it will do is say: actually, the moment these values are available, I can run this instruction. And then it does something called in-order retirement. You start with your program that does one thing followed by another, you turn it into what looks a bit like a dataflow machine, and then you do in-order retirement. At the end, when you've worked out your results, you keep track of where you are, and you say: okay, I may actually be executing this stuff here before this stuff here, but I'm going to wait, and I'm going to retire only in the sequence the programmer expects. So you as a programmer are not aware of this happening. And then there's some complex machinery that's added for exception handling. If something erroneous happens during execution, I might have to back up what's happening inside the machine and present it to the programmer in a consistent order, and say: well, you had a failure here, and I'm presenting it to you in the order it would have happened from your point of view. So this is very complicated.
I want you to understand, though, that the machines you have, even your laptop in front of you right now, may actually be executing programs in a completely different sequence from how they were written, or from how you imagine they would work. Out-of-order machines are very common in high-performance microprocessors. The concept was invented by a gentleman called Robert Tomasulo, who unfortunately passed away a couple of years ago, and it would be interesting if he could see the media attention recently and get his insight, right? Because what people have said is: well, this whole thing doesn't work. Actually, it does; it's working exactly as designed. There just can be some flaws in specific implementations of it. I'll talk more about speculation and how that builds on this in a moment. Tomasulo invented this (I'm keeping an eye on the time, because this is going to go long otherwise) for the mainframe, the IBM System/360 Model 91, so quite a long time ago. And over time it's worked its way down into the computers that you have on your desktop, in your laptop, and in most phones as well. As I said, instructions are dispatched from an in-order front end, as we call it; they're executed in this out-of-order machine; and then they're retired back in sequence. And the size of the structure I alluded to here dictates how many of these instructions I can have out of order at a single moment. These can be quite large: in a contemporary machine, for example a Skylake processor, a recent x86 implementation from Intel on the Skylake microarchitecture, there's a reorder buffer that's 224 entries long. So that's quite a few instructions I might be ahead at any one moment. That's a lot of housekeeping
they have to do; that's why these things are very, very complicated. You should also know that the average one of these processors costs about a billion dollars, takes about four years, and needs at least 300 people just for the basic design. You have open-source out-of-order machines, but one reason we haven't yet seen a Xeon-class open-source design is frankly the amount of cost that someone has to throw at doing that. It's not to say it won't happen, but it's very complicated and very expensive.

I'm going to skip ahead. This last slide here just notes that there are lots of questions you can ask about architecture. I have an architecture specification, for example x86, and then I talk about how I implement it. When I'm executing my instructions, do I do them in order? Do I do them out of order? These are all design choices. The machine ultimately runs the same programs, but there are trade-offs; I can make choices based on how complex and how performant I want my machine to be.

Examples of implementations of architectures: here's the Skylake in my laptop (that's why I'm using that example; it's a little bit older), and you can see it's got 224 instructions that it can have in that reorder buffer. Then you've got an IBM POWER8, which can also have 224 instructions in flight, but they call it a global completion table, because IBM is different in lots of ways, good ways. You also see how many instructions the machine can have in flight. On an x86 machine, typically it dispatches a couple at a time: on your laptop it will take these complex x86 instructions, decode them further into macro-ops and micro-ops and all kinds of things you can read more about later, and run a few of them at a time. The really big servers might dispatch as many as eight or ten at a time. Okay, now store away what I just said.
Let's talk about virtual memory. I talked about the separation between applications and the kernel: you have user applications running in user space, and you have the operating system, and we try to isolate the two for obvious reasons, right? We don't generally want any old application being able to interfere with the operating system. We have a defined interface between the two, so when an application wants to do something, it uses the system call interface, an API through which it requests things from the OS. Applications, when they're running, are known as processes, and they use system calls; I think I mentioned all of that.

Here's an example of a program when it's running. If I were to type this command on my laptop, cat /proc/self/maps, I could see the view of memory that that cat program has, because it's catting its own memory map. I might see various memory ranges, but I really want to draw your attention to a couple of them. Every process, every running program, will have a range of memory that represents its own text (its code), its data, its stack, and some other stuff. And then every process, until recently (we'll come on to why that changed), would also have a range of memory at the top of its address space which contained all of the kernel, and all of the memory the kernel has access to. You might say, well, why is that? Well, we have mechanisms that are supposed to protect the application from being able to see or touch that range of memory, and it means that whenever we want the kernel to do something on our behalf, we can have a very lightweight entry into and exit from the kernel: it already has access to all the memory it needs. It's already set up.
We just jump into a different execution state that can access that memory, we do something, and we go back to the application. The application shouldn't be able to see any of that memory. We maintain this separation using something called page tables, which take the view of memory the application has and translate it into the view that's seen by the hardware. So when I'm trying to access some address, whatever address up here, it's going to go through some page table that tells me where in physical memory that address actually lives. That's an expensive operation: there are things we call hardware page-table walkers in our chips that actually have to go down through these tables and work out the translation. So we don't do that every time. What these chips have in them is something called a translation lookaside buffer, actually probably several, and what these do is store those translations. If I want to touch a piece of memory, I can look up the last few translations very quickly. And by keeping kernel memory translations in place while my application is running, I again get some performance: when I want to go into the kernel to do something, those entries are already populated, they're already present. So typically I'll leave these in place until I switch from one process to another, at which point I have to flush this stuff out and switch to the other process, because its view of memory is different.

Okay, I'm going to skip that slide, and skip that one. Okay, so I'm going to throw caches in here as well. I've said that you have these ranges of memory your application sees, and you translate them before they hit physical memory.
Well, you also have a caching hierarchy that sits between your program accessing some memory and the actual RAM chips in your machine, and there are multiple levels of this cache memory, with names like level one, level two, level three, level four, things like that. Basically, they are ways of accessing the data I'm using frequently, faster. The memory chips in my machine are slow, relatively speaking; the cores inside my chip are much faster; and the laws of physics tell me that I can't have both. I can have big and slow, or small and fast. So what I do is I have some memory on my chip that's a bit faster, or a lot faster, and it caches values I've been using recently. When I touch a piece of memory, what will actually happen is that it will get pulled into the caches. So, for example, I may have a cache entry for a piece of data from a user application, and I may, at the same time, have a piece of data from my kernel in my cache. Again, I've got protections in place that should mean there's no way of ever accessing a piece of kernel data from my application code. My page tables say that's not accessible; it doesn't matter if it's in the cache, when I try to access it, it's not accessible.

I'll probably skip this slide, but it covers an optimization in how modern high-performance caches are implemented that you can read a bit more about later if you're interested. If you ever wondered, for example, why the level one cache is 32 kilobytes in every CPU with 4K pages, this will tell you why; you can read it later. Let's keep going. Okay, I'm going to skip how caches work.
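Before moving on: the cat /proc/self/maps command from a few minutes ago can be tried directly on any Linux box. A minimal sketch (the exact addresses will differ on every run because of address-space layout randomization):

```shell
# Print the virtual address-space layout of the cat process itself.
# Each line is one mapping: address range, permissions, offset, and
# what backs it (the binary, a library, the heap, the stack, ...).
cat /proc/self/maps

# Just the process's own stack mapping:
grep '\[stack\]' /proc/self/maps
```

The text, data, heap, and stack ranges the speaker describes all show up as separate lines in this output.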
So let's talk about side-channel attacks. Side-channel attacks are based on deriving information by exploiting the physical implementation of a machine. We have our instruction set that describes how any x86 machine should operate, for example, and then we have an implementation, and the side channel takes advantage of the fact that an implementation might have some vulnerabilities in it. Classical things that we've done in this space have involved electromagnetic emissions. Put your hands up if you've heard of TEMPEST. You've heard of TEMPEST, right? The sort of secretive government agency watching your screen from afar: that's based on analyzing the emissions coming from your machine. There are similar attacks with differential power analysis: I can monitor how much power a chip is using, and I can infer what it's doing. Another thing I can do is measure how long certain operations take, if different operations take different amounts of time and I can actually perceptibly measure that.

Caches can behave as side channels because they're a shared resource, as you saw from my diagram earlier. Whenever I want to use a memory location, it's going to go in through my cache hierarchy, meaning that the cache is shared by everybody. And I can actually measure a difference in the time it takes to access a piece of data based on whether it's in the cache or not. In fact, it gets even more scary: you can actually work out what level of the cache it's in.

Well, that just rebooted. All right, if you're watching the video stream and it cut out, your video streaming machine just rebooted. But okay. Anyway, so I can measure whether something is in the cache or not based on how long it takes to read a piece of data. In fact, there are even more exciting attacks on some architectures like x86.
I have a special instruction called CLFLUSH, and as a programmer I can say (unprivileged, any code can do this): flush this location, make sure it's not anywhere in the caches. So I can guarantee it's not in the cache. And actually, if I do two flushes, I can measure, when I flushed it, whether it was in the cache to begin with, so I don't even have to load something to measure whether it's in the cache. There are some really exciting attacks I can do.

Here's an example. I have these interfaces on most architectures; I have a way of measuring time. On x86 it's called RDTSC, read time-stamp counter. I can read the current time-stamp counter, I can access a piece of memory, I can read it again, and I can work out the difference. Based upon that amount of time and some calibration, I can work out: that memory access in the middle, was the thing I was accessing in the caches or not? As I said, a lot of architectures provide instructions to let you do this; and you don't strictly need an instruction, there are other ways to count time. Some architectures also provide a way to guarantee you flushed something from the cache, which gets very useful a bit later on, but there are other ways to do that too: you can look up displacement flushing if you're interested. And as I said, you might even be able to optimize it with some of these other variants as well, to see if data is in the caches. Well, why is that useful? Let's think about that; we'll come back to it. I'm going to skip prefetching.

Okay, now we're going to talk about branch prediction; it'll all come together in a minute. This is a complicated topic, guys; you're getting a deep dive here, right? So, when I'm running code on a machine, I may hit points in my execution where I'm trying to decide whether my program is going to go one way or another: if this, do that, or do that instead, right?
Now, when I hit a branch in my program, I'm going to test, for example: if it's raining, do this thing. Well, I may not actually have the value of "raining" available to me at that moment, for a couple of reasons. It might be in slower memory that I need to pull into my caches, which might take a bit of time, or it might be some calculation I have to perform. For those reasons, there can be cases where I hit a branch in my code, to go one way or another, and I don't instantaneously know which way it's going to go. So I can stall my machine and wait, or I can continue running: I can guess which way my branch is going to go. I can build on my out-of-order machine and add this concept of speculative execution. What I can do is say: I have this condition here, if R1 is zero, do this other stuff. I don't yet know the value, because I'm loading it; I'm waiting for it to load. So what I'm going to do is go into a special mode of execution called speculation, and I'm going to keep running these instructions. I'm going to guess it's going to go this way; I don't know that. So I'm going to tag each instruction and say that it's speculative. If, later on, I discover which way that branch was supposed to go and I'm wrong, I will flush everything that's purple here; I will forget about it, because I've tagged it specially. I've not retired it; I've only kept its interim state. The idea is that you're never aware that I did this. It's an optimization: if I'm right, the machine keeps going and it's a bit faster; if I'm wrong, I have to throw away some state, but I'm no slower than if I had just waited to find out the result of that conditional check. So speculation is something that we build into out-of-order machines.
It's part of our branch prediction hardware, and we use it to get a performance optimization. When we're speculating, if we hit any erroneous conditions in our program, we will also tag them. So if I try to perform an illegal instruction, or do something that's not permitted, I won't actually take an error, take a trap; I won't do anything about it, because I don't know if this is actually supposed to run. I'll just mark it, and later on, if I decide that it was supposed to run, then I'll handle it. As I said, when I hit a branch, working out the answer is called resolving the branch: if I'm correct, then I continue and everyone just gets a speed-up; if I'm wrong, I have to do some housekeeping. But the idea is that you can never observe the fact that I did this speculation; it's not supposed to be visible to you.

I could talk a bit more about conditional and indirect branches, but I think I'll just skip to how branch predictors work. If I have two different applications running on my machine, how does the branch predictor actually work? It has a data structure in memory, and it will look at the actual memory address of a potential branch instruction, and it will, in different ways (because implementations vary), record the history of that branch. The last ten times I saw this branch, I went that way; that probably means the next time I'm going to go that way as well. In fact, in some hardware I even have fancy stuff like loop predictors.
They can work out not only that this branch is probably going to go that way, but even that it's a loop; the hardware can just magically work that out using some complicated machinery. There may be many different components to my branch prediction, but fundamentally they will use some structure that tracks the history of branches. And I want you to think about the fact that this tracking could be expensive: I would need a lot of memory if I were to try to store the address of every branch my program ever took. So instead, I optimize this structure, and I may only use a few bits of the address of that branch. Consequently, I could have two different programs with two different branches, and my branch prediction hardware may not be able to tell them apart.

Then I have a variant of those conditional branches I talked about, called indirect branches. That's when I have what you would call a virtual method, or some kind of function pointer: I don't know where I'm going to go. I also have hardware, a bit like what I described before, that can guess the indirect branches in my programs. I'm going to skip through the optimization.

All right, now I'm going to talk about these particular attacks, because that took a good bit of time. Okay, so you learned a lot; that was a whole semester's worth of various computer science. I'm glad you're still awake. Let's talk about these two vulnerabilities and how they layer upon all of that. So, you know, these are branded vulnerabilities. They were discovered both by academic researchers and by Google Project Zero, and because they were discovered by researchers (I love the researchers), you have to give them a cute name, right? So Meltdown and Spectre it is, because "variants one, two, and three" doesn't really sound sexy, does it?
Now, by the way, we were actually tracking these guys for a while. I knew that this was the research team working on the project, and of course there are websites you can go to where you can track everybody's domain registrations, so I was using a side channel for some time to monitor the researchers, to see what they would name it. We discovered the Meltdown and Spectre domains the moment they were registered in December, and consequently had a little bit of time to figure out how they would position it.

What these attacks do is exploit the things I just described to you to bypass normal system security boundaries. Let's go through how they do that. Firstly, if you're on a Linux machine, don't panic, and don't let your machine panic, because very recent Linux kernels, and certainly those from the distros, will very soon start to have this directory: /sys/devices/system/cpu/vulnerabilities. We're thinking there may be more entries over time; it's good to leave room, right? You will see entries in there for these attacks, and potentially future ones, along with what your machine is doing to mitigate them. That's mitigate, not fix, because fixing it would require that we change the hardware in some cases. But we can mitigate it: we can take a performance hit and do something to remove the ability to exploit these attacks.

So, Meltdown relies upon some implementations of speculative execution literally following what Tomasulo did, and the key piece is how they handle exceptions, how they handle problems from accessing data you're not supposed to, right at the end. They allow you to speculatively do something, but then they say: before I retire, before I ever complete that operation, I'll just make sure I'm supposed to, and I'll throw it away if I'm not. So you might see a piece of code like this. Don't worry.
We'll talk through what it does in a moment; in fact, I think I have it on the next slide. Okay. So I might have some secret data, who knows what it contains, some magic data in my Linux kernel that I want to read. I can arrange for a little piece of code to run speculatively, meaning it's not necessarily ever going to be an architecturally executed part of my program, by putting it inside a branch that may or may not run. Inside that piece of code I can read that kernel pointer quite happily. Now, if my program ever retires those instructions, I'm going to get an exception and it's going to crash; that's not useful. But while the speculation is happening, I can use the value of that secret data to access some other data that I do control, and I can influence which data I access. And remember, I said before that I can determine whether something is in my cache based upon its access time. So I've got all the pieces I need: I find the data I want, I mask out the little piece I want to read, and I then access some other piece of data I control, where the offset I use is based upon the value I just read. Then, back outside of the speculation, I measure which of those locations got loaded. If one location was loaded, that means one thing; if my code loaded from the other location, that means something different. The actual secret value isn't visible to me; the speculation hardware took care of throwing all that state away. But because there's a shared cache, I can observe what happened from the point of view of the cache.
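The mechanism so far (speculatively read a secret byte, touch a probe-array slot chosen by its value, then see which slot became cached) can be modeled in a few lines. This is a deliberately simplified sketch of my own: a Python set stands in for "which cache lines are hot", the speculative access is modeled explicitly, and there is no real speculation, faulting load, or timing.

```python
# Toy model of the Meltdown-style read-out. The secret value itself
# is thrown away, as the real hardware does; only the cache
# footprint (the set) survives and is measured afterwards.
SECRET = b"magic"   # stands in for privileged kernel memory
PAGE = 4096         # one probe-array slot per possible byte value

def speculative_leak(index, cache):
    # Architecturally this load would fault; micro-architecturally it
    # still touches probe slot (byte * PAGE) before the exception is
    # delivered, leaving a footprint in the cache.
    secret_byte = SECRET[index]
    cache.add(secret_byte * PAGE)   # the only state that survives

def recover_byte(cache):
    # Flush+Reload-style probe: check all 256 slots and see which one
    # is "hot" (in the real attack, which one loads quickly).
    for value in range(256):
        if value * PAGE in cache:
            return value

recovered = bytearray()
for index in range(len(SECRET)):
    cache = set()                    # a freshly "flushed" probe array
    speculative_leak(index, cache)   # the value itself is discarded
    recovered.append(recover_byte(cache))

print(bytes(recovered))              # recovers b'magic' byte by byte
```

The real exploit replaces the set with actual timing of loads, but the signaling structure, one probe slot per possible value, read out in a loop, is the same.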
Seen from the cache, that value is visible, so I can use the same timing code I gave you before to work out which of those locations was touched, and with that zero/one signaling I can reconstruct the piece of data. I do that in a loop, and I read the whole thing out.

Now, the actual Meltdown exploit you'll read about online is a bit more detailed, and it has some optimizations. This is the version that I put together in December, because we had to mitigate this and we weren't given reproducers, just enough to be dangerous. And then some folks... I think I was told that we were not that sophisticated. Well, I don't like being told that, so I went and figured it out and made a reproducer. It really annoys me when someone says that, so that was great. So this is what my code does; the actual code the researchers published is a little more optimized, but you get the idea: you can't read the secret data, but you can observe what it was based upon what it did to the caches. And here again is the example code; you can read through the slides to make it make a bit more sense.

So when the right conditions exist, I can exploit this. How can I mitigate it? Well, certain circumstances are required to make this possible to abuse. For example, in some implementations the secret might have to be in my innermost level-one cache, so I might flush that cache whenever I leave the kernel; there may be some people out there mitigating it that way. The other thing I could do is change how I manage my page tables, so that kernel memory is never visible while I'm running an application. I can do that; it's just a performance hit, because now every time I go into my kernel I have to twiddle my page tables around (that's a technical term), and that costs me time. That's why Meltdown mitigation with page table isolation has a performance hit. There are optimizations; there's something called ASIDs.
You can read more about that in the slides. Okay, let's do Spectre; I'm running long on time.

Spectre. We have this concept of gadgets. If you've read about return-oriented programming (ROP), a common kind of stack-smashing attack in security circles, gadgets are pieces of code that already exist in a victim or target program. I'm going to cause that code to execute; it's already there. I just influence the environment so that a particular piece of code I found, one that does something I want (a sequence similar to what you saw before: load some data so that I can infer what the data was from the address it loaded), gets run for me. I find a piece of code that's particularly interesting, and I abuse it.

So here's an example of Spectre variant one. I might have a piece of code that reads some data in from the user, untrusted data, and then does some other stuff. It turns out that some microprocessors will keep executing past a bounds check before they know whether they should, and so if I find a place where just the right bad sequence of code exists, I can exploit that. That's Spectre variant one. It's difficult to do, because I've got to find exactly the right code, and it's got to be on an entry point into the kernel. There's one reproducer; it's a bit messy. And the mitigation for variant one is, well, don't do that.
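The canonical variant-one gadget has the shape `if (x < array1_size) y = array2[array1[x] * STRIDE];`, where the stride is a cache line or page. Here is a toy model of mine of why that shape is dangerous: the misprediction is simulated with an explicit flag, the body runs anyway for an out-of-bounds index, and the cache footprint ends up encoding a byte the code was never allowed to read.

```python
# Toy model of a Spectre variant-one bounds-check bypass. The
# `mispredicted` flag stands in for the processor speculating past
# the check; everything here (layout, secret) is invented.
PAGE = 4096
array1 = bytearray(b"\x01\x02\x03\x04")  # in-bounds, attacker-known
secret = b"K"                            # sits just past array1 in
memory = bytes(array1) + secret          # this toy address space

def gadget(x, cache, mispredicted):
    # The real bounds check; during speculation the body can run
    # before the check resolves, which `mispredicted` models here.
    if x < len(array1) or mispredicted:
        value = memory[x]            # out-of-bounds speculative read
        cache.add(value * PAGE)      # footprint left in probe array
    # On a misprediction the result is squashed, but `cache` remains.

cache = set()
gadget(len(array1), cache, mispredicted=True)  # x one past the end
leaked = next(v for v in range(256) if v * PAGE in cache)
print(chr(leaked))                   # recovers 'K' without a legal read
```

The attacker supplies `x`, the victim supplies the gadget, and the probe array plus timing supplies the read-out, exactly the same signaling channel as before.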
What I do is shove an instruction in there that prevents the code from continuing past the point of a potentially speculative load, and that requires that I rebuild my operating system, my kernel.

I talked about branch predictors before, and that brings us to variant two of Spectre, which is branch predictor poisoning. Since I know how branch predictors behave, and since their addressing may not be fully disambiguated, I can exploit that: I can have one process running that trains my branch predictor to guess wrongly when something else is running, and if that something else is more privileged, or is the kernel, I can exploit what happens there. So I poison my branch predictor, in particular my indirect predictor, into guessing that it's going to jump into some code that it in fact is not. It will then speculatively execute whatever gadget code I want. Now I have control over where that gadget code is, which is much more interesting to me, because in something very large like the kernel I'm probably going to find a particular instruction sequence that interests me. I set up the environment, I train my predictor, and I exploit it.

I'll wrap up quickly. There are two ways of mitigating it. There's a big hammer, which is expensive: I can turn off my predictors whenever I go into or out of my kernel. Or I can use a technique that Google came up with called retpolines, which you can read more about in here, and which changes indirect calls into fake return calls. It's kind of interesting; it's the "if it hurts when you do this, don't do it" approach. They have a particular code sequence that modifies what look like indirect function calls so that they look like function returns instead, so they won't use the indirect predictor. It's a cute hack, and it isn't nonsense.
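Stepping back to the poisoning itself for a moment: combining the truncated indexing from earlier with a table of indirect-branch targets gives a toy model of variant two. This is my own sketch; the table size, indexing, and every address are invented, and a real branch target buffer is far more elaborate.

```python
# Toy indirect-branch target buffer, indexed (like the direction
# predictor earlier) by only the low bits of the call-site address,
# so attacker and victim call sites can share an entry.
INDEX_BITS = 4
btb = {}                      # truncated call-site index -> target

def index_of(call_site):
    return call_site & ((1 << INDEX_BITS) - 1)

def train(call_site, target):
    btb[index_of(call_site)] = target

def predict(call_site):
    return btb.get(index_of(call_site))

GADGET = 0xC0DE               # attacker-chosen code inside the victim

# The attacker repeatedly makes an indirect call whose call-site
# address aliases the victim's, with GADGET as its genuine target.
train(0x1010, GADGET)

# The victim's indirect call at 0x8010 shares the low bits, so the
# poisoned entry supplies the speculative jump target:
assert index_of(0x8010) == index_of(0x1010)
print(hex(predict(0x8010)))   # the victim speculates into the gadget
```

The victim never jumps there architecturally; it only speculates there, runs the gadget, and leaves a cache footprint to be read out as before.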
The retpoline is very interesting, but it does unfortunately also require that you change your compiler and rebuild lots of things. We are switching to it anyway, because it's much more performant than turning off our branch predictors everywhere. And it also does cute stuff like putting in a harmless infinite loop: if the processor is going to speculate, it just speculates into that and goes away. It's kind of fun. And there are other variants of this coming. I'm going to wrap up now.

Related research: this is just the beginning. Architecture junkies like me... I'm not going to say we're excited, because that's a bit unfair; it's all very serious, and you should update your machines straight away and all that kind of stuff. But it is interesting that a lot more people are paying attention to these classes of attack now, and that means researchers will find more of them. Hopefully we will make better machines as a result, and other related research will happen. You can read about these later: there's the Rowhammer attack, and there's MAGIC, which is my favorite; that one involves writing special sequences of instructions which, when you execute them, physically age your hardware. The bottom line is that everything you thought may or may not be possible; now is a good time to go back and think about it all again.

In summary, we talked about a lot of things, and you can go on Twitter, at John Masters, to find the links to all the slides. I will happily take one or two questions now if there's time; otherwise I'll be around afterwards, and I'd like to thank you very much.

Time for questions, so please put your hand up and we can bring the mic to you. While you leave, please be quiet. You can talk outside, but in here please keep quiet. Thank you.

Thank you very much. So, how can two different research teams find the same flaws at the same time? What do you think about this?
And I think it goes also with what you said at the end: why were the previous flaws not, I don't know, talked about? Yeah, good question. (That part about keeping quiet while you're leaving: if you're leaving, please just shut up. Thank you. Usually people tell me to shut up, but this is better; I like this.)

So the question is: how did several teams find this at the same time? Well, it's very interesting that most of the teams were in the Austria and Germany area, and all the guys knew each other; I'm sure that had nothing to do with it. But more specifically, if you look over the last few years, the guys at the Technical University of Graz, who are absolutely amazing, found a whole sequence of different related exploits. So this was built up over time, based on previous research, and the time was ripe for it to happen. And one of the guys posted a blog last July that came very close to the Meltdown attacks. The first thing we did when we were mitigating was to dig out all their research and go back and read all of that stuff, which was very helpful. So anyone else who was doing research in the interim and paying close attention probably would have seen that too, and it would have helped. So it was coincidence, which does happen, built on lots of other research that all came together over time.

Another question. I'd like to say first that explaining both Meltdown and Spectre from scratch in 54 minutes is pretty impressive, so well done. Do you think this means that we need to be designing processors which are easier to change after the fact? Or do you think that adding that capability would have other negative consequences, such as enabling rootkits and so on, that would outweigh the benefits of being able to tweak them in the field? Very good question.
So firstly, I didn't get time to go into it, but on slide 80 (there were 90 of these) I have one on how microcode, millicode, and chicken bits work in processors, and how you can basically update some of them after the fact, or use these things called chicken bits. Every processor that's built these days, especially the really expensive ones, the x86 ones, will have up to 10,000 little knobs called chicken bits, so that you can chicken out: you can say, this little piece of the design, I'm not sure it's going to work, so I'll just make it possible to turn it off later. You have about 10,000 different variants of that in most high-end chips, I'm not joking, and you can normally find a combination of them that turns something off. Microcode is a little bit different; you can read about it, I've got a whole explainer on that.

I do think we have to build chips that are easier to update. I don't think the answer is just "let's build RISC-V machines everywhere"; there's nothing here that separates any of the commercial chips from anyone else's design. It's just that everyone building processors should think about designing for in-field mitigation and in-field updates, design for security. And also consider this: if you are targeting a new market... now, I'm very fond of Arm servers, some people know this, and I always told the Arm guys that if you want to be successful, you're going to go after public clouds. That's now something like ten million machines, all homogeneous and all an attack target, so you had better design those kinds of processors so they can be very easily fixed. So I do think there's a lot we can learn, and I think the industry is learning a lot, but there's more to be done.

Another question. Yeah: should we be afraid of illegal instructions that may appear in memory, and what happens if they are speculatively executed?
Illegal instructions: what happens if we speculate those? It will vary with the implementation, but usually the machine will catch it early, probably before it even hits decode. In the front end of the machine it will see it and say: that's not a valid instruction, I'm not even going to send it down the pipeline. If you had something like, the example I would use is a divide by zero, that's the more classic case: I tag that I'm doing a divide by zero, but I don't actually send an exception to the program, because I don't know yet whether I'm really going to do that operation. Actual illegal instructions I probably catch earlier, in my decoder: yeah, I don't care about that.

You've got time for one more. Yes, up here. Thank you. It's a bit of a political question. Do you think this could have been prevented? As in, the architects who designed those things kind of knew about it, but were also saying, oh, we would accept the backlash, like this Dieselgate kind of thing?

So it's a good question. I tend to believe less in the conspiracy side of that, and I'm not saying you do. What I think is this: those of us who studied branch predictors in school, and some of you probably did as well, read about them at the time and thought, gee, I wonder if the context (nobody says "gee", but let's say I said that), the context from one state, might still leak into another. And in class they'll say: never mind, after a little bit of time those entries will just get displaced. And you say, oh yeah, sure, that makes perfect sense, and you don't think about it again. There are generations of students who've heard that and said, oh yeah, sure, no problem. And there are generations of students, by the way, who've also said: oh yeah, those are extra indexing bits, I don't need the full set of bits.
I'll just shorten it again. So after this came out, I actually pinged a bunch of the well-known academics and asked them to make sure that their classes have been updated, to make sure they're not teaching anyone this now. But I think we've all learned the same things. Everyone who designs processors went to the same schools; they all know each other; they build things the same way; they all think of the same kinds of things as impossible or very unlikely. That's how this stuff came about. I do think we can learn a lot, and we can prioritize security. But again, while we are in a world where what sells you a new laptop or a gaming device or a phone is something that is fast and cheap, while that remains the number one thing, designers will fall over themselves to get you that 20% more performance or whatever they can. So we also have to ask consumers to demand more in terms of security as well.

Okay, I'm very happy that I'm asking this question after that answer, because what you just said seems to me to say that we need different incentives to incentivize security. So my question is: should AMD, Intel, the chip manufacturers, be liable to some extent for what happened? Of course nobody assumes malicious action on their part here, right? But if my screen stops working at some point, I'm getting a warranty replacement. I don't think everyone will get warranty replacements of their CPUs; perhaps they should.

So I think it's a good point. This here is a ten-foot pole, and this is me conveniently dodging your question over there. But I do think you raise a good point, and I hope people discuss it. While I'm wearing this shirt I'd rather not take a position on that, but thank you for asking it; let's debate that. Okay, is it time for any more?