Next speaker is Daniel Moghimi. I believe this work is actually going to be presented next week at ACM CCS in London, a really top-tier academic conference for security. So I really like it when academic work comes out to the hacker community, and you're all going to get a preview of that. This is a really cool attack on data side channels in processors. So take it away. Thanks for the intro, and thanks to everybody attending my talk. Today we're going to talk about the ZombieLoad attack. ZombieLoad is a CPU attack that can actually leak data from other processes, across context switches. Maybe you can use it to do privilege escalation. This is joint work with a bunch of people from four different universities and institutions. So, a little bit about me. My name is Daniel Moghimi. I've been doing security work since 2010, and I've been doing a PhD since 2017 on a slightly different topic: micro-architectural security and side channels, and applying them to breaking crypto and different things. This is my first talk at a hacker con, so it's going to be exciting. You'll see. Before talking about our attack, let's talk about cache attacks, which are a basic building block we use in our attack for leaking data from CPUs. So what are cache attacks? The CPU has to use caches to fill the speed gap between the CPU and the DRAM. In a modern computer system, we have a hierarchy of memories: CPU registers, caches, DRAM, and at the bottom other things like peripherals and disks. As we get to the top of this pyramid, we have more expensive memory that you cannot have a lot of, but it is much faster. So how do we use this cache? We try to access something from the DRAM. We load it, the CPU puts it in the cache, and then we get the data to the CPU. This is slow because the CPU has to go through a very long path; it takes maybe 200 or 300 cycles, so we don't want that.
The next time the CPU wants to access this, we're going to get a cache hit, because the data is already in a faster memory that is very close to the CPU. So the second time, this is going to be much faster. But the great thing about the cache is it can actually let you do some side-channel attacks. There have been lots of different side-channel attacks out there over the last 15 years, but one interesting attack is called flush and reload. Flush and reload is one of the simplest cache attacks, and it can be used when we have shared memory with a victim. Assume we have two parties sharing the same hardware, an attacker and a victim, let's say in a cloud environment or a browser. What we do as an attacker is flush a piece of memory that the victim may have loaded. Then we wait a little bit, and we see if the victim is going to load it. If the victim loads that piece of memory, it is going to be in the cache. Then we, as the attacker, try to reload that memory and measure the time to see if the load is fast or slow. If it's fast, we know the data is in the cache; if it's slow, we know it's not. With this simple attack, the attacker can learn the access pattern of a victim. We can, for instance, use this to see if a cryptography implementation leaks some secrets, or do other interesting things; for instance, we can profile a user's behavior and activity. But so far, we don't leak any data. We just learn some side-channel information about what somebody might be doing on the same system. Then in 2018, we heard about the Meltdown attack, or "our CPUs are on fire, we need to just redo everything." So how does the Meltdown attack work? I think many people here have programmed in C. If I try to execute this code, I'm going to get an exception, because I'm trying to access kernel memory from user space. If I try to execute this code, I will get a segmentation fault.
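Coming back to flush and reload for a second: the loop just described can be sketched as a toy simulation. This is only an illustration of the logic, not a real attack — a real implementation flushes with clflush and times the reload with rdtsc on actual shared memory; here the cache is simply modeled as a set of addresses, and all names are made up.

```python
# Toy simulation of flush and reload: the cache is a set of addresses,
# and "timing" a reload is just a membership test.

class SimulatedCache:
    """Models the cache as the set of currently cached addresses."""

    def __init__(self):
        self.lines = set()

    def flush(self, addr):
        self.lines.discard(addr)       # attacker evicts the shared line

    def load(self, addr):
        hit = addr in self.lines       # "fast" if cached, "slow" if not
        self.lines.add(addr)           # loading brings it into the cache
        return hit

def flush_and_reload(cache, shared_addr, victim):
    cache.flush(shared_addr)           # 1. flush the shared memory
    victim()                           # 2. wait while the victim runs
    return cache.load(shared_addr)     # 3. reload and "time" the access

cache = SimulatedCache()
# Victim touches the shared address, so the attacker's reload is fast.
accessed = flush_and_reload(cache, 0x1000, lambda: cache.load(0x1000))
# Victim touches something else, so the attacker's reload is slow.
not_accessed = flush_and_reload(cache, 0x2000, lambda: cache.load(0x1000))
```

The attacker learns one bit per round — was this line touched or not — which is exactly the access-pattern information described above.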
So I cannot print the value from the kernel. Can I? No. But with the Meltdown attack, instead of trying to print that value, we're going to use some of the CPU's features to actually leak that data outside of the CPU, and we combine that with a flush-and-reload attack. So how are we going to do that? Here we have a CPU. The CPU has some registers and a bunch of different cache lines. And then we have a virtual address space that is assigned to our process as the attacker. What we're going to do is allocate some memory, and we're going to call this memory the oracle. This memory will be mapped to some of those cache lines. Then we're going to try, again, to execute this code. When we execute this code, well, we will get a fault, we will get an exception. But what is interesting is that there is something called the transient domain, where the CPU executes things out of order without checking any validation or seeing if they are correct, and later on it's going to just flush all of them. So in this transient domain, the secret that we try to access — which is a password in the kernel, for instance — is actually going to be loaded into some micro-architectural registers. These registers are inside the CPU; they are not architectural registers, so the value is not available to the software yet. And when this happens, we're going to execute another piece of code. We call this code a gadget. The job of this gadget, when it's executed in the transient domain, is to encode that secret from the transient domain into some permanent architectural behavior. Here, when we access the oracle, the secret is going to be encoded into one of these cache lines. Then the CPU goes ahead and says, OK, this was a fault, we need to flush everything and send a signal to the OS that this was an exception. But when the CPU flushes everything, that accessed cache line is still going to be there.
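The encode-and-recover trick just described can be sketched the same toy way: the transient gadget touches one secret-dependent oracle cache line, and a flush+reload sweep over all 256 lines finds it. The cache is again modeled as a set, and the base address and names are illustrative.

```python
# Toy model of the Meltdown-style cache covert channel: the transient
# gadget loads oracle_base + secret * LINE, and the flush+reload sweep
# recovers the secret byte as the index of the one "fast" line.

LINE = 64                                  # one cache line per byte value

def transient_gadget(cache, oracle_base, secret_byte):
    # Runs out of order, before the fault is architecturally raised:
    # the secret-dependent load caches exactly one oracle line.
    cache.add(oracle_base + secret_byte * LINE)

def recover_secret(cache, oracle_base):
    # Flush+reload each of the 256 oracle lines; the single cached
    # ("fast") one encodes the leaked byte as its index.
    for byte in range(256):
        if oracle_base + byte * LINE in cache:
            return byte
    return None

cache = set()                              # oracle fully flushed beforehand
transient_gadget(cache, 0x100000, secret_byte=0x42)
leaked = recover_secret(cache, 0x100000)
```

One line per possible byte value is what makes the decode unambiguous: exactly one index comes back fast, and that index is the secret.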
And that's a footprint that we can use later on to actually leak the secret. Basically, we run a flush-and-reload attack on each of these cache lines. We see that none of these are fast, so the secret is not there; but when we reach here, we see that, OK, this one is fast. What is the index of that cache line? The index is equivalent to the secret. And that's how the Meltdown attack works. So this was all background; this wasn't the ZombieLoad attack, this was all previous work. Now we're going to ask some questions. We did the Meltdown attack, we leaked some secrets from the kernel, good. But now there are protections. There is the KPTI protection that isolates the memory so we don't have access to the kernel. And on more recent CPUs, like Coffee Lake for instance, this issue has even been mitigated in the architecture. But can we still do attacks? Can we still leak data? What if we try different things, like executing other load operations that cause exceptions? And do we even check where the data comes from? In the previous slide, we just said, OK, the data came from memory. But data can travel through lots of different buffers and caches inside the CPU. So we looked closer at that, and we have, yes, the ZombieLoad attack. So how does ZombieLoad work? To discuss that, I first want to explain how a simple memory load operation works on an Intel CPU these days. When we execute a memory load operation — let's say I'm reading 8 bytes of memory by providing an address to the CPU — I use a virtual address, but the CPU has to read the memory using a physical address that actually maps to the actual bytes of data. So the CPU is going to translate this address using something called the TLB. The TLB is like a cache that holds the most recent translations. And if the TLB doesn't have the information, it is going to ask the page miss handler.
And the page miss handler is going to do a page walk through the page tables the OS has set up and fetch the page table entry. What is a page table entry? It is the result of this translation, which includes a physical page number that actually lets the CPU access the data, plus a bunch of metadata bits that describe what kind of memory we are trying to access. For instance, the P bit is the present bit; it tells us, OK, this memory is backed by DRAM — it's not just metadata, and it's not swapped out to disk, for instance. And the U/S bit is the user/supervisor bit; that's how we know this is a kernel address, and that's how the CPU throws an exception: if that bit says supervisor and we don't have the permission, we're going to get an exception. So now we know about this. But what about caches? What are the different elements in the CPU that can leak data? We mentioned that there is a cache that helps speed up programs, or memory accesses. But there is not only one level of cache. CPUs nowadays use multiple levels of cache; in particular, Intel CPUs use three levels: level one, level two, and level three. Level one and level two are inside the core, and level three is shared by all the cores on a CPU. But that's not the only element that works as a cache inside the core. There is something called the line fill buffer. What is the line fill buffer? The line fill buffer helps the level one cache to be more efficient, to not be a blocking cache. How does the line fill buffer do that? Say we try to access some memory, we try to execute that load instruction, and the data doesn't exist in the cache. The CPU is going to allocate one of the entries in the line fill buffer. This entry is going to hold 64 bytes of data, the size of a cache line, and it's going to get the data from the next levels of the memory hierarchy, from the DRAM or from the L3. The good thing about the line fill buffer is that this data doesn't have to be filled into the level one data cache first.
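Back to the page table entry for a moment: the metadata bits just mentioned can be pulled out of a raw 64-bit entry like this. Bit positions follow the x86-64 layout (P is bit 0, U/S is bit 2, the accessed bit is bit 5, and the physical page number starts at bit 12); the function name and example frame number are made up, and real entries carry more bits than shown here.

```python
# Decoding the metadata bits of a (simplified) x86-64 page table entry.

def decode_pte(pte):
    return {
        "present":  bool(pte & (1 << 0)),   # P: backed by DRAM, not swapped out
        "user":     bool(pte & (1 << 2)),   # U/S: clear means kernel-only
        "accessed": bool(pte & (1 << 5)),   # A: page touched at least once
        "page_frame": pte >> 12,            # physical page number
    }

# A present, kernel-only, never-accessed page at frame 0x1a2b3:
pte = (0x1a2b3 << 12) | (1 << 0)
flags = decode_pte(pte)
```

The supervisor check ZombieLoad's main variant faults on is the `user` bit here being clear, and the accessed bit reappears later in variant 3.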
The data can just be forwarded as soon as enough of it is available to service that memory load. So we get the data, and the level one cache will also eventually be filled. But what is the problem with the line fill buffer? The problem is that it acts like a bad memory allocator. When we retire this instruction and we are done executing this load operation, the line fill buffer entry is going to be deallocated. But this deallocation just marks the entry as free. So the actual data from the user — marked with X here — is actually still going to be there; it's only marked as deallocated by the CPU. Then some malicious person comes and says, OK, I want to execute a memory load operation. This malicious person — this is ZombieLoad — is going to execute some memory operation that is going to fault. And before doing that, we're going to groom the micro-architecture into a state where this malicious memory load does not hit the cache. When we do that, we're going to be assigned some entry in the line fill buffer. The line fill buffer doesn't have many entries — it only has 10 — so we have a high chance that the same entry the victim has used is going to be assigned to us. And this is shared between the two hyper-threads, and across context switches. And interestingly, on Intel CPUs, when we get a fault, the first thing the CPU does is forward this stale data to the faulting operation. And we can use a Meltdown-style flush-and-reload attack to actually leak that data. So by doing this, we can leak data from other processes and other contexts. Even if later on this data is updated — for instance with zeros, because CPUs now have protections that zero it out, so we cannot leak the kernel data directly anymore — it doesn't matter, because we have already leaked the stale data that was left inside the line fill buffer from previous operations.
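The "bad memory allocator" behavior described above can be sketched with another toy model: deallocating an entry only clears a flag, so a later faulting load from a reused entry can transiently be forwarded the victim's stale data. Class and method names are illustrative, and the entry count simply follows the talk.

```python
# Toy model of stale-data forwarding from the line fill buffer.

class LineFillBuffer:
    def __init__(self, n_entries=10):           # 10 entries, as in the talk
        self.in_use = [False] * n_entries
        self.data = [None] * n_entries

    def victim_fill(self, idx, data):
        self.in_use[idx] = True                 # victim's load fills the entry
        self.data[idx] = data

    def retire(self, idx):
        self.in_use[idx] = False                # marked free — but data stays!

    def faulting_load(self, idx):
        # On a fault, the stale buffer contents are forwarded to the
        # faulting load transiently, before the exception is raised.
        return self.data[idx]

lfb = LineFillBuffer()
lfb.victim_fill(3, b"victim secret")            # victim's load fills entry 3
lfb.retire(3)                                   # entry "deallocated"
stale = lfb.faulting_load(3)                    # attacker's load reuses entry 3
```

With only 10 entries shared across the two hyper-threads, the attacker's faulting load has a good chance of landing on the entry the victim just vacated.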
So ZombieLoad has multiple variants. Another variant I would like to mention here is variant 3, which instead of a supervisor fault uses the accessed bit. This does not cause an architectural fault, but it triggers a fault-like condition inside the CPU. So how is this handled? The accessed bit is a bit inside the page table entry that allows the CPU and the OS to have a sort of contract on pages that are allocated. The CPU can use the accessed bit to tell the OS that this memory has been accessed at least once, and the OS can tell the CPU that this memory is freshly allocated by leaving the accessed bit clear. The interesting thing is, if the OS clears the accessed bit and the CPU then wants to access the memory for the first time, it's going to trigger an internal microcode assist. A microcode assist is just an internal handler inside the CPU that performs some of the more advanced, more complex operations, instead of there being dedicated logic to handle that operation. With the microcode assist, we get a similar behavior, so we can do a Meltdown-style attack — which is what we call ZombieLoad — that can leak data from other processes and contexts. So how does ZombieLoad compare to previous Meltdown-style attacks? Meltdown used the entire virtual address, so we leak exactly the data we point at. There is another attack called Foreshadow, which used the physical address mappings; basically, it abuses the fact that the OS or a hypervisor can change the mapping, and it leaks data from the L1 data cache. And there was also another attack that came out at the same time as ZombieLoad, called Fallout, which uses some address aliasing, so we have control over the last 12 bits. But with ZombieLoad, we only have control over the last six bits of the address. So we can only decide which byte within a cache line to leak, which makes it harder to exploit.
We only leak some random value from the line fill buffer. But there are still some interesting things we can do with this. We can do cross-process attacks, cross-VM attacks, and also attack Intel's trusted execution environment. So how did we do that? We came up with a sort of encoding and error correction to actually get the data we want instead of some random noisy data. Let's assume I want to leak some secret — call it the target secret — which is some bytes of data, an AES key for instance. If I try to leak this memory using the ZombieLoad attack, I'm going to get, say, three candidates, because I may also leak other entries from the line fill buffer. So what I'm going to do is shift this window a little bit further, say by four bits, and leak another three candidates, and then do the same again for another three candidates, or two candidates. And when I check, I see that the values highlighted in yellow actually match each other: the nibbles of those values match. By doing that, I can figure out, OK, which bytes of this leaked data belong to the same cache line, extract those, and get rid of the rest, because the rest don't match anything interesting. I can also do other interesting things after I leak these bytes. For instance, if I'm leaking something ASCII — a password, for instance — I only care about bytes that represent ASCII. Or if I want to leak a cryptographic key, I know that key material is uniformly random. Using this post-processing, we can actually leak secrets. Here I have a demo using ZombieLoad to leak the root password hash on a Linux machine. It takes some time, so I'll fast-forward here. We can leak about 20 bytes of the root password hash in about one minute. So what are some other interesting things that we can do with it?
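Before getting to those, the nibble-matching post-processing just described can be sketched like this. Since the leak window is shifted by one nibble (4 bits) at a time, two bytes leaked from the same cache line overlap in one nibble: the low nibble of one must equal the high nibble of the next. Chaining only the candidates whose nibbles agree filters out noise from unrelated fill buffer entries. The helper name and example values are made up.

```python
# Sketch of the domino-style matching over overlapping leak windows.

def chain_candidates(candidate_sets):
    # candidate_sets[i] holds the candidate bytes leaked at shift i.
    chains = [[b] for b in candidate_sets[0]]
    for candidates in candidate_sets[1:]:
        chains = [chain + [b]
                  for chain in chains
                  for b in candidates
                  if (chain[-1] & 0x0F) == (b >> 4)]  # overlapping nibble match
    return chains

# Secret bytes 0xAB then 0xBC share the nibble 0xB across the 4-bit
# shift; 0x11 and 0x22 are noise leaked from other fill buffer entries.
leaks = [{0xAB, 0x11}, {0xBC, 0x22}]
matched = chain_candidates(leaks)
```

The surviving chain reconstructs consecutive bytes of the target; the ASCII or uniformly-random filters mentioned above can then be applied on top of it.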
In the research paper that we released — and there is going to be a more updated version of it next week, due to some embargoes — we have some other attacks too. But one interesting one that I want to mention is attacking Intel's trusted execution environment. Intel has this cool technology, SGX, a trusted execution environment that runs on the same CPU. With SGX, we can assume that everything is malicious: the OS is malicious, the hypervisor is malicious. And the CPU is going to guarantee that nothing is going to leak outside of that module, which we call an enclave. But the good thing for attackers is, if we assume that threat model, then side channels can be much more powerful: because we are a malicious OS, we can actually improve these attacks when we are attacking something like SGX. So we're going to use the ZombieLoad attack to leak register values from an Intel SGX enclave. How are we going to do that? Let's say I'm going to run some code that executes inside an SGX enclave, and I know that after five instructions, or some number of instructions, this register, RAX, is going to hold some secret. We have a tool, developed by one of the co-authors of this paper, that can perform single-stepping on an SGX enclave. This tool can only do single-stepping; it is not going to leak anything from the enclave by itself, because that's the security guarantee the enclave has. But what we can do is put the enclave in a state where we are exactly after a certain instruction. After we do that, we're going to mark this page as non-executable — because, again, we are a malicious OS, so we can at least control the page table attributes. So we mark this page as non-executable, and when we try to execute it, we're going to get an exception. When we get the exception, we handle it as the malicious OS, and we try to do this again and again.
I'm going to try to execute that same instruction as many times as I want, with the same exact register values. And when I do that, I get lots of context switches. With the help of those context switches, I make sure the enclave process always puts those registers into memory: when these context switches happen, the register values are written out to memory and restored again, and this forces the enclave to put those register values through the line fill buffer. And at the same time, I'm running the ZombieLoad attack on another thread, and I'm going to leak the SGX enclave's secret values. So we just talked about attacks and breaking things. Are there any solutions for these attacks? There are some short-term solutions that may or may not work, depending on the context. Intel suggests an instruction sequence using VERW which, if it's executed, for instance by the OS during a context switch, can actually flush all the buffers, so the next context — which could be a malicious process — would not get access to the stale values inside the buffers. But for hyper-threading, there is not really any hardware solution. If you are using hyper-threading, the data can be leaked; there is no solution. If you are using Intel SGX, you can actually verify whether you are running the enclave with hyper-threading enabled using remote attestation, but you need to make sure the attestation messages are verified properly. And the long-term solution: buy new CPUs — more money for Intel. And that's my talk. This is a poster for the ACM conference. If you want to check out the code or any more updates next week, that's the link to the website. I would be happy to take questions. All right, let's thank our speaker, and we will do the Q&A outside in the beer garden.