I'd like to introduce Vladimir, who's going to be discussing writing side channels for processor software, sorry, I screwed that up, writing POCs for processor software side channels, which, leading off the previous talk on ZombieLoad, seems like good planning. All right, thanks.

Hi everyone. A little bit about myself: I live in Portland, Oregon, and I'm a security researcher slash hardware hacker. It wasn't always like that, but for the past five years I've basically been doing hardware, so I built up a lab, and this year I got the go-ahead to do actual hardware research instead of the software work I started off with. So how did this start? I got interested when I heard about the vulnerability called L1TF, which stands for L1 Terminal Fault. The previous speaker talked about the independent research they did; they found their vulnerabilities building on that. When this vulnerability came out I started looking at it and got really interested, because leaking the L1 cache is a really powerful thing, as was demonstrated before. And not just that: Intel really relies on the L1 cache for things like loading ACM modules, and they can mark cache as RAM, for example, and other things. So I wanted to make a POC, a reproduction, since there was no POC online yet when the vulnerability came out.

So, memory accesses. I made a very simplified diagram. You have a virtual address; when the translation isn't already cached, the processor does the page walk: it walks the page tables and looks up the physical address in the PTE. If you're running under a hypervisor, that address is then looked up in the EPT, the extended page tables.
That's how the hypervisor enforces its memory mapping. That address is converted into the final physical address, and the physical address is what's used to look up the data in the caches, the L1 data cache, and then you get your information. What the researchers found is that when a fault happens, for example because you initialized the PTEs the wrong way or you're accessing invalid memory, the speculative execution engine keeps executing instructions in the background. There's a synchronization issue: the fault signal doesn't get received in time, so execution continues for a couple more cycles until the fault is actually delivered. And as we know from the earlier Meltdown research, the speculative execution engine doesn't actually check anything; it basically just takes the instructions as they come. It can only handle a limited set of instructions, so if it reaches one it doesn't know, some privileged or very complicated instruction, it stops, but simple instructions like moves, adds and other operations it will execute, without performing any of the checks a normal execution path would. So what happens is: the fault occurs, the execution engine takes the page frame number from the PTE and treats it as a physical address directly, without checking anything, and from then on it looks that address up in the L1 cache. If the data is there, you can use it, and, as the previous talk showed, leak it.
The same happens with the EPTs, which is even worse, because when the translation happens it's supposed to go through the EPT, and since the guest address isn't going to match, you shouldn't be able to get the data and execution should stop. But because the page frame number is treated as a direct host physical address, you can basically read any memory as long as it's in the cache: SMM, VMM, any other mode, as long as it's in the cache.

So, I've been writing POCs for quite a while. I actually worked for Big Blue, but I didn't know about L1TF through them because I'd already left. I made a list of what you want to pay attention to if you're trying to build a reproduction proof of concept, whether you start from an already-available POC on GitHub or from scratch. First, your code is going to be messy, because you'll be experimenting with lots of things, so use source code version control. You're going to hit bugs and plain stupid mistakes, so be patient. The third one is really important: when you think you've found something interesting, you really want to test it against false positives and false negatives, and make sure you have consistent behavior. Say you found something and it's leaking some information: does that information actually make sense? Look for patterns. The cache is organized in cache lines of 64 bytes; the RSB, the return stack buffer, has 16 entries; internal structures aren't necessarily, but typically are, powers of two. So, for example, a
typical bug you might have: you're getting results, but it's spitting out an endless stream of random bytes that doesn't make any sense. So check against the CPU structures: if you see leaks but the size isn't a power of two, or you don't see any familiar patterns, something is off. Depending on what you're going after, check the manual. If you have a CPU that supports TSX, the transactional memory extensions, use it. TSX suppresses faults: even if you access invalid memory, no exception is actually raised; the CPU suppresses it internally and lets you know, so you can handle it programmatically, and your sample cache and other structures don't get polluted by other threads during exception handling. Once you have some results, do some visualization and graphing to spot outliers, and average your results. Run multiple iterations of your tests; it's not going to work from a single run, so run it thousands, a hundred thousand times to get better results. And use a serializing CPU instruction to flush the pipeline. As we've seen previously, Intel releases a patch and says, we introduced a new MSR; what it actually does is invoke microcode, which is really expensive. A serializing instruction does the same kind of flush, and you can call it from user mode, so you don't have to bother with developing a kernel module. So, as I started looking at the information Intel provided for this, I noticed that they claim L1TF can be triggered only when the
present bit is not set or some reserved bits are set. Just thinking about it: if I access invalid memory, then sure, the present bit is not set. But if I do a write to read-only memory, that should also raise an exception, a page fault, so why didn't they mention that? I got really curious; it seemed weird. If a vendor is going to provide information, they should provide all of it. I don't know, maybe they just didn't see it as an attack vector. For me the attack vector would be easy: you provide read-only memory to the kernel, the kernel tries to write to it, or you find a gadget in the kernel that does the load for you, while from another thread you monitor that memory. So, as I was validating my results: if the data is in the cache, I get really fast accesses. But how do you make the data not be in the cache? You use clflush, you flush the cache lines, so it shouldn't be there. And I found that even while flushing, I still saw leaks. The first number in the output is the iteration count I mentioned before, since you want to run the attack as many times as possible; the next is just the CPU number, CPU four; then the byte that leaked and the timing, how many clocks it took. For some reason the same byte kept leaking. With a false positive I shouldn't see a fixed byte value; I'd be getting just random numbers. I don't know why it was 0x02 in my case, but that's how it was. And then I was like, well, okay,
something is wrong here, so how do I confirm it? I decided: clflush is not working for me, so what do I do? Marking the memory as uncacheable is the obvious solution, because that's the only way to make sure the data is never in the cache; it should not be in the cache, otherwise we have a problem, right? Intel actually mentions on their website, as a mitigation, that if you're able to, you can store sensitive information in uncacheable memory, since it will never be in the cache. A perfect solution if you want to keep secrets. And this is from their optimization manual, saying that speculative execution will never access uncacheable memory, or memory where the access would take a trap or fault. So as I was playing with uncacheable memory, I started to see it still leaking. I filled the memory with an EE pattern, and that pattern popped up in the leaked data, which doesn't make any sense, right? I was digging through the manual a lot; look, I'm a hacker, I'm not going to read 600 pages of optimization manual, and it was really frustrating, to be honest. Then I found a hint on Intel's forum, actually: somebody was asking about these internal buffers you've basically never heard of, because I'm not a CPU architect, right? As was mentioned before, these buffers sit basically between the registers and the cache. But if you have uncacheable memory, where does the data go? It can't go into the cache, and it can't go into the registers directly, because otherwise uncacheable memory would be really, really slow. That's why they have buffers there.
And that explanation made perfect sense, especially because these buffers are really tiny; there aren't many of them. That's also why the results are really, really slow. You saw the video earlier: the attack works, but it takes time. So I finally made a proof of concept for what Intel calls MDS, microarchitectural data sampling. In the proof of concept, the first process, the victim, puts the secret into memory and waits for the attacker to read it out. They run on the same physical core, because the buffers are shared between the hardware threads. You can't really see it on the image, but I'm going to run the proof of concept after the next slide. The data that gets leaked comes in chunks of exactly 64 bytes, which confirms it's the size of the cache line. I don't know if you can see it on the slide; I could only fit a screenshot from the debugger. You can see patterns like 7F F6, 7F FF; if you look those up, they're actually addresses inside the user modules. So these were return addresses or something similar, and they match the victim and attacker processes. Right, so the next thing: okay, cool, we can do this attack, but it's really slow. What the researchers did was try to leak /etc/shadow this way. But how do you actually prime things, and make sure the /etc/shadow data is in the cache?
You repeatedly relaunch the passwd utility to make sure it's there, win the race condition between the two threads, and try to leak the hash. So it is powerful, but I'd say you're going to get detected: in a real environment the system is going to notice that and raise an alert, right? The other thing is virtualized environments. I don't know who does this anymore at this point, but nobody should be offering a VPS that schedules a tenant on a single thread and shares the sibling thread of the core with somebody else; if they do, you shouldn't be using it, because that's basically asking to get owned through this. And even if you have mitigations, like for KVM or Xen: when an exception happens, it gets processed by the host, so you'll be able to attack the actual host on the server, because of IRQs and other things being processed there. The POCs and the slides are going to be available. I'm going to try to run this here. All right. Nope, I don't see it here. All right. So we have the first process; I made this setup just for synchronization, to make it easier. It should come up pretty fast. Here we go. Yep, I can see the secret string I put in there getting leaked, and the rest it gives me is garbage, basically whatever is in there, stale or remnant data sitting in the buffer. Okay, so what do we have? Actually, when I reported this to Intel, I said the leak happens from the load and store buffers, which was incorrect. Because, as I said before, I'm a hacker; I'm not going to read 600 pages. It's too much. All right.
And actually the folks from TU Graz, with what they call Meltdown-UC, had found this much earlier, but they hadn't shared the details of what was going on. So yeah, you really want to validate the results you're seeing. You might use clflush and think, okay, it's flushing, it should be flushing, but it's hard to tell whether clflush actually flushed the line, or whether the line was already pulled back into the cache by the speculative execution engine, because clflush is out of order: it's only ordered with respect to other clflush instructions. I put a link down below; this guy actually found this behavior, and there's a whole thread about it on Twitter from January, even before TU Graz mentioned it in their paper. He was seeing the same Meltdown-UC behavior, but using clflush. I haven't seen him mentioned in the Intel reports; I'm guessing he didn't report it to Intel, or Intel may have missed it in the TU Graz report when they reported the Meltdown research, or other reasons. After MDS came out, I went back to figure out why the data was actually in the fill buffer, and I found I'd been searching with the wrong keywords: I was looking for uncacheable memory, but Intel also describes these accesses as non-temporal. That's basically what uncacheable memory means in their terms. They use terminology that's hard to follow sometimes, and the diagrams don't always make sense. So don't jump to conclusions. And the last one: I'd take my own research with a grain of salt, because we don't really have any tools to inspect these buffers or any other internal structures.
Even if you get JTAG, it doesn't let you do that; you'd need what Intel calls Red Unlock. There are some details available online now about how Red Unlock can be achieved: the guys from Positive Technologies in Moscow, who do lots of research on the PCH and the ME and so on, were able to enable Red Unlock, which is basically the factory unlock. But there's no software available for it; even if you have the unlock, you don't know how to talk to the CPU, because there's no information. And that's it. Thank you. As always, if you have questions, I'll be in the courtyard in a couple of minutes.