All right. You're in track three for the PACMAN talk with Joseph Ravichandran. So welcome. And he's a first-time speaker. Thank you all so much. All right. So welcome to PACMAN: breaking PAC on the Apple M1 with hardware attacks. My name is Joseph Ravichandran. I'm a first-year PhD student; I just finished my first year at MIT. I graduated last year from the University of Illinois, where I was a proud member of SIGPony, which I believe is here. Let's go. Thanks, Ian. So I research hardware security at MIT. This project was a joint effort between me and my collaborators Weon, Jay, and Mengjia, who is our fearless leader. In our group, we try to focus on every aspect of hardware security, from attacks, like we're going to do today, to defenses, formal verification, all sorts of good stuff. But like I said, today is an attack project. You might have seen our paper about this that came out a month ago at the ISCA conference. So pacmanattack.com, you can get all the details from there. That was our first introduction to the world of what PACMAN is. But it is my great pleasure today to introduce to you PACMAN version 2. This has not been publicly released yet, but we have a great new re-implementation of the attack, lots of improvements, and lots of really cool stuff. And so ARM came out with a release about our project, talking about what PACMAN is and what's affected. It turns out it's actually not just limited to the M1; it affects a whole lot of other processors too. And so we think the PACMAN technique is actually quite powerful and can be used for a lot of other things besides just pointer authentication. All right. So in the research literature, a lot of groups tend to focus on either the software side of things or the hardware side of things.
So you've got a great body of research in microarchitectural attacks and in software attacks, like return oriented programming, all that good stuff. But what's kind of lacking, or what we thought could be improved, is: what happens when you bring these two worlds together? What kinds of new attacks are possible when you consider the cracks that lie between these two threat models? And so PACMAN is our vision of a synergistic attack that combines both what's great about software attacks and what's great about hardware attacks. We call PACMAN a hardware-software co-attack. And PACMAN comes with three main contributions. The first is a new way of thinking about these compounding threat models. The second is a hardware bypass for pointer authentication; that's kind of the name of the game. And the third is a real implementation of this attack on the Apple M1 chip. And so today I want to build up the story of PACMAN from two different perspectives. When we see dark slides like the ones on the left, we're going to be thinking like attackers. So when we see stuff like that, I want you to be thinking: how can we break this? But when we see a light slide like the one on the right, we're going to be thinking as CPU designers. What kinds of choices could CPU architects have made that lead to attacks like this? So, coming at the same idea from two perspectives. And by the end of today's talk, I'd like you, the audience, to come up with an answer to this question. Namely: is this a flaw? Or is PACMAN a consequence of multiple things coming together? And so instead of boring you with all the details about the attack and memory corruption and all that stuff, I'm going to give you the entire idea in 60 seconds. So stay with me here. A lot of the memory corruption attacks that you see kind of all follow the same pattern, right? You get the ability to read and write memory that you shouldn't.
You change a function pointer somewhere, so a return pointer, a vtable entry, something like that, and now you've got arbitrary code execution. And so the smart folks over at ARM said, hey, this arbitrary read/write problem has been around for 40 years. We've tried to solve it with all sorts of techniques and we just can't. So let's instead add a barrier at the end here that blocks this last step. Overwriting a function pointer no longer lets you arbitrarily change where it points, because we're going to sign the pointers with a cryptographic hash, okay? And so if you could find a way to forge these signatures, you could bypass this protection. And you might be thinking, okay, why not just try them all? It's just a 16-bit signature, why not brute force it? Well, the problem is that if you try to use an incorrectly signed pointer, which is exactly what brute forcing means, you're going to crash. And since today's target is the kernel, you're actually going to crash your kernel and completely reboot your device. So that's not great. And our solution, this is the key takeaway from PACMAN, is that you can avoid those crashes by doing your tests in the speculative regime. We're going to talk about what speculation is and how to exploit it in this talk. So that's kind of an overview, your mental roadmap of PACMAN. We envision PACMAN as a BYOB: bring your own bug. So if you look back at this diagram, the attacker is going to come up with this part, so find a read/write gadget and change some function pointer, and PACMAN is going to handle this end, defeating the PAC feature. And so today it is a great honor to introduce four brand new tools that we've developed in-house for this research and are sharing with you all today, as well as two proof-of-concept implementations. The first tool is Pacman Kit. Then we have Pacman Finder, Pacman Patcher, and Pacman OS. So let's start with Pacman Kit.
So Pacman Kit is your best friend as a microarchitectural security researcher looking to do attacks on Apple silicon. It's got everything you need. It's got the basics: kernel read, write, execute, all the stuff you're looking for as someone trying to cross privilege boundaries and run some tests. But it also comes with the other tools that you might commonly need. So if you want to read an address and see how long it takes the kernel to load that address, this can do that. If you want to translate a virtual address from user space to its physical address for creating better eviction sets, this can do that. So it's got a lot of really great tools and it's battle tested. There have been multiple nights when we were trying to come up with some new graph or some new result for some deadline, and Pacman Kit has just been there. It's quite reliable and it does what it needs to do. So we think this is going to be a great tool for security researchers. Next up is Pacman Finder. This is a Ghidra script that lets you search any binary for PACMAN gadgets. And so we ran this on the XNU kernel and we found around 55,000 gadgets, you know, some data, some instruction, and you can tune your parameters. So if you're looking for only a certain type of instruction, you can specify that. If you want to limit it just to the system calls or functions that you can reach, you can do that too. So we think Pacman Finder is a great tool, and we're going to see how to use it to find a real PACMAN gadget in just a few minutes. Next up is Pacman Patcher. We're going to talk about the timing features of the M1 in a minute, but the problem with the M1 is that you don't have really great high-resolution timers in user space. And when you're doing those deep reverse engineering tasks, trying to build up a model of how the system works, you need really fine-grained timing. And so Pacman Patcher is a patch to macOS itself that enables these timers for user space.
So no need to play any tricks, just run this once and there you go. And lastly, Pacman OS is for those really, really deep reversing tasks where you just need bare-metal control. Pacman OS basically says: give me some Rust code and I'm going to boot it right on the device. You're left to your own devices; you can run whatever experiments you want, right on the bare metal. That's Pacman OS. And so together we think these four tools make up a really great suite of reverse engineering technologies that you can use to do Apple silicon microarchitectural analysis. One other fun thing that you might want to play with: this past semester, my lab and I redesigned our secure hardware course at MIT. And so we released five brand-new labs that let you do all sorts of really great stuff. You can do the Rowhammer attack, where you flip bits in physical DRAM, you can do Spectre in the kernel, and you can also implement an AI-accelerated website fingerprinting attack that still works today. So if you want to play with some of these, you can Google MIT secure hardware design and find any one of those five labs. And feel free to shoot me a tweet on Twitter if you're trying them out and you have a question; just feel free to reach out. All right, so PACMAN 1 was what we released at ISCA a few months ago, and today we're going to announce PACMAN 2. PACMAN 2 is a complete re-implementation of the first attack without some of its shortcomings, and it's significantly more efficient as well. With PACMAN 1, what we wanted to do was show that this attack is possible. So we built a proof of concept that proves the hardware behaves the way we think it does. But with PACMAN 2, we want to show it in action. What can this thing do when you really push it to the extremes? PACMAN 1, we've got a very simple victim; again, we just want to show it works.
PACMAN 2, we're going to do it on a realistic victim, and we'll see that in a second. PACMAN 1 takes around three minutes or so to forge a pointer. PACMAN 2 can do it in as little as 11 seconds. PACMAN 1 is a bare-bones C implementation. One of the things we wanted to do was make sure that the tools we provide to the community are extensible and can be played with, but PACMAN 1 is more of an imperative C reference implementation, so messing with it and trying to change it is a little bit difficult. With PACMAN 2, we've rewritten it all in Rust with an extensible framework that you can use to build your own PACMAN attacks without having to reinvent the wheel every time. And the major shortcoming of PACMAN 1 was that we assumed we could do this with a system call. That's a relatively realistic assumption, but the noise this creates is actually a lot, so doing it in practice is quite hard. With PACMAN 2, we've removed that restriction; we do the timing a different way, so PACMAN 2 doesn't have that problem. So with that out of the way, let's talk about software. We're going to look at this as CPU architects. Together we're all going to design a processor. Our processor is going to have 64-bit addresses, which is the best way to do it. At least today; maybe in a couple of years it'll be different. But our design doesn't have 16 exabytes of RAM, which is what 64 bits can address. We're not using that much RAM in our computers right now. And so it turns out that these pointers actually have a bunch of bits that we're just not using. They're completely unused. So let's use these bits for something useful; let's put them to good use. What we're going to do is set aside 16 bits in every pointer as a signature that verifies it, the pointer authentication code, or PAC. We're going to compute the PAC by hashing together the pointer itself, a per-object salt, and some secret key. But for the purposes of today's talk, we actually don't even care how this hash works.
We're going to assume it's as strong as possible and cannot be forged directly; we'd have to brute force it. And so there are two families of instructions that you can use to operate on PACs: the PAC instructions insert a PAC, and the AUT instructions verify a PAC. And the important thing to note is that we do not crash on the AUT itself; we actually crash on the use of the pointer. So let's walk through a really quick example here. Here we've got an object hanging out in memory with a signed pointer, and we'll see this exact object come back later. We're going to do a load to bring that signed pointer out of memory into X16. We're going to take the object's address and a per-object salt, put those together, and that's going to be our object's salt for the authentication. So take the pointer and its salt and authenticate. And if the verification was wrong, the pointer that's been put into X16 is now just some corrupted pointer, and when we try to load from it, this is where the crash comes in. All right, so let's talk about buffer overflows. Maybe, as in this example, you've got a buffer hanging out in memory and then some function pointer after it. And maybe you forgot to use the right version of string copy or whatever, and the buffer overflow overwrites this function pointer. Traditionally, we're going to fix this with pointer authentication. And the key thing is that we assume that same software bug still exists, because remember, these are hard to find. So we do that overflow and corrupt the function pointer, but since the PAC no longer matches, we can detect that and actually crash. So pointer authentication can keep us safe in this case. So today's goal is to reveal the PAC for an arbitrary pointer without any of those crashes.
And for that, we're going to have to turn around and solve our software problem, which is that using the wrong pointer crashes, and we're going to solve it in hardware. We're going to start by guessing a PAC speculatively to prevent a crash, and then we're going to leak the verification result of that test through a side channel. We're going to talk about what both of those things are. So let's switch back to CPU design mode. We're going to build our processor with this feature called speculative execution. The idea behind speculative execution is that we want to increase performance; we want our processor to be as fast as possible. In this example here, we've got: if true, do A; else, do B. Your basic in-order CPU is going to have to spend some time resolving that branch and then just do the right thing, right? So it's pretty straightforward. But we can actually do better. If we speculate on the direction of the branch, we can predictively begin executing before we know we should execute. In this case, we predict the right thing, and so we're actually faster than the in-order case. But if we misspeculate, if we get the wrong prediction, now we've got to spend some extra time to undo the effects of that speculation and then do A. But you'll notice that this undo is only surface level. It doesn't undo every change that was made; it just undoes the ones that the programmer can see. The microarchitectural side effects of running B speculatively are not undone. So let's see what that means. We have a design decision to make here: how do we speculate on pointer-authentication-protected pointers, right? We want to do this PAC thing, we want to do the speculation thing; how do they come together? And it seems to me that there are three basic options. You can ignore the checks, just pretend they're not there. You can never load speculatively, so if you hit a PAC, just stop speculating.
Or you could just treat them normally, right? Pretend like the program runs in order. And these three options each have basic problems. The first one, ignoring the checks, also introduces a security problem of its own; you can read our paper to find out more about that. If you never load, that's slow, and we want our processor to be as fast as possible. And so treating them normally is what the M1 does, and that's how we're going to do it, too. So let's take another look at speculation, but with pointer authentication in mind. For this example, if the condition is true, we're just going to leave the function, just return. In order, we do the if and then return. No leaks, we're all good. And the speculative machine can be good, too: if we predict correctly, we still don't leak anything, because we never run the code we're not supposed to. But if we mispredict, if we run that wrong branch, we're still going to do the check and the load. And if the check fails, we're not going to actually load anything into the cache; we're going to raise a speculative exception instead. Now look at the two outcomes. When the misprediction resolves, we undo the architectural change to the x variable, but we don't undo any of the loads we just performed. And that is going to cause a leak: whether or not the value is still in the cache leaks some information. So operating on pointer authentication speculatively can leak correctness without causing a crash. Let's take a step back here and look at this from a bird's-eye view. We've got our first bug; remember, bring your own bug. If our PAC guess is correct, the speculative load happens; if we're incorrect, there's no load. And those are the two cases we want to tell apart. So how can we tell these two cases apart? Well, let's take a step back and go back to our CPU design mode. Memory is really, really slow, and we want to speed it up.
So what we're going to do is put a cache in between memory and the CPU. And this difference in the memory hierarchy, an address being in the cache versus out in DRAM, is where we're going to tell the two cases apart. So let's revisit how caching works really quick. This is how we're going to represent caches today: you divide your cache into sets and ways. In this case, we've got eight sets and four ways. And the way we put an address into the cache is we take the address and pull out a tag, a set index, and an offset, and the set index says which set of the cache we map to. So two addresses that map to the same set will go to the same row in the cache. They can go into any column, we can't predict that, but we can predict which set they're going to go into. So let's switch gears and think about this like an attacker. What can we do with the cache to do some kind of attack? Again, revisit the fact that for these addresses, we can look at the address and say which set it's going to go into. So let's fill up an entire set. We'll figure out how big the set is and fill every line in the set with our data. Then we're going to let someone else run. This could be the kernel, or just some victim, and it's going to do its own load pattern. In this case, it has kicked out a couple of lines, and you can see one of them hit the set that we're looking at. Then we reload our data and see if any lines got kicked out, and in this case, one of them did. And so we can observe how the victim touched the cache. This technique is called Prime+Probe, and we're going to see it again in just a few minutes. So let's switch back to our design perspective. There's actually been a lot of confusion on the internet about what the cache hierarchy of the M1 is.
And so these are the numbers we pulled right from the silicon through the cache ID register. Apple's cache hierarchy is split up into a level one data cache and a level one instruction cache per core, plus a shared level two. And we've seen this before already, but let's formally define what we mean by a PACMAN gadget. A PACMAN gadget is the speculative use and check of a signed pointer. In this example here, it's just: if some condition, we're going to do a check, and then we're going to try to do a load. Any time you've got that, that's a PACMAN gadget. So now let's walk through in detail how the data PACMAN attack works. We're going to begin by training the branch predictor to enter this PACMAN gadget. Say the condition is true, so we'll pass in true, do the check on a good pointer that we know is going to be fine, and then do the load, and that's all good. We do that enough to tell the branch predictor, hey, we're going to take this branch all the time, so now it starts predicting that the branch will be taken. Now we're going to reset the cache, flush it out with addresses we control, and prime it by loading an eviction set. Then let's make the condition false, and instead of a correct pointer, we're going to pass in the pointer whose PAC we'd like to guess. So we're going to hit this condition and enter the speculative regime. Again, this code should not be running, but since the branch predictor believes it should be, we're going to start speculating on it. And so we do the verification of our guessed pointer and then try to load from it. And now one of two things is going to happen. If the guess was correct, the load will succeed and kick out something that we own; if it was incorrect, the cache is going to stay as it is. So all we've got to do now is go back to the addresses we just put there and ask: did any of these get kicked out? If one did, we had the right PAC. That's the attack.
And so now that we've done the attack on an abstract machine, let's actually do it for real. One of the things I want to focus on in this next section is not just what you need to do to do PACMAN; I really want to encourage a discussion of the techniques that researchers use to do these kinds of things, and to encourage you to do your own research using them. So, two terms for the rest of our attack; there are two operations we want to do. Differentiation: you give me a correct and an incorrect pointer, can I tell them apart? We're going to look at graphs like this, where on the x-axis we've got the number of cache misses, zero misses being we didn't lose anything, seven or eight being we lost a lot of addresses, right? And the other thing we want to do is brute force: try every single possible PAC and figure out which one's right. And we know where we want to get: in the end, we want to be able to say a correct PAC leads to a load and an incorrect PAC leads to no load. So what do we need to make this happen? Well, first we need to know when a load occurs. And since we're attacking the kernel, we need to be able to tell when the kernel loads an address, and we can't observe that directly, so we're going to look at the cache. Next, we've got to contend with the kernel in the cache. So before we can tell whether a load occurs, we have to be able to measure whether or not the kernel is loading; we have to form what's called an eviction set for this contention. And before we can even do that, we have to know whether something is in the cache or not. So the very basic primitive is: for a given address, is it a cache hit or a cache miss? And for that we need a timer, so to start, let's look at the timing options we have on this machine. The first one is way too slow.
And the second one isn't even present. So we looked at the last two, the Apple custom cycle counter and our own multi-threaded counter, and we did some analysis on them and found they're both pretty good. What we can do is use the multi-threaded counter; we'll talk about that in just a minute. And what you can do is go into Ghidra and search for any reads of this privileged timer, and what you'll find is that there are a couple of places that do that. So we have PMC0 as our cycle counter; that's the one we want to read. And if you look at how it's set up in the kernel, PMCR0, which controls whether or not you can use this register, doesn't set the user bit. So this bit is not set, which means that user space cannot read this register. Then there are these sysctls, which are masked, so if you do something like a sysctl list, you're not going to see them, but they are there. This is actually what the private framework kperf uses. But sadly, these require root. And they also take a trip through the kernel: when you want to know what time it is, you actually make a system call, go through all this stuff, and by the time you get back, you've wasted so many cycles that it's not really worth it for what we're trying to do. So that's pretty cool, but not usable. Then there are these two, perfmon uncore and perfmon core, defined in this file here. These actually do give you access to the timers without root, so you can read all sorts of really interesting MSRs without ever needing root. But it still takes a trip through the kernel, so we can't use it either. Here are all the timers that I was able to dump from my machine. We're not really sure what all of them do, but they are pretty cool, and there's sample code for reading them in the kernel. So if we just need timing, why not use a kext? Just install a kernel extension and set the timers up. We tried this, and actually a lot of groups have done this.
I believe it was Dougall Johnson who pioneered this. But when you do this, the problem is that when you switch cores, XNU will reset those timer values, and now the test case that you're running is going to segfault. You can get around this, and this is what we used to do, by installing a signal handler and re-enabling the timers every time you fault. So instead, what we did is we came up with Pacman Patcher. The way this works is it patches your kernel cache to enable the timers universally. All you need to do to run Pacman Patcher is download the kernel from Apple, run the patcher on the kernel image, and then restart with the new kernel installed. And now you've got timers permanently. This is patching your kernel, so we don't recommend running it on a machine you use daily, but as a matter of fact, I actually do. Okay, so let's talk about the multi-threaded counter. The idea behind this is that we just want to know how long something takes. And so what we can do is just while(true) increment, right? And this works surprisingly well. This is our real implementation of it: for the counter thread, it's just a tight inline assembly loop that increments and stores the counter, so it moves very fast. And the timing reads are where you've got to be careful; in our experience it works best if you're careful about exactly how you read the counter. And so this is how we do it, and it works really, really well. In fact, we did an analysis of it, and everything that you can differentiate with the Apple performance counters, you can still differentiate with the multi-threaded approach as well. So our conclusion is that the multi-threaded timer can do whatever the performance counters do, but you still want the performance counters for those high-fidelity numbers when you're really just reversing stuff, right?
So we're going to start by loading the address we want to measure in the kernel, then load a bunch of things from user space that might contend with it, and then reload the address in the kernel to see if we were successful. We're going to slowly build up this eviction set to see how far apart the addresses need to be and how many of them we need in order to evict kernel addresses. And so what we're going to do is look at memory like this. We've got different pages and different cache lines, and we're going to have a stride that hits the beginning of every page and the beginning of every cache line, so the cache lines and the pages both interfere. You're getting contention in the cache as well as in the TLB, which is another cache. This is how you compute those addresses, and if you do that, you get a beautiful graph that looks like this. You can see we've plotted the eviction patterns for three different strides, and at 12 addresses, with a large enough stride, you reliably evict the kernel address. So we're going to use this 12-address eviction set as our eviction set for the rest of the talk. And so, great, we can actually kick kernel addresses out. If we do loads in a certain way, we can knock kernel things out into DRAM. So how can we tell when a kernel load occurs? Well, to start, we're going to talk about what happens when the cache gets full. This is something called the replacement policy. Now, most caches aren't going to use a true least-recently-used policy, because that's actually very hard to put into real silicon, right? But LRU is a good mental model, so let's use it here. If we do four loads, we load a, b, c, and d in that order, then when we want to load address e, it makes sense to kick out the thing we least recently used, because it's probably not going to get used again. So let's load e, and we're going to swap a for e, because again, a is the least recently used.
So now let's scale this up and see how this affects Prime+Probe. We do an attacker load of a1, a2, a3, a4, and then the kernel loads an address that contends with these lines. Here's what's going to happen: a1 is going to get kicked out, because it was the least recently used. Now we go back and ask for those same four addresses in the same exact order. And what happens is a1 is going to replace a2, then a2 is going to come in and replace a3, a3 is going to replace a4, and a4 is going to replace k1. So you can see that what was supposed to be one miss, the kernel kicking out just one of our lines, has turned into four misses: every address we probe in the same direction now misses. So by priming and probing in the same direction, you can trigger this cascade of misses that results in a huge difference that shows up very, very clearly on a graph. And if you don't want that behavior, if you want to see only the one-miss difference, we actually have a version of that, too; all you have to do is probe in the opposite direction. So PACMAN 2 can do both. We set it to the cascading mode because it's a lot easier to tell apart. Okay, cool. So we're going to have three target programs. We're going to start with the basic one, to prove this works. We're going to have an advanced program, which is a C++ object lookup; we'll talk about some restrictions on that. And then we're going to have what we call ultra, which is a legitimate system call in the real kernel, completely unpatched. Our basic victim here is an if statement with lots and lots of instructions that take a long time to run, which gives us as much time as possible for our speculative instructions to run, and then we just do a simple authenticate and load. And this is what it looks like when you run it: the data gadget on the left and the instruction gadget on the right both look really great. You can tell with 99% accuracy which case is which. It's quite good.
But then you try it on this other instruction. There's a third type of instruction, BLRAA, which is a combined branch-and-authenticate: it does the auth and the branch at the same time. And what you see is actually this disjoint pattern where, yeah, your correct PACs look really, really good, but half of the time your incorrect PACs also look like they're doing loads. So what's going on here? Well, if you look at it, exactly 50% of the time we're doing a load. And what we think is happening is that for this type of instruction, there's actually a race condition in the pipeline where the authenticate and the load kind of run in parallel, and half the time the authenticate loses the race, so you do get a load even with a bad PAC. So we classify BLRAA as still vulnerable; it's just a little harder to measure. For the sake of our demonstration for the rest of the day, we're going to set the BLRAAs aside and just look at separate authenticate-then-BLR sequences. All right, so the advanced victim. In this case, we have a different if statement; this is a more realistic kind of branch condition. So let's start by revisiting C++ method dispatch. Every C++ object is going to have this vtable pointer, which points to its table of functions. So we have to do a data load to get the table, and then an instruction call through the table entry. That means there are two attacks that need to be done serially here: the data attack as well as the instruction attack, right? And if you look at the assembly, this is actually the same object we looked at earlier. We load the contents of that data pointer and then verify them as well, so it's two authentications right in a row. What we'd like to do is set up memory such that we have a forged vtable hanging out somewhere else, signed correctly, whose entry is a signed pointer to the code that we'd like to run. That's the end game here. And so forging the vtable pointer is straightforward.
We just showed that with the data pointer. We know where the original vtable lives, but we cannot forge the instruction pointer the same way, because we don't know how to train the branch predictor on a pointer at that new location. Remember, moving a pointer breaks the seal, so we don't have a good pointer to put there. But there's a little trick we can use instead: train the branch predictor on the old vtable, with the old, existing method, and then when we want to do the real test, swap in the forged vtable pointer so the trial goes through our forged table instead. And if you do that, it actually works. The other problem we have is that the speculation window is very, very short. The reason is that during our test we have five things to do: a load, a check, another load, another check, and then a call. And all of that has to finish before the CPU realizes it's not supposed to be running it. So we want the limit variable to take a lot longer to load, so we have more time. How do we do that? Well, we just solved that problem earlier with kernel eviction sets. So imagine the cache, this is your mental model of the cache: we've got the limit variable sitting in the cache, and it's loading way too quickly. We don't like that. We also have an eviction set for the target address, because again, we need prime and probe to know when the load is successful. So we evict the limit with a second eviction set, pushing it all the way back to memory so it takes forever to load. And again, the two need to be in different cache sets for that to work. When you do that, the limit load now takes a lot longer, which gives you plenty of time in the window to do those tests and do the forgery. Without further ado, let's actually do it for real.
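You can put that window-widening trick in numbers with a back-of-the-envelope model. All of the latencies here are invented for illustration; the only thing that matters is their relative sizes.

```python
# Toy latency model, in made-up "cycles". The real numbers are different;
# what matters is the relationship between them.
L1_HIT = 4          # limit variable still cached: branch resolves fast
DRAM = 300          # limit variable evicted: load goes all the way to memory
SPEC_WORK = 5 * 30  # five dependent steps: load, check, load, check, call

def speculation_succeeds(limit_latency):
    """The transient sequence only completes if the branch condition
    (the limit load) resolves *after* our five steps finish."""
    return SPEC_WORK < limit_latency

print(speculation_succeeds(L1_HIT))  # limit cached: window too short
print(speculation_succeeds(DRAM))    # limit evicted: plenty of time
```

This is why the second eviction set matters: pushing the limit back to memory moves its latency from the first regime into the second.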
And so in this example, we're going to have a destination that you can think of as a ret2win: it just announces that the attack worked. Here on the left, we've got a log viewer watching for kernel logs, and on the right, we're going to run the Pacman attack. As I said earlier, Pacman 2 is an improvement on Pacman 1, and one of the things Pacman 2 does is, instead of examining every pointer in detail, it makes two passes over the entire space of possible PACs. The first pass just asks: could this pointer potentially be the right one? Not "is it definitely right", just "not this one, not this one, this one looks promising." So we can go through all the pointers really quickly with a cursory glance, collect the ones that look promising, and then figure them out from there. So let's go ahead and launch it. We start by forging the data pointer. You'll see we're picking up some hits, and we keep track of those hits and how many misses each one has. You can see we're going through the space really fast here, and it's almost done. Now we're going to train on the old vtable and swap in the new one right when we need to do the test, and we're trying to forge a jump to our win method. And you'll see we got it right, and the kernel actually prints that we won. So we were able to forge both the data and instruction pointers successfully. Thank you. And now let's use Pacman Finder to do one last attack, and we'll use it as a case study of doing this in the real kernel. So let's take a look at the BSD system calls, because those are really easy to reach, so I can spend less time checking which gadgets fit my constraints. Now, if you were doing this as a real attacker, you might want to target something more specific to the code you're actually after.
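The two-pass search described a moment ago can be sketched like this. The trial function is a stand-in for one noisy prime+probe measurement; the 16-bit PAC width, the false-positive rate, and the trial counts are all assumptions chosen just to make the sketch run.

```python
import random

PAC_BITS = 16
TRUE_PAC = 0xBEEF             # the secret we're trying to recover
rng = random.Random(42)       # fixed seed so the sketch is repeatable

def trial(guess):
    """One noisy prime+probe trial: the true PAC always 'hits';
    wrong guesses occasionally produce a false positive (assumed rate)."""
    if guess == TRUE_PAC:
        return True
    return rng.random() < 0.001

def two_pass_search():
    # Pass 1: one cheap trial per PAC, keep anything that looks promising.
    candidates = [g for g in range(1 << PAC_BITS) if trial(g)]
    # Pass 2: re-test each survivor many times; only the real PAC
    # hits every single time.
    return [g for g in candidates if all(trial(g) for _ in range(20))]

print([hex(g) for g in two_pass_search()])
```

The point of the split is speed: the cheap first pass discards almost all of the 65,536 possibilities, and the expensive repeated trials only run on the handful of survivors.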
But in this case, let's just look at the BSD system calls. These ones look good, and of these we pulled out this one right here. So this is your compare, branch, and then authenticated load: the quintessential Pacman gadget. It's very, very obvious. The branch condition is proc plus 560, and the thing being forged is proc plus 10, which is kind of a bummer; we'll see why in just a second. But it is forging something in the process structure. If you look through the kernel, the pointer in question is actually proc.task, and that's what we can forge. So the branch condition is the memstat limit field of the process, and what we're forging is the task field of the same process. But the problem is that these are in the same page, so our limit trick doesn't work: if we try to evict that limit, we make both loads take longer, and we've had a net zero change in the speculative behavior. So that's not a great choice of gadget. Another reason it's not great is that this is a commonly used field, so it's probably going to be very high up in the TLB, which means you're not going to get super great eviction patterns, and you get a lot of noise. The other thing is that since this field is used all across the kernel, many asynchronous threads might try to read from it, so your test case with an invalid pointer could get read and dereferenced and cause a crash independent of the Pacman attack. We ran this 12 times. Here's a run with a great differentiation pattern; on the third run we got a kernel panic; and on three other runs we just didn't see anything. And it's kind of the same story for some of the other tests we did. So there is some signal there, but the problem is that it's not super reliable, again for the reasons we discussed: it's not a great choice of gadget.
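The same-page problem can be put in numbers with a toy latency model. Again, every number here is invented, and window_ok is just a stand-in for whether the transient gadget finishes before the branch resolves.

```python
# Toy model of why the same-page gadget defeats the limit trick.
# All latencies are made-up "cycles"; only their relative sizes matter.
L1_HIT, DRAM = 4, 300
SPEC_WORK = 150  # work the transient gadget needs to finish

def window_ok(limit_latency, target_latency):
    """The gadget must complete its authenticated load of the target
    before the branch condition (the limit load) resolves."""
    return target_latency + SPEC_WORK < limit_latency

# A good gadget: limit and target in different pages/sets, so we can
# slow the limit down without touching the target.
print(window_ok(DRAM, L1_HIT))  # window widened, attack has time

# This gadget: memstat limit and proc.task share a page, so evicting
# one evicts both. Both loads slow down equally.
print(window_ok(DRAM, DRAM))    # net zero change, window still too short
```

When both loads slow down together, the race between the branch resolution and the authenticated load doesn't move in our favor at all, which is exactly the net-zero behavior described above.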
But the important takeaway for a gadget like this is that when we win, we win big: it's very easy to tell when it's worked, and when it hasn't, we just report that we didn't find anything. Those are two great properties that make this a compelling attack. But you do have to be afraid of those asynchronous accesses: if something reads your invalid pointer asynchronously, you can end up with a kernel panic, which is not super great. And so I want to return to the question we asked at the beginning. At the beginning, we said Pacman is a vulnerability in PAC-protected systems, and we asked: is this the consequence of being greedy for more speculative performance while combining it with a feature that may be fundamentally incompatible with it? So is PAC fundamentally incompatible with speculation, or is there something we can do about this? I'll leave that to you to come up with your own answers, and I'd love to discuss it further with you if you have any thoughts. And with that, all of our code is available on GitHub; if you go to this repo, we've got everything posted. Without further ado, that's Pacman.