Hi, everyone. My name is Alexander Bulekov. I'm a PhD student at Boston University and an intern at Red Hat Research, and I'm excited to talk to you today about some of the work I've been doing on fuzzing virtual devices in hypervisors. I think we've all probably seen a picture similar to this one explaining what the cloud is and what the hypervisor's job is, but to give a brief reminder: the hypervisor is a piece of software running on a system that partitions the resources of that system among various tenants. The tenants care about running applications, and the hypervisor provides resources to those applications while also providing isolation between them. Traditionally, this is done by allocating a fully virtualized operating system for each tenant, so the OS running on the server is virtualizing multiple additional OSes, one per tenant. I briefly mentioned isolation, and what I mean there is that one of these tenants might have been compromised, or maybe they're malicious from the start. The job of the hypervisor and the cloud provider is to prevent this malicious tenant from negatively affecting the workloads of the other tenants, and, further, from compromising the actual underlying server. The bad scenario is a compromised guest that's able to break out of the hypervisor's isolation guarantees and run code at the privilege level of the hypervisor. To understand how something like this might happen, let's rewind and remember that the hypervisor is running on a server with real hardware, but these guest operating systems were also designed to run on hardware, or at least something that looks like hardware.
So all of them expect to have the view that they are, in fact, the ones interacting with the hardware, and any time an application running on a guest wants to communicate with the outside world or with the hypervisor, it has to do this through the device layer. The hypervisor provides virtualized devices that govern the access and resources each of the guest VMs shares with the outside world. Since this virtual device interface is the critical interface, these virtual devices are the ones we're interested in protecting. But the problem is that when you're building a virtual device, sometimes you're lucky enough to have a spec for a real device to base your code on, and you just read a couple hundred or a couple thousand pages of spec and then sit down and write a bug-free virtual device. Sometimes you can even write your own spec for a paravirtual device, which is interesting. But in the worst case, you don't have a spec to design the device from at all: you either have to reverse engineer the actual device, or reverse engineer a driver for that device and work backwards from the driver to a virtual device. As you can probably guess, each device you want to virtualize can be weeks or months of work, they're often implemented in fast but unsafe languages such as C, and in general this is just a very error-prone process. So it's a little scary that this is the code that governs the isolation guarantees we expect from the cloud. And unfortunately, bugs in these virtual devices aren't just a theoretical problem. In fact, bugs are found all the time in hypervisors, and the common thread between most of them is that they live in the virtual device code, which is a huge percentage of the code of a typical hypervisor.
Probably the most well-known bug of this type was the VENOM vulnerability, discovered in 2015. This one is actually quite a simple buffer-overflow-type vulnerability, the kind you learn about in any class where you learn the C language. What's worse, it was in a floppy disk controller, which exists and is connected to virtual machines in hypervisors such as Xen, VirtualBox, and QEMU for legacy reasons, because PCs originally had floppy disk controllers. This bug made people wake up to the fact that the security of these virtual devices really needs to be taken a lot more seriously. Looking at what the rest of the industry has been doing, fuzz testing has gained a lot of traction. When you do normal testing, you write some manual tests and you make sure that the result of running your test over the application is what you'd expect. Fuzz testing, from a bird's-eye view, is basically the same concept, but instead of you writing a manual test, you let the computer use a random number generator to provide randomized data to your application, and you make sure that your application behaves properly in general and doesn't just completely crash. That's basically fuzzing: you provide randomized inputs to your application and make sure it doesn't crash spectacularly. You can usually get more advanced than that, but that's in general what fuzzing is used for. In fact, a lot of fuzzing work has combined this with, for example, coverage information, where you track the coverage that each fuzzer-generated input achieves over your application.
Based on that coverage, you can judge whether an input produced some interesting behavior or whether it simply didn't make any sense in the context of the program. If it was interesting, you store it for later and use that saved input as the basis for further mutations. You can also build your program with sanitizers to find entirely different classes of bugs that are typically hidden. Simple enough, right? Fuzz testing has proven to be very powerful; it's used in a lot of domains ranging from image-parsing libraries all the way to the kernel. So you just need to find where the virtual devices are and provide them with these fuzzer inputs, right? Well, that's a little easier said than done. Let's look at the input space for a device, meaning how the CPU interacts with typical devices. First, on x86 machines, we have an address space of 64K port addresses, each of which can be mapped to a particular device. When the CPU wants to interact with these devices over port IO, it uses special instructions such as `in` or `out` to read or write a value. When that instruction executes, the device receives the request, does whatever it needs to, and then the CPU continues its execution. In a very similar vein, we're all used to programming with memory, and parts of memory can actually be mapped to devices as well, with memory-mapped IO. In these cases, when you read or write from those addresses, instead of the requests hitting RAM, they end up getting routed to the devices; the device again does whatever it needs to in order to handle the request, and then the CPU continues execution. And right off the bat, there are some complications here. For one, a lot of these memory-mapped and port IO regions aren't actually mapped.
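As an aside, the save-and-mutate loop described above can be sketched in a few lines of C. Everything here is a hypothetical stand-in, not QEMU or any real fuzzer's code: `toy_parse` is a toy target with nested branches, the coverage array stands in for real edge-coverage instrumentation, and the corpus is a single saved input.

```c
/* Minimal sketch of a coverage-guided fuzzing loop (all names hypothetical). */
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define NBRANCH 4
static bool coverage[NBRANCH];   /* branches hit by the last run */

/* Hypothetical target: a tiny "parser" with nested branches. */
static void toy_parse(const unsigned char *buf)
{
    if (buf[0] == 'F') { coverage[0] = true;
        if (buf[1] == 'U') { coverage[1] = true;
            if (buf[2] == 'Z') { coverage[2] = true;
                if (buf[3] == 'Z') { coverage[3] = true; } } } }
}

/* Run `iters` iterations: mutate one byte of the saved input, run the
 * target, and keep the mutant only if it hit a branch we had not seen
 * before. Returns the number of distinct branches covered overall. */
static int fuzz(unsigned iters)
{
    unsigned char input[4] = "AAAA";       /* one-entry initial corpus */
    bool seen[NBRANCH] = { false };
    srand(12345);                          /* deterministic for the demo */
    for (unsigned i = 0; i < iters; i++) {
        unsigned char mutant[4];
        memcpy(mutant, input, 4);
        mutant[rand() % 4] = (unsigned char)(rand() % 256);
        memset(coverage, 0, sizeof(coverage));
        toy_parse(mutant);
        bool new_cov = false;
        for (int b = 0; b < NBRANCH; b++)
            if (coverage[b] && !seen[b]) { seen[b] = true; new_cov = true; }
        if (new_cov)
            memcpy(input, mutant, 4);      /* save the interesting input */
    }
    int total = 0;
    for (int b = 0; b < NBRANCH; b++) total += seen[b];
    return total;
}
```

Purely random four-byte inputs would almost never reach the innermost branch, but because each partial match is saved and used as the basis for further mutations, the loop climbs through the nested conditions one byte at a time.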
Right after you boot the machine, you actually have to go through, for example, PCI configuration to enable further memory regions. So before you start running any code on the CPU, you have no idea when and where a lot of these memory-mapped and port IO regions are going to be. It's not something set in stone; they can shift around based on your interactions with the devices. And what makes this even worse is that there's actually a third mode of IO that's very commonly used, and that's DMA, or direct memory access. The problem with port IO and memory-mapped IO is that for each byte or set of bytes you want to communicate to the device, the CPU has to run an instruction, wait for that request to go through, then run another instruction, and so forth. This really wastes a lot of the CPU's time, and it's quite slow for high-bandwidth applications such as network cards. For these cases, we have direct memory access, where instead of writing the actual data to the device, all the CPU has to do is write the address (and length) of some data to the device over port IO or memory-mapped IO. The device will then asynchronously go ahead and read that data from memory, and when it's done handling it, it can, for example, raise an interrupt to communicate to the CPU that it's done processing the input. The CPU is free to run whatever code it needs to in the meantime. This is great for high-bandwidth applications such as network, disk, and video, and it's also a nightmare for a fuzzer, because it means our input space encompasses basically all of memory, and because of the PCI mappings that I mentioned, our input space also encompasses all of port IO.
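The DMA handshake just described might look roughly like this in code. All names here are hypothetical; real descriptor formats are device-specific. The point is that the CPU side only writes an address and a length into device registers, while the device pulls the payload out of memory on its own later.

```c
/* Sketch of a DMA handshake: the CPU programs an address and length,
 * and the device fetches the buffer asynchronously (hypothetical names). */
#include <stdint.h>
#include <string.h>

static uint8_t guest_ram[4096];          /* stand-in for guest memory */

struct toy_nic {
    uint32_t tx_addr;                    /* programmed over PIO/MMIO  */
    uint32_t tx_len;
    uint8_t  fifo[64];                   /* device-internal buffer    */
};

/* CPU side: two register writes; no payload bytes cross the bus. */
static void cpu_program_tx(struct toy_nic *nic, uint32_t addr, uint32_t len)
{
    nic->tx_addr = addr;
    nic->tx_len  = len;
}

/* Device side, run later and asynchronously: fetch the buffer itself.
 * A real device would raise an interrupt when the transfer completes. */
static void nic_fetch_tx(struct toy_nic *nic)
{
    if (nic->tx_len <= sizeof(nic->fifo) &&
        nic->tx_addr + nic->tx_len <= sizeof(guest_ram))
        memcpy(nic->fifo, &guest_ram[nic->tx_addr], nic->tx_len);
}
```

For the fuzzer, the crucial consequence is that `tx_addr` can point anywhere in guest memory, which is why the effective input space balloons.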
So if we want to fuzz a virtual device, we have to consider this entire input space, which is absolutely intractable for an off-the-shelf fuzzer, considering that, combined, this can be gigabytes, or even exabytes if you're talking about a 64-bit machine, of input space. This huge input space is the main problem with fuzzing virtual devices in hypervisors. To emphasize this point, I want to briefly walk you through an actual bug that our fuzzer discovered. Instead of trying to explain what it does myself, I just copied the email that one of the developers fixing this bug in the e1000 network card sent in response to the original bug report. They say the bug is "interesting," which is probably as much praise as you can expect when you're reporting a bug. On the left, I have the actual port IO and MMIO instructions. First, as I mentioned, there's this PCI controller that can map additional MMIO regions, and that's basically what these first instructions do: they lead to this MMIO region down here being created for the e1000 device. Once that's done, we can start interacting with that MMIO region, and in this case we basically set up a packet-transmission request. Notice these purple regions that appeared: these are two DMA buffers needed to actually reproduce this bug. They could just as well be anywhere in memory, even all the way over here; it's just that the address that happened to be written up here, in this instruction, provided this particular location. I actually drew this picture slightly inaccurately, because the bug involves the fact that this region over here actually overlaps the e1000 device's own MMIO region.
So when the e1000 tries to write to this DMA buffer, it actually ends up writing to its own MMIO region, triggering a reentrancy bug; it ends up freeing some resources that were still in use, causing a use-after-free. I guess the takeaway here is that we needed to rely on all three modes of IO in order to generate and reproduce this bug. And if you just look at these addresses, it emphasizes that the input space is enormous: the actual addresses we write to are only a tiny, tiny fraction of all of the possible addresses. And the fuzzer generated this from scratch. This is what the crashing trace looks like. As I said, this is a reentrancy bug, so down here we have this segment of e1000 code, then there's the DMA access I mentioned, which leads to another, nested MMIO access, leading to the use-after-free. So, if we want to tackle this enormous input space, how do we do it? Well, first let's look at the port IO and MMIO part of the equation. How do hypervisors even implement port IO and memory-mapped IO? When the guest CPU tries to access normal locations in RAM, that goes through fine; the hypervisor doesn't generally take over to service those requests. But with MMIO, instead of letting the access just read or write that location in RAM, or even worse, the underlying privileged device, we want to intercept the request and handle it in the virtual device code. What the hypervisor can do is, for example, unmap the pages that correspond to the MMIO range. When you do that and you access an MMIO address from the guest, you trap into the hypervisor's code, and the hypervisor can inspect which address led to the trap and inspect the access.
It can then identify what virtual device code it needs to run in order to service the request. The key thing here is that the hypervisor needs to keep this mapping from guest physical addresses to virtual device handlers, and it does this in something like a table. The hypervisor needs this table to perform any MMIO or port IO functionality at all. So what we can do, as the fuzzer, is keep track of this table, and if we do, we always have an accurate representation of the port IO and MMIO regions on the guest. That covers port IO and memory-mapped IO. The more complicated one, and I think the more interesting one, is direct memory access, because there's no table of DMA regions; they could be anywhere in memory. Remember how DMA works: usually the CPU writes the address of a DMA region over port IO or MMIO, and then eventually the device decides it needs to fetch this buffer over DMA. The device can't just dereference a pointer passed from the guest, because the hypervisor's address space is completely different from the guest's view of the address space. It needs to somehow convert the address it got from the guest into an address, or an access into a buffer, that it can actually reach from the context of the hypervisor. To do this, hypervisors generally implement a memory access API; QEMU implements a set of functions that make it convenient for virtual device developers to read or write some data to or from a DMA location in guest memory. And you can probably see what's coming already: yes, the fuzzer can hook this memory access API.
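Before getting to the hook itself, the address-to-handler table mentioned above can be sketched like this. The names are hypothetical; QEMU's real bookkeeping is its `MemoryRegion` tree rather than a flat array, but the idea is the same: because the hypervisor must keep this mapping to dispatch trapped accesses, the fuzzer can walk the same table and only ever generate addresses that actually land on a device.

```c
/* Sketch of a guest-physical-address to device-handler table
 * (hypothetical names; QEMU's real structure is its MemoryRegion tree). */
#include <stddef.h>
#include <stdint.h>

typedef void (*mmio_write_fn)(uint64_t addr, uint64_t val);

struct mmio_region {
    uint64_t base, size;
    mmio_write_fn write;
};

#define MAX_REGIONS 16
static struct mmio_region regions[MAX_REGIONS];
static size_t nregions;

/* Called when a device region gets mapped, e.g. during PCI configuration. */
static void mmio_register(uint64_t base, uint64_t size, mmio_write_fn write)
{
    if (nregions < MAX_REGIONS)
        regions[nregions++] = (struct mmio_region){ base, size, write };
}

/* On a trapped access, find which virtual device owns the address. */
static mmio_write_fn mmio_lookup(uint64_t addr)
{
    for (size_t i = 0; i < nregions; i++)
        if (addr >= regions[i].base &&
            addr - regions[i].base < regions[i].size)
            return regions[i].write;
    return NULL;        /* unmapped: the fuzzer never wastes bytes here */
}

/* Dummy handler standing in for a device's MMIO write callback. */
static void toy_nic_mmio_write(uint64_t addr, uint64_t val)
{
    (void)addr; (void)val;
}
```

With this table in hand, the fuzzer knows at every moment exactly which addresses are worth touching, even as PCI configuration maps and moves regions at runtime.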
In fact, what it can do is intercept the execution of the memory access API and make sure there's some fuzz data at the location the API is about to access, before returning execution to the memory access API. So when a device accesses some data over DMA, the fuzzer takes over execution, quickly fills in that region with randomized data, and then gives control back to the memory access API, which reads that data as if it had always been there. To bring this all together: we hook into these two essential hypervisor constructs, the table of port IO and MMIO regions and the DMA access API. Then we have, basically, an interpreter for our fuzzer inputs, where we interpret fuzzer-generated random data into port IO and MMIO instructions. And when a device eventually tries to perform a DMA access, we use some of the fuzzer-provided data to fill that DMA region in, just in time. Eventually we will find some crashes, because this way we can guarantee that each byte of the fuzzer-provided input is going to directly impact a virtual device in some way; we're not at risk of wasting fuzzer-provided data on writing to RAM that isn't going to be used for anything, or writing to a port IO address that isn't mapped to any virtual device. Finally, there's one more thing we want to do. Hypervisors don't normally have this just-in-time DMA functionality that we use, so when we find a crash with our fuzzer, we can reorder the commands so that any just-in-time DMA commands directly precede the port IO or MMIO request that triggers them. That way, when you rearrange it, by the time the request that triggers the DMA access executes, it's as if the DMA data had already been there the whole time.
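Putting that together, the interpreter stage might be sketched as follows. The opcodes and the fixed 7-byte command encoding are hypothetical, not the actual format our fuzzer uses; the point is that raw fuzzer bytes are decoded into a sequence of port IO, MMIO, and DMA-fill commands, so every byte of input drives a virtual device in some way.

```c
/* Sketch of an input interpreter: fuzzer bytes become PIO/MMIO/DMA commands
 * (hypothetical opcodes and encoding). */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { OP_PIO_WRITE, OP_MMIO_WRITE, OP_DMA_FILL, OP_MAX };

/* Counters standing in for actual device dispatch, to keep the
 * sketch self-contained and testable. */
static unsigned ops_executed[OP_MAX];

/* Pull n bytes out of the input; return 0 if the input is exhausted. */
static int take(const uint8_t *in, size_t len, size_t *pos, void *out, size_t n)
{
    if (*pos + n > len)
        return 0;
    memcpy(out, in + *pos, n);
    *pos += n;
    return 1;
}

/* Consume the whole input; return how many complete commands were decoded. */
static unsigned interpret(const uint8_t *in, size_t len)
{
    size_t pos = 0;
    unsigned count = 0;
    uint8_t op;

    while (take(in, len, &pos, &op, 1)) {
        uint16_t addr;   /* in the real fuzzer, clamped to a mapped region */
        uint32_t val;
        if (!take(in, len, &pos, &addr, 2) || !take(in, len, &pos, &val, 4))
            break;
        /* Dispatch: a real implementation would issue the PIO/MMIO write
         * here, or stage `val` to be patched in at the next DMA read. */
        ops_executed[op % OP_MAX]++;
        count++;
    }
    return count;
}
```

Because every decoded command lands on a mapped port, a registered MMIO region, or a pending DMA buffer, none of the input is wasted on addresses no device will ever see.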
So we implemented this for QEMU, and as we were doing it, we kept track of the coverage we achieved over various devices; we could zoom in on individual lines to see how to improve our fuzzer, and this was eventually the technique we arrived at. Because of the way our fuzzer functions, it's very easy for the developer to reproduce a crash, and that was a big focus as we were designing it. When we do find a crash, we can actually send it to the developer by email, and they can just copy and paste it into their existing build of QEMU to reproduce the crash immediately and see exactly which port IO, MMIO, and DMA requests led up to it. We've already found, reported, and fixed bugs in a wide range of network devices, graphics devices, and audio devices; we've reported over 60 bugs, and I think there are 60 CVEs associated with bugs we've reported to date. Most of our work is actually already upstream within QEMU, and because QEMU is such a popular hypervisor, our work directly benefits a lot of projects that depend on it. And because QEMU is open source, we can also take advantage of programs such as OSS-Fuzz, so as I'm giving this talk right now, the QEMU code is being fuzzed in real time on the cloud somewhere, and bugs are being found. As soon as new code gets committed to QEMU, we fuzz that code as well, to catch bugs as soon as possible. That was my talk. I'd like to give a huge thank-you to everybody who reviewed my code and helped me come up with various techniques for fuzzing virtual devices; I'd never have been able to do this without my mentors and the rest of the QEMU community. And I think the takeaways from this work can also be applied to a lot of kernel fuzzing and even things like browser-fuzzing efforts. If you thought this was interesting and you'd like to talk more, my contact details are below, and with that, I'll end my talk. Thank you for the amazing talk, Alex.
Folks, feel free to put your questions down in chat, or you can also carry on the conversation in the breakout room, the link to which I just posted in chat. Okay, I guess I'll go to the breakout room then. Thank you. Sounds good.