All right, good afternoon. My name is J.P. Blake, and this is Chris Rogers. We'll be presenting our work toward virtual disk integrity in real time, or, because that's a mouthful, what we call VDIRT. At a high level, our project consists of extensions to Xen's disk I/O interface, blktap2. Our primary goals were to extract faster and finer integrity guarantees about DomU virtual disks. So first we'll go over our motivations for this work, cover some related efforts, go over our implementation details, and wrap up with a few small video demos. In high assurance environments, we want to know the exact state of the VM we're starting. We can use hardware extensions to bootstrap trust by way of a measured launch of the overall system, but we care about extending that chain of trust to guest virtual hard drives. Now, when we started this effort, the main focus was on scaling the measurement of boot-time integrity to larger drives. However, as we'll go into later, with those hooks in place, the capability is also there to provide continuous monitoring. Those security goals are wonderful, but we know they can't come at the expense of usability, so our foremost concern, and what we really wanted to prove out, was that we could minimally impact the performance of virtualized disk I/O. Disk and file integrity are obviously not new topics, but there are three related efforts that we're most interested in: XenClient XT (my apologies to the enterprise guys), Tripwire, and inotify. We drew lessons from XT's handling of boot-time integrity measurements and tried to improve on its performance. Along the way, we realized that we could also approximate basic file integrity checking and work toward functionality similar to that offered by Tripwire and inotify, but with a different set of security guarantees.
So let's review XT's architecture first, because it's the product environment that we're most interested in targeting. XenClient XT is a security-conscious client platform developed by Citrix in conjunction with the Air Force Research Laboratory. Using the separation that Xen provides with VT-x and VT-d, XT is ideally suited as a MILS system. XT has also taken some great steps toward the disaggregation of dom0, with dedicated network domains and stub domains. Using Intel TXT and tboot, XT executes a measured launch. It has a protected partition whose keys are sealed to the platform state, so that it can be encrypted offline. And XT uses the VHD format for all of its VMs. It was because of our interest in the XT product that we honed in on it, even though, as you'll see as we go along, a lot of the ideas are general and could apply more broadly. So XT has some unique extensions to the traditional Linux filesystem layout. We want to talk about this briefly, because some of the VDIRT enhancements rely on this layout. /config is a protected partition that is encrypted with keys kept in a sealed blob, which can only be unsealed when the platform measurements in the TPM match the known good state. When we discuss the VDIRT implementation, we'll describe how its keys are expected to be stored in this location. /storage holds the virtual hard drives for both the guest VMs and the service VMs, obviously unencrypted. And the rest of dom0 is measured and mounted read-only, with SELinux in enforcing mode. Going over XT's measuring process: on the boot of a measured VM, such as the network domain, the toolstack measures the VHD's corresponding tap device by way of a SHA-1 sum, and then it compares this hash against the known good hash of that VM's VHD, which is stored in the config partition. If they don't match, it follows the corresponding policy. So, moving along to Tripwire.
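As a rough sketch of the boot-time measurement just described, hashing the whole tap device and comparing against a stored known-good value, the idea looks something like this (the function names and chunked-read scheme are our illustration, not XT's actual toolstack code):

```python
import hashlib


def measure_device(path, chunk_size=1 << 20):
    """Compute a SHA-1 over an entire block device or VHD, chunk by chunk."""
    h = hashlib.sha1()
    with open(path, "rb") as dev:
        while True:
            chunk = dev.read(chunk_size)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()


def verify_boot(path, known_good_hash):
    """True if the measured device matches the known-good hash from /config."""
    return measure_device(path) == known_good_hash
```

The key point is that the whole device must be read before the VM is declared trustworthy, which is exactly the cost that grows linearly with disk size.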
Tripwire is your classic UNIX file integrity checker, a tool that runs at the application level inside the target environment. When it's initialized, it creates a database of system object information, basically hashes of the files, and this amounts to a snapshot of the current system state. It compares the results of future object scans, generally run on a cron job, against the existing entries in order to detect changes, and then it can notify the user or admin. inotify was added to the Linux kernel a while back; it's basically a subsystem to support monitoring of filesystem events that can be reported to applications. It amounts to real-time filesystem monitoring, but again from within the target environment. Now, a SHA-1 sum of the entire disk is very coarse-grained; it really only lets us know whether anything on the disk has changed. It's coarse, but it's definitely a good starting place. What's unfortunate is that hashing big drives is really pretty slow. XT skirts around this problem by using OpenEmbedded-based service VMs, which results in much smaller drives to measure, as opposed to using something like Debian Wheezy. However, we don't always want to be forced into using an embedded distro. Tripwire and inotify have their applications and usefulness depending on the threat model. Our threat model is different in that we do not want the integrity monitoring code to execute in a potentially compromised host. So, having explained what we see as some of the limitations in XT, Tripwire, and inotify, our goals really aren't too surprising. We're looking to achieve a faster equivalent of a SHA-1 sum on a tap device belonging to a guest disk. We also wanted to make progress toward providing a DomU file integrity monitoring tool from dom0. I'll hand it over to Chris. So first I'd like to discuss some of the high-level design goals of VDIRT.
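The Tripwire workflow just described, build a baseline database of hashes, then rescan and compare, can be sketched in a few lines (a toy baseline/compare loop for illustration, not Tripwire's actual database format):

```python
import hashlib
import os


def snapshot(paths):
    """Build a baseline: map each file path to the SHA-1 of its contents."""
    return {p: hashlib.sha1(open(p, "rb").read()).hexdigest() for p in paths}


def detect_changes(baseline):
    """Rescan and report files that changed or disappeared since the baseline."""
    changed = []
    for path, old_hash in baseline.items():
        if not os.path.exists(path):
            changed.append((path, "missing"))
        elif hashlib.sha1(open(path, "rb").read()).hexdigest() != old_hash:
            changed.append((path, "modified"))
    return changed
```

Note that both the baseline and the scanner live inside the monitored system, which is precisely the trust problem the talk goes on to address.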
First, we felt that supporting the dynamic VHD format would be most effective for our tool, mostly because of its ease of use within XT, and because the basic structure of the VHD was easy to work with. We did have to make some changes to the VHD format, such as adding the list of hashes that we'd like to track during the course of VM runtime, and I'll go into those in more depth momentarily. In addition, we had to make some modifications to blktap2 to actually implement the tracking scheme. So the idea here is that instead of incurring a large performance penalty at VM boot time, we can amortize that bottleneck by measuring over the course of the VM's runtime. The core of VDIRT is really pretty simple, and it's very similar to Tripwire. When the DomU invokes a read, VDIRT intercepts the data before it's sent back to the DomU, and it compares the stored hash for that particular block with the hash of the recently read block. Similarly, for write operations, the block is hashed, it's committed to disk, and then the list of hashes is updated in memory, if, of course, the VM is allowed to write to disk. Understanding a blktap2 call can be a little tricky, so I just want to take a quick pass through a typical one. blktap2 is Xen's disk I/O interface. First, we have the back end establishing itself in the dom0 kernel, and using an event channel and a shared memory ring, it can communicate with blkfront in the DomU kernel. An I/O request is passed from user space to kernel space in the DomU, and then it's redirected to dom0 and blktap. Once this request reaches the blktap back end, it's forwarded to the tapdisk utility, which is responsible for dispatching it to the correct I/O library. blktap2 is nice in that it supports a variety of custom disk interfaces, and it even allows you to write your own, so VDIRT could be extended in the future beyond just the VHD format.
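The routing just described, tapdisk dispatching each request to the I/O library registered for a given disk format, can be modeled as a tiny dispatcher (a toy model of the plumbing only; the names are ours and the real ring protocol and C interfaces are much more involved):

```python
class TapDisk:
    """Toy model of tapdisk dispatching requests to per-format I/O libraries."""

    def __init__(self):
        self.libraries = {}

    def register(self, fmt, handler):
        # blktap2 supports pluggable disk formats (VHD, raw, custom...).
        self.libraries[fmt] = handler

    def dispatch(self, fmt, request):
        # Route the request to the matching I/O library, as tapdisk does.
        return self.libraries[fmt](request)


tap = TapDisk()
tap.register("vhd", lambda req: f"vhd handled {req}")
print(tap.dispatch("vhd", "read@0"))  # -> vhd handled read@0
```

The pluggable-handler shape is what makes a VDIRT-style interposer possible: the VHD library can be wrapped or replaced without touching the front end.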
Once the I/O is complete, the data is sent back to the shared memory ring for the DomU to utilize. Here I want to take a quick moment to talk about the terminology for our tool, because it can get a little confusing. The default VHD block size is 2 MB; that's what we worked with. The DomU's default filesystem block size is 4 KB, and blktap also handles data at a granularity of 4 KB. So we call the 4 KB DomU blocks "sector clusters," just to avoid confusion when referring to them from the blktap perspective. And here at the top, the hash list is just an array of SHA-1 sums that represent the current hash of any given DomU block; they're mapped one to one. Earlier I mentioned that we ended up modifying the VHD structure, and I understand that's an issue in terms of extensibility. Oh boy. Can we get that back? All right, so I guess the format's a little off. In gray here you have the typical VHD format as defined by the specification, and in green we have the header that we've added to the metadata. What we'd like to show you is that there are five sections of this, what we call the hash header. The first is simply the absolute byte offset from the beginning of the VHD file to the next piece of metadata, in this case the header. This is implemented similarly to the other pieces of metadata in the VHD structure, so the VHD can be traversed quickly. The second entry is simply a number, the total number of hashes, for ease of use in our computations. The third entry is a simple flag; we feel that the feature should be enableable or disableable without needing to recompile Xen itself. The fourth entry is the list of tracked hashes that we're going to store across the lifetime of a VM. And the last entry is a pad, just to keep it 512-byte aligned.
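To make the five-part hash header concrete, here is a toy packing of it. The field widths and the alignment value are our guesses for illustration; the talk names the five sections but not the exact on-disk encoding:

```python
import struct

SHA1_LEN = 20  # bytes per SHA-1 digest in the hash list


def pack_hash_header(next_offset, hashes, enabled, align=512):
    """Pack the five sections: next-metadata offset (u64), hash count (u32),
    enable flag (u32), the hash list itself, then zero padding to alignment."""
    body = struct.pack(">QII", next_offset, len(hashes), 1 if enabled else 0)
    body += b"".join(hashes)
    pad = (-len(body)) % align
    return body + b"\x00" * pad
```

Storing the count up front means the list can be sized and traversed without scanning, matching the "ease of use with our computations" point above.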
Obviously the list of hashes can get quite large if we're hashing each of these DomU blocks, so storing them all in memory quickly becomes a problem, and I'll talk about that momentarily. Okay. This slide was intended to trace through a VDIRT blktap call; I've actually diagrammed this in our paper, which we can provide, so I'm just going to walk through it. It's very similar to the normal blktap call, except that once the VHD is created, future reads and writes through the tapdisk utility are measured. Once we're down inside the VHD block layer, when a disk write is scheduled, we determine that it's a write, and then, using the global and block offsets, we compute the proper index for this block in our hash list. We then hash the block of data that's going to be written, and the index and the hash are both stored inside the I/O request data structure, which is then shipped off to the AIO library. We let the I/O happen, and then on the write-completion callback we update our hash list, because we know that the write has completed and that we need to keep the hash list and the disk blocks in sync. When we verify a data read, we also compute the sector cluster index, but we don't hash anything yet, because we don't have anything to hash. We just allocate space and let the I/O happen, and then on the callback is when we verify the read: we simply compute the hash of the data that was read and compare it with the hash that's being tracked. Policy can dictate whether to stop VM execution or simply print a warning to inform the user that the disk's integrity has been violated. Now, obviously this list of hashes that we're tracking in the VHD metadata is sensitive information, so we want to encrypt it. And since we're targeting XenClient XT's environment, we expect that protected partition to exist and to be a place where we can store our decryption key.
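The read and write paths just walked through can be sketched as a simplified in-memory model (the class and index arithmetic are our illustration, not the actual blktap C code; in VDIRT the hash update and verification happen on the AIO completion callbacks):

```python
import hashlib


class HashTracker:
    """Tracks one SHA-1 per 4 KB sector cluster, mirroring VDIRT's scheme."""

    CLUSTER = 4096

    def __init__(self, n_clusters):
        self.hashes = [None] * n_clusters

    def index(self, byte_offset):
        # Global byte offset -> position in the hash list (one-to-one mapping).
        return byte_offset // self.CLUSTER

    def on_write_complete(self, byte_offset, data):
        # Called only after the completion callback confirms the write hit
        # the disk, keeping the hash list and the disk blocks in sync.
        self.hashes[self.index(byte_offset)] = hashlib.sha1(data).digest()

    def verify_read(self, byte_offset, data):
        # On the read-completion callback: hash what came back and compare.
        expected = self.hashes[self.index(byte_offset)]
        if expected is None:
            return True  # never written under tracking; nothing to compare
        return hashlib.sha1(data).digest() == expected
```

Deferring the hash-list update to the completion callback is the important design choice: updating on submission would let a failed or reordered write leave the list describing data that never reached the disk.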
In order to guarantee the integrity of the hash list, we encrypt it before writing the changes back to disk. That way, the next time the VM is booted, a failed decryption will result in an automatic integrity failure if, say, the list was tampered with in an offline state. In order to maximize performance, VDIRT is only designed to commit these hashes on VM shutdown, so we realize that, in the context of, say, a power interruption, this becomes a concern. If the interruption is malicious in nature, such as an attempt to modify data to circumvent read verification, VDIRT will still detect the invalid data the next time it performs a read, because the hashes in memory and the data on the disk are out of sync, and that's great. But if the power interruption is coincidental, then the VHD should not necessarily be marked as compromised. So we could use a simple flag to indicate a proper hash commit. Really, with VDIRT, we set out to achieve a faster equivalent of hashing the entire disk, as XT does, and we did so simply by tracking the measurements of each individual disk block and then hashing that entire list for a single unified measurement. As far as watching specific files, we have a demo to show you where we've hard-coded the block addresses of those files, just as a very simplistic case to show this capability. In the future, we see a lot of potential for VDIRT to help with implementing a honeypot type of virtual machine, or with analyzing the disk footprint of malware. Performance is a huge concern for us. SHA-1 summing at a 4 KB granularity quickly adds up in memory: for a one-gig disk, we have a hash list of 5 megabytes. That's not bad, but if we keep scaling up, there's eventually going to be some bloat, and I'll be showing you some performance data that we measured. It's for a very simplistic case, just reads and writes. So here you can see sort of the scale.
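The 5-megabyte figure follows directly from the arithmetic, one 20-byte SHA-1 digest per 4 KB sector cluster (a quick sanity check of the numbers, not code from the tool):

```python
def hash_list_bytes(disk_bytes, cluster=4096, digest_len=20):
    """In-memory hash list size: one SHA-1 digest per 4 KB sector cluster."""
    return (disk_bytes // cluster) * digest_len


# 1 GB disk -> 262,144 clusters -> exactly 5 MB of digests
print(hash_list_bytes(1 << 30) // (1 << 20))  # -> 5
```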
If we have an 80-gig disk, we're going to have a hash list of about 400 megabytes, and as you keep increasing the size of the disk, the hash list gets larger and could eventually become a memory issue. Also, throughout this talk you've probably noticed we've been a little hand-wavy about policy details in terms of verification or handling failure. VDIRT really is in the prototype stage right now, so we don't have a full policy engine implemented, but we thought it was worthwhile to point out the opportunities where policy could dictate the correct option to take. This is just a graph comparing VDIRT and native open source Xen for a synchronous I/O workload. The modified VDIRT implementation is the darker bar, and the lighter bar is native. As you can see, there was not much of a performance penalty when doing these simple I/O tasks. Similarly, this is the bandwidth for each of those tasks as well. Cool. So I'll show you a couple of quick demos. What you're looking at right now is how we extended vhd-util to basically print out our tracked root hash. We're just going to fire him up. On the right-hand side we're tailing where VDIRT is logging different messages. You can see, between, I love VLC, what it's seeing as the root hash; you're seeing it match up right now in the real-time log. We booted this VM read-only, and then we're just going to detect when it's remounted read-write. As soon as we do that, we're going to see a lot of action in the VDIRT log, and once he shuts down, we're going to see his last root hash match up when we call vhd-util again. Really, the vhd-util call is analogous to when XT makes its toolstack call to measure the disk. That was really the main goal: we're just going to see our root hash over here match up. The second video is where we're actually watching a specific file inside the DomU.
Again, on the left we have that DomU booted up, and on the right we're in dom0 tailing the log. So we have the file we actually care about watching. Obviously, right now this is file-system-context specific. We're just going to move him over, and what we're doing in the background is determining the block offset addresses for the inode and the data of the watched file, and that's what we're tracking. A simple move touches the inode; this is where we're actually messing with the data that we're watching, with the VLC capture at its best. And then again, kind of like a trojan: you just replace that file, and that's something we're going to detect, so on the inode change we sync up and catch that as well. And there you have it. Thank you. Questions, anybody? Everybody's ready for the coffee break. [Audience] Did you look at using something like Merkle hash trees, to enable you to effectively have a chain of hashes all the way up to the top, more incrementally? [Presenter] Yes, we did; that was actually our initial choice for this data structure, but as we started experimenting with larger disks, it quickly became impossible to maintain in memory. So, short of implementing a caching scheme for the hash list, we found that the Merkle tree was just too large in scope; it introduced a lot more overhead than what we really needed. But those were our thoughts when we started out. [Audience] Yeah, I guess for what you're doing you don't need the incremental consistency, which is what a Merkle tree is able to give you. I think it's hard, but I think you could make it work. Thank you. So, you're effectively doing lazy integrity checking of the disk, right? Instead of doing it all at once, you're doing it as the pages get read or written for the first time. So I'd expect there to be a significant performance overhead if you were to read a large file for the first time when the VM is coming up. Did you do any kind of performance evaluation like that?
[Presenter] When we staged the VMs initially, you know, from the get-go, it's with blktap instrumented with VDIRT, and there was really no extra time for doing, say, an install of the Wheezy VM you saw. So, in just common usage, we couldn't see a big hit to performance, and that was the main question we were seeking to answer. Thanks. [Audience] Nowadays I think Xen mostly uses QEMU as the disk backend. Do you think it would be hard to port your work to QEMU, which is basically another provider of a disk backend in user space? [Presenter] That was kind of what I was hinting at when we started out. VDIRT itself is a very general idea: we're going to intercept the call kind of in transit. Again, for our use case we were targeting XT, so we just kind of dialed in on that. I guess I'm not as familiar with the qdisk and QEMU back end, but I would hope not. Anybody else? Seems not. Okay, thank you very much. Thank you.