I'm Adam Baldur. I'm a Principal Software Engineer in the XenServer group at Citrix, and I've been in the XenServer group since 2009. My main areas of responsibility are the Windows PV drivers and, lately, virtual GPU as well. This talk is about the Windows PV drivers. It's basically a history and a tour of what we've done. I want to start with the drivers in the state they were in for XenServer 6.0.2, then cover what we did in 6.1 and why, and where we want to go in the future. That includes making the drivers available to a wider audience and making them work on installations other than XenServer. I want to give a bit of insight into the general driver structure, then move on to how one might write another PV driver, where it would fit in the stack, and the kernel interfaces we expose to make that easier. Finally, I'll wrap up with how to take one of our current drivers, which are now open source, build it and install it yourself.
Why should you use PV drivers? Well, HVM guests get emulated hardware, as we've been hearing. On XenServer today, that basically means you get up to three IDE disks, because we need the fourth IDE slot for the CD-ROM. I believe we can do up to eight Realtek network devices. They're 100-megabit devices, pretty ancient. We use Realtek because that's the only inbox network driver that Windows XP has, and we still have to support Windows XP. Emulation is done essentially by trapping I/O port accesses or MMIO page faults, and it's very slow. It's not so bad if your hardware is DMA-driven, but those devices pretty much aren't, so it's pretty slow. Migration is also problematic. Theoretically it should work, but in the testing we've done in XenServer it's pretty flaky and falls over quite a lot of the time. We don't have source code for the device drivers in Windows, so we can't really debug it very easily either.
PV avoids these limitations by massively reducing the number of VM exits: it doesn't use I/O port emulation or page faulting for MMIO. It removes the limits on the number of disks and NICs, or at least raises them massively. We have the code for the suspend and migrate path, so we can debug it, and the VM suspends and migrates cooperatively, which makes it less prone to failure.
First of all, I'd like to go through a bit of the notation I'm using in the slides. Windows drivers basically hang off physical device objects (PDOs), which are enumerated by bus drivers. There is a top-level bus driver in Windows which goes and enumerates its bus, and then everything cascades tree-like from that point downwards. When Windows notices you've got a new physical device object, that's when you see the "Found New Hardware" pop-up, and it'll wander off and try to find a device driver for it. A Windows driver package carries something called an INF file, which basically specifies what type of device that driver will bind to. Windows uses those to discover drivers for new hardware as it appears, and it installs drivers onto those devices. The driver that binds to a physical device object is called a function driver, and it creates a function device object (FDO). Function drivers can also be bus drivers, which is how it cascades: they can enumerate new PDOs. I've used a little bit of notation there to denote the XenStore keys that I'm using to create PDOs in PV space. The dotted lines surround the packages that ship the actual drivers, because Windows lets you ship multiple drivers in a single package. Off to the side there, the purple one, is another type of driver you can write in Windows, called a filter driver. It can sit above physical device objects or function device objects, intercept some of their I/O and thereby change their behaviour. This is the driver structure we had in 6.0.2.
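To make the PDO/FDO cascade concrete, here is a minimal Python model of it; it is not Windows code. It assumes a driver package can be reduced to the set of hardware IDs its INF file binds to, plus the child IDs it enumerates when it also acts as a bus driver. The class names and hardware ID strings are hypothetical, and filter drivers are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DriverPackage:
    """What an INF file tells Windows: which hardware IDs this driver binds to."""
    name: str
    binds_to: set[str]
    enumerates: list[str] = field(default_factory=list)  # non-empty: also a bus driver


@dataclass
class DeviceNode:
    hardware_id: str                       # identity of the PDO
    function_driver: Optional[str] = None  # driver that created the FDO, if any
    children: list[DeviceNode] = field(default_factory=list)


def build_tree(hardware_id: str, packages: list[DriverPackage]) -> DeviceNode:
    node = DeviceNode(hardware_id)
    # "Found New Hardware": look for a package whose INF matches this PDO.
    match = next((p for p in packages if hardware_id in p.binds_to), None)
    if match is None:
        return node  # no driver installed yet, so the PDO stays unbound
    node.function_driver = match.name
    # A function driver that is also a bus driver enumerates new PDOs,
    # which is how the tree cascades downwards.
    for child_id in match.enumerates:
        node.children.append(build_tree(child_id, packages))
    return node


packages = [
    DriverPackage("xenbus", {"PCI\\PLATFORM"}, ["XENBUS\\VIF"]),
    DriverPackage("xenvif", {"XENBUS\\VIF"}, ["XENVIF\\NET"]),
    DriverPackage("xennet", {"XENVIF\\NET"}),
]
tree = build_tree("PCI\\PLATFORM", packages)
print(tree.children[0].children[0].function_driver)  # xennet
```

A PDO with no matching package simply stays unbound, which mirrors Windows waiting until a suitable driver is installed.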
It's a pretty odd driver structure. It hangs off two parents rather than just one. The one on the left is a root-enumerated node, which is created by an installer. The one on the right is synthesised by QEMU; that's a PCI device. The storage driver hung directly off the PCI device. The main reason for that, I believe, was that it was thought necessary to allow crash dump to work properly, because if you're writing storage drivers in Windows there's really no documentation on how to do it. You're just expected to write a storage driver for a PCI device, and then it all magically works. That doesn't quite ring true, but I don't think anybody really wanted to take on the job of writing a non-PCI storage driver and then figuring out what didn't work and why. XENEVTCHN and XENUTIL are the main interfaces to Xen but, as you can see, they're split across two packages. XENUTIL has to be shipped with XENVBD so that we can do the crash dump support, but XENEVTCHN enumerates everything else, so the interface to Xen is split between those two in a very odd way, and it's quite hard to manage. XENVIF there is a class driver for networking. A class driver made sense because supporting operating systems from XP through to Windows 7 meant we had to deal with two different NDIS versions, or NDIS miniport wrappers. It made a lot more sense to put the network code into one driver, which you could ship on all versions of the OS, and leave the NDIS 5 and NDIS 6 drivers at the bottom as very small, thin wrappers on top of it. That's why we did it that way. You'll also notice a driver there called SCSIFILT, which sits below each of the disk drivers.
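The network class-driver split is essentially an adapter pattern: one core that is identical on every OS, plus a thin translation layer per NDIS version. A minimal Python sketch of that idea follows. The class and method names are hypothetical and only loosely echo the shape of NDIS 5 (an array of packets) and NDIS 6 (chains of net buffer lists, each holding net buffers); nothing here is the drivers' real API.

```python
class NetCore:
    """Version-independent network logic, shared by every OS version."""

    def __init__(self) -> None:
        self.transmitted: list[bytes] = []

    def transmit(self, frame: bytes) -> None:
        # Stand-in for queuing a frame onto the PV transmit ring.
        self.transmitted.append(frame)


class Ndis5Miniport:
    """Thin shim, NDIS 5 style: the OS hands over an array of packets."""

    def __init__(self, core: NetCore) -> None:
        self.core = core

    def send_packets(self, packets: list[bytes]) -> None:
        for packet in packets:
            self.core.transmit(packet)


class Ndis6Miniport:
    """Thin shim, NDIS 6 style: the OS hands over net buffer lists,
    each of which holds one or more net buffers."""

    def __init__(self, core: NetCore) -> None:
        self.core = core

    def send_net_buffer_lists(self, nbls: list[list[bytes]]) -> None:
        for nbl in nbls:
            for net_buffer in nbl:
                self.core.transmit(net_buffer)


core = NetCore()
Ndis5Miniport(core).send_packets([b"a", b"b"])
Ndis6Miniport(core).send_net_buffer_lists([[b"c"], [b"d", b"e"]])
print(core.transmitted)  # [b'a', b'b', b'c', b'd', b'e']
```

All the real work lives in the core, so fixing a bug there fixes it for every OS, while each wrapper only translates the calling convention.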
The disk drivers are shipped by Windows; they're just part of the standard OS. The problem is that we were using a storage model called SCSIPORT, again for compatibility back to Windows XP. SCSIPORT basically has a single queue per HBA and calls all your entry points at interrupt level, so it's extremely hard to use. You can't allocate memory in your driver, for instance; you have to get the framework to do everything for you. It does it in a pretty horrible way, and most things end up being single-threaded. So by sticking the filter driver in there, we could stop most of the I/O from even entering the SCSIPORT wrapper. We cherry-picked it before it got there and shipped it over the PV ring instead. That way we could actually write a sensible storage driver and do everything at a much lower IRQL. I should probably have said all this already, so I'll move on to the next slide. Actually, I'll go back to the bottom of that one: there's a problem with the structure we had before. The use of the root node, the installer-created root node, meant we actually needed an installer to install these things. The cross-package dependency meant we had to upgrade everything as a unit. One of our goals is to eventually move our drivers onto Windows Update, because at the moment, if you want to install or upgrade your drivers, you've got to run the installer on every single VM on your host. If you've got several hundred VMs, that's a pain. By putting the drivers on Windows Update, the hope is that in future you just install Windows on your VM and it goes and grabs the drivers all by itself. You don't have to do anything in particular; you just wait and it deals with it eventually. We haven't done this yet, but that's where we want to go. Around that time, enter Windows 8. Microsoft had this big get-together at the Anaheim Convention Center near LA and came up with a small bombshell, which was...
They decided to ship a new WDK, the Windows Driver Kit, for Windows 8. First of all, you now had to use Visual Studio to build your drivers. Worse, you actually had to use a paid version of Visual Studio, and you had to throw away your old build scripts. The other thing was that with the new WDK you could only build drivers back to Vista. For Windows 2000 through XP, with XP still in support, you couldn't build drivers anymore, so you couldn't have a single binary supporting every OS from XP through to Windows 8. They also told us that you had to build your drivers with the Windows 8 WDK if you wanted to logo them for Windows 8, which we want to do because we still need to ship logoed drivers. So we had to do something, and we decided to come up with a new set of drivers for XenServer 6.1. Since we had to come up with a new set of drivers, we might as well choose some reasonable goals. With Windows Update in mind, we wanted to get rid of the need for the installer. We were still going to have an installer, because it's a friendly way to ship drivers, and since we don't have Windows Update yet, people still need some mechanism to install them. We wanted to get rid of the cross-package dependencies, because the whole upgrade-as-a-unit thing makes things fragile. If we could have versioned interfaces between the drivers, discovered at runtime, then the drivers could be upgraded independently. That would be necessary for Windows Update, because we couldn't force people to upgrade the drivers as a unit; we couldn't even control the order in which they upgraded them. Also, because we were doing this split anyway, we decided these new drivers would only go back to Vista. That meant we could get rid of the SCSIPORT model and use the newer StorPort model, which removed basically all the problems SCSIPORT had. It was a queue per disk now. It didn't enter your driver at interrupt level.
It entered at DISPATCH_LEVEL, so you could allocate memory, and we didn't need SCSIFILT anymore. So this is the new structure, and it's a lot simpler, with a single parent. XEN and XENBUS replace XENUTIL and XENEVTCHN, with more sensible names. XEN is basically the whole of the interface to Xen. It's an export driver, essentially a library which the XENBUS driver uses: it's the thing that sets up the hypercall page and actually marshals all the calls through to Xen. XENBUS is there basically to talk to XenStore, enumerate children and provide more abstract interfaces, which we'll go into later. XENVBD is now StorPort-based with no SCSIFILT, which is much simpler. However, you'll notice the addition of one filter driver in there, which is XENFILT. XENFILT is there to sit on the root PCI driver. The reason is that we still need to get in there before the root PCI driver enumerates emulated devices, because we've still got to do the magic I/O port unplug to QEMU to stop the emulated NICs and such things appearing when we're using PV drivers. Around this time, or actually shortly after we settled on this driver structure (I guess it was just before XenServer 6.2, or maybe just after, I can't remember), we also decided to open source the drivers. Pretty much this was done by taking the source we had, chucking it up on GitHub split into the five driver packages, each in its own repo, sticking a BSD 2-clause license on them, and there you go. We chose BSD 2-clause because we couldn't logo the drivers if they were GPL: Microsoft's test agreement prevents you from using GPL. We went back to them with different variations, GPL v3 and LGPL, but nothing was good enough, so BSD is what we stuck with.
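The runtime-discoverable, versioned interfaces that XENBUS provides can be modelled roughly as follows. This is a minimal Python sketch, not kernel code: a client asks for an interface by GUID and version, the request is passed up through parent devices until some driver answers with a table of methods, and the client acquires and releases the interface around use so a reference count is kept. Trying the newest version first and falling back is an illustrative assumption about how a client might tolerate independently upgraded drivers, not necessarily what the real drivers do, and every name here is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Interface:
    guid: str
    version: int
    methods: dict[str, Callable[..., object]]  # the "jump table"
    references: int = 0

    def acquire(self) -> None:
        self.references += 1

    def release(self) -> None:
        if self.references == 0:
            raise RuntimeError("release without matching acquire")
        self.references -= 1


@dataclass
class Device:
    name: str
    parent: Optional[Device] = None
    exports: dict[tuple[str, int], Interface] = field(default_factory=dict)


def query_interface(device: Device, guid: str, version: int) -> Optional[Interface]:
    """Walk up the tree until some driver implements (guid, version)."""
    node: Optional[Device] = device
    while node is not None:
        found = node.exports.get((guid, version))
        if found is not None:
            return found
        node = node.parent  # not implemented here: cascade the request upwards
    return None  # reached the top: nothing knows about this interface


def query_newest(device: Device, guid: str, versions: list[int]) -> Optional[Interface]:
    """Hypothetical client policy: prefer the newest version that is answered."""
    for version in sorted(versions, reverse=True):
        found = query_interface(device, guid, version)
        if found is not None:
            return found
    return None


# A hypothetical bus driver exporting version 1 of a "STORE" interface.
bus = Device("xenbus")
bus.exports[("STORE", 1)] = Interface("STORE", 1, {"read": lambda path: "42"})
child = Device("xenvif", parent=bus)

store = query_newest(child, "STORE", [1, 2])  # a newer client still finds v1
store.acquire()
print(store.methods["read"]("some/path"))  # 42
store.release()
```

Because the client never needs to know which driver in the tree answered, a driver can be upgraded on its own as long as some compatible version of the interface is still offered.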
There were a couple of problems, though, with these drivers as they stood. We had a patched QEMU in XenServer. If you're eagle-eyed, you'll notice from the previous slide that we changed the device ID on that PCI device from what it was before. There was a good reason for doing that: if we ever went onto Windows Update with drivers that bound to the old platform PCI device, then everybody in existence running Xen with a Windows VM would suddenly start getting drivers from Windows Update, which we thought was probably an unfriendly thing to do. The problem is that only XenServer creates that device, so even if you wanted to use the drivers, you'd still have to patch your own QEMU if you weren't using XenServer. The other thing was that all the build scripts assumed you worked for Citrix and had Citrix's code-signing certificate, so you couldn't actually build anything without hacking the scripts. So we had to do something about this, and what we came up with was the idea of the upstream drivers. Largely these are the 6.2 PV drivers with those problems removed. First, emulated device unplug: I should have mentioned that in XenServer 6.2 we also used another emulated device unplug mechanism, which was never upstreamed. With the upstream drivers we went back to the existing emulated device unplug mechanism that's available in QEMU and has been forever, so you can use pretty much any version of QEMU you want. Second, the platform device ID: we made the drivers bind to the old platform device ID (I'll go into that in a moment), but I also upstreamed a patch into QEMU that creates a new PV device for the purpose of doing Windows Update in the future. That way we didn't have to modify the existing platform device; you can just add the new device to your system if you want to get drivers from Windows Update.
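The emulated device unplug mechanism in QEMU is a small I/O port handshake: the guest checks for a magic value and then writes a bitmask naming the classes of emulated device to unplug. The sketch below models that in Python with a fake port object so it can run anywhere. The port number, magic value and bit assignments follow Xen's HVM emulated-unplug protocol document, but treat them as assumptions and check them against your Xen tree; the real protocol also has version and product-number steps that this sketch leaves out.

```python
UNPLUG_PORT = 0x10         # assumption: 16-bit port, magic on read, mask on write
UNPLUG_MAGIC = 0x49D2      # assumption: value read back when QEMU supports unplug
UNPLUG_IDE_DISKS = 1 << 0  # assumption: all emulated IDE disks (not CD-ROMs)
UNPLUG_NICS = 1 << 1       # assumption: all emulated NICs


class FakePorts:
    """User-space stand-in for port I/O; a real driver would use port I/O routines."""

    def __init__(self, magic: int) -> None:
        self.magic = magic
        self.writes: list[tuple[int, int]] = []

    def read16(self, port: int) -> int:
        return self.magic if port == UNPLUG_PORT else 0xFFFF

    def write16(self, port: int, value: int) -> None:
        self.writes.append((port, value))


def request_unplug(ports: FakePorts, pv_disks: bool, pv_nics: bool) -> bool:
    """Ask the device model to unplug emulated devices that PV drivers replace.

    Returns False if the device model does not answer with the magic value,
    in which case the PV drivers must not take over those devices, or the
    same disk could end up visible through both the emulated and PV paths.
    """
    if ports.read16(UNPLUG_PORT) != UNPLUG_MAGIC:
        return False
    mask = 0
    if pv_disks:
        mask |= UNPLUG_IDE_DISKS
    if pv_nics:
        mask |= UNPLUG_NICS
    if mask:
        ports.write16(UNPLUG_PORT, mask)
    return True


ports = FakePorts(UNPLUG_MAGIC)
print(request_unplug(ports, pv_disks=True, pv_nics=True), ports.writes)  # True [(16, 3)]
```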
The goal was to make it work on pretty much anything. I've tested it on Xen versions up to 4.4, and with 32-bit and 64-bit Debian dom0s. The only thing I ran into was a problem with netback relying on udev. Windows needs all its devices to start in a closed state, but the toolstack creates the devices in an initialised state, so the frontend has to close them and then restart them. That cycle through closed was relying on udev kicking in and reinserting the vif back onto the bridge, so if you weren't running udev it didn't work and everything just stalled. I put a patch into Linux a few weeks back to solve that problem. I've submitted stable patches back to 3.10, though I don't know whether they're in yet, but certainly if you use a 3.12 Linux kernel you'll be fine. All the code as it stands at the moment is on upstream branches in each of the repos, and we will be folding those into the master branches fairly shortly. We're most of the way through our QA tests now, but some internal XenServer QA is still failing, so we can't actually use them for mainstream at the moment; that should probably be fixed in about a week or so, I reckon. I mentioned the XENBUS binding. To make things usable by most people, the XENBUS driver needs to bind to the existing platform device that everybody has. This was something Amazon requested of us as well, because they use our PV drivers. The thing is, as I said before, we can't have people suddenly getting drivers from Windows Update when that happens. So we make XENBUS bind to the old device and to the new one, and it echoes the device information from the device it bound to through to the children it creates. We will only ship drivers onto Windows Update with the bindings for the new device ID, so if you're binding to the old one you can use the drivers but you won't get them from Windows Update, but if you have the
new device, then you will get drivers from Windows Update, and it's still the same binary.
I also mentioned the runtime-discoverable interfaces. In Windows, when drivers are in a parent-child relationship, that's actually quite easy to do. There's a request you can send called IRP_MN_QUERY_INTERFACE, where you just say "I want the interface with this GUID", and if the recipient wants to reply in the affirmative it passes you back what is basically a jump table. If it doesn't know about that interface, the idea is that it cascades the request upwards, so eventually, if some driver in the stack implements it, that driver comes back to you with a positive response. If nothing does, the request goes all the way up to the top and the Plug and Play manager responds that nothing knows about that interface. That way you can have interfaces implemented at different levels, and the client of an interface doesn't actually have to care where it lives. This is quite important because we have a couple of interfaces exported by the XENFILT driver, so that a PV driver at the bottom of the stack can figure out whether the emulated device it's aliasing exists and say, "Right, I'm not going to start, because the emulated device is there." One of the problems we ran into in the past is that if the emulated device unplug doesn't work, you can end up with PV storage and emulated storage for the same disk at the same time. That generally corrupts your file system fairly quickly, and from that point on your VM is dead. So the general sequence is that you query by GUID and version. There's a whole set of interfaces; the XENBUS repo is a good place to look, since there's an include subdirectory in there where you'll find all of them. For examples of querying, the XENVIF repo is a good place to look. XENBUS implements the vast majority of the interfaces, and XENVIF is a client
of those interfaces; it needs quite a few of them, so it's a good place to see how the interface structure works. I wrapped the use of the interfaces in convenience macros, so the general sequence is that you use the convenience macro specified in the header you're interested in. For XenStore, for example, the convenience macro is called STORE. Then you acquire and release around the points where you want to use the interface. That's just so we can maintain a reference count, which makes life much easier if you're doing power management transitions. Certainly when I was debugging things like the S3 and S4 power states, it made life a lot easier having that reference count there, because you could see if something was hanging around that shouldn't be. And that's just a list of where you would find certain interfaces. As I said, XENBUS implements the vast majority, and there are the usual suspects there: a grant table interface, an event channel interface, store, and the debug interface. XENBUS hooks the debug VIRQ, so when you hit the 'q' debug key in Xen it is injected into the guest, and any other driver can use the debug interface to register a callback so it can dump state out to the log at that point. XENFILT implements device unplugging, as I said, and the emulated interface, so that another driver can figure out whether an emulated device is present or not. And XENVIF implements the VIF interface for the network drivers, so it does all the netback state handling, ring state management, connection, packet reception and so on.
So, how do you build a driver? Well, as I said before, you need Visual Studio. Unfortunately you can't use the free Express edition, but you can use the 30-day trial edition if you don't have an MSDN subscription. You need the Windows 8 WDK. If you go to the standard Microsoft sites now, because 8.1 is out, the WDK you will find by default is the 8.1 WDK, but we don't use that yet; you'll still need the Windows 8 one,
however, it is available at that URL. The wrapper scripts around the MSBuild environment that Visual Studio uses are now in Python instead of shell (the shell scripts meant you needed Cygwin before), so you need a copy of Python 3.x. You have to set the three environment variables I listed there, basically to tell the scripts where the kit is, where Visual Studio is and where you want your symbols to go when the driver is built, and then off you go. You can build free or checked; checked is just the Microsoft terminology for a debug build. In each of the repos you'll also find an INSTALL.md, which is separate from the README.md because you may want to ship it out to people if you're sending them packages. Installing a driver is pretty trivial. The build just creates a directory, plus a tarball of that directory if you want to use that instead. You just need to get that onto your VM, and then you can either point Device Manager at the INF file, using the Have Disk option, or use a little convenience installer that Microsoft provides as a redistributable executable, which we stick in those directories, so you can just run that and it will install the driver for you. Because we don't expect everybody to have Citrix's code-signing certificate, we also test-sign the drivers now. That has small repercussions: if you want to avoid big scary warnings when you install the drivers, you need to install the test certificate on your VM. There's actually a separate test certificate for each of the drivers. It's in a PFX file which is not password-protected, so you can put it on the VM very easily; it's in the project directory of each of the repos. One caveat is that you must make sure you turn test-signing mode on for 64-bit VMs, because as soon as you install XENVBD it's managing your system disk, and if test signing is not on it won't load and you will fail to boot. If you go and look in the maintainers list in
each of the repositories, there's my name in there and also the names of my team. Any of us are happy to take contributions, and we take discussions on xs-devel, so feel free to ask us questions; we're happy to help. As I briefly alluded to in one of my comments earlier, we are looking at PV HID ourselves; it's one of our future projects. We have drivers for PV HID; it's just that we need some small modifications to the HID protocol, because I believe at the moment it scales with the size of the VNC console when you try to use it. We'd also like to explore pushing multiple touch points through that protocol if possible, and I think the coordinates in that protocol are relative at the moment; it would be nice if they were absolute. Another thing we'd be interested in is PV USB, because occasionally we get queries from people wanting to pass USB keys through into VMs, and we don't really have a good story for that at the moment. And obviously, if we can turn off emulated USB, as was mentioned before, it would be a good performance win. So yeah, any further questions? Do you have a mic?
I assume this means that you removed all Microsoft example code from the drivers so that you were able to open source them?
Yeah, that was one of the things that held us up. One of the drivers I didn't put in my diagrams is the simplest one: we have a driver called XENIFACE, and literally all it does is implement a WMI provider which allows you to talk to XenStore. In XenServer our guest agent is written in .NET, and it makes life an awful lot easier if you've got a WMI version of XenStore there. That driver was heavily, heavily ripped off from the toaster sample driver shipped with the WDK.
Exactly.
So we did a clean rewrite of most of that code, apart from the WMI code, which was already ours.
Is there a reason why you're using xs-devel instead of xen-devel, like everybody else?
Only that it's essentially already the discussion forum for XenServer-related features, and these drivers are still branded as the XenServer PV drivers. I'm happy to take discussion on xen-devel, and I scan xen-devel, so people can ask questions there too, but you'll probably get a better and quicker response if you go on to xs-devel.
We'd be very keen for these drivers to be part of the Xen Project, if the project would be happy to have Windows drivers.
Actually, I had that discussion with them, and that probably is okay, so if you wanted to do that, the best way is probably to start a discussion on xs-devel. The only thing I was concerned about was the branding. It's pretty trivial branding: you have to have a name for these things in the INF files, and we just used the XenServer name for now, but we could just call them Xen if that wasn't going to upset too many other people, like James Harper for instance, who obviously has his own set of GPL PV drivers.
My point was that since we haven't gotten around in XenClient to writing the Linux PV USB frontend, it might actually be valuable if we're able to... I mean, I don't even know about talking with management about doing this, but contributing the Windows side of things might help us upstream the Linux and Xen back end.
Yeah, I've had discussions with Steve Meisner about that kind of stuff. I believe you essentially have a fork of our old drivers and you patch them at the moment for the USB stuff, which I think is going to be painful for you in the future. Having your own child driver using these interfaces would, I think, be much cleaner.
Actually, there's some really neat work done on that.
Yeah, we should talk afterwards; I haven't seen it. Any more questions? No? Thank you, and as I said earlier