I've been using PC emulators for code analysis purposes like forever — I guess since the 90s or so, note the gray hair. Mainly because you can fiddle around with the hardware, you can do silly little tricks. But QEMU actually plays a much larger role than just for freaks like me who like to do code analysis. And so the addition of a virtual secure boot, I believe, is going to make a very, very big difference in things like cloud computing. The following talk by Gerd Hoffmann is going to be all about the trials and tribulations, I believe, of trying to get Secure Boot to actually work in QEMU and other systems. And that's really all I want to say about it. Please welcome to the stage Gerd Hoffmann. Hello, I'm Gerd Hoffmann. I've been working for Red Hat for about 10 years now in the virtualization team, working on QEMU in user space and also on KVM on the kernel side. One of my work areas is firmware, and that's why I'm giving this talk today. Oops, move that away. I first want to go over some terms which I'll use in this talk, then talk about creating a plan when implementing something like Secure Boot for virtual machines. Then I'll go over the implementations we had to do in all three involved software projects. Next, a hands-on section with some instructions on how to do it in case you want to play with this yourself. If there's enough time left, I'll go on with a short demo, and at the end, of course, questions and answers. So let's start with the terms. The first one is, of course, Secure Boot. It's specified in the UEFI specification since version 2.3.1, and the goal of Secure Boot is that you don't run untrusted code on your machine. This is done by verification of all the components started, and the basic idea is that each component verifies the next one before actually running it.
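The verification chain described here — each stage checks the next before handing over control — can be sketched in a few lines. This is purely illustrative: real Secure Boot uses X.509 certificates and PKCS#7 signatures from the firmware's key databases, not the HMAC stand-in used below, and all the names are made up for the sketch.

```python
# Illustrative sketch of the Secure Boot chain of trust: each boot stage
# is only executed if its signature verifies against a trusted key.
# (Real UEFI Secure Boot uses X.509/PKCS#7; HMAC keeps the sketch short.)
import hashlib
import hmac

TRUSTED_KEY = b"platform-key"  # stands in for the firmware's key database

def sign(component: bytes, key: bytes) -> bytes:
    return hmac.new(key, component, hashlib.sha256).digest()

def verify(component: bytes, signature: bytes, key: bytes) -> bool:
    return hmac.compare_digest(sign(component, key), signature)

def boot(chain, key):
    """Run each stage only if it verifies: firmware -> bootloader -> kernel."""
    for name, blob, signature in chain:
        if not verify(blob, signature, key):
            raise RuntimeError(f"refusing to run unsigned component: {name}")
    return "booted"

bootloader, kernel = b"grub", b"linux"
chain = [("bootloader", bootloader, sign(bootloader, TRUSTED_KEY)),
         ("kernel", kernel, sign(kernel, TRUSTED_KEY))]
print(boot(chain, TRUSTED_KEY))  # every stage verifies, so we "boot"
```

The point of the talk is the last piece of this picture: the `TRUSTED_KEY` store and the verification code itself must live somewhere the guest OS cannot tamper with.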
So the firmware itself checks the boot loader, then the boot loader goes on checking the kernel you are loading, and the kernel, again, checks any kernel modules or drivers you are loading into kernel address space and running there. All the keys used to verify the components are managed by the firmware. For that to work, the firmware itself — the code living in memory and also the storage for the keys, which is usually flash — must be protected, so that the operating system or any malware trying to infect the system can't modify it. How we implement this for virtual machines is the topic of this talk. Next one: QEMU. You probably know this one. It's an open source machine emulator and virtualizer, and it emulates all the stuff you have in a physical machine as well: the chipset, the timer, the interrupt controller, all the stuff you need to interact with the virtual machine — a virtual graphics card, keyboard emulation, mouse emulation, a built-in VNC server where you can actually talk to QEMU, storage of course, network, USB. Next one is KVM, the Kernel-based Virtual Machine. That's a driver for the Linux kernel which provides applications access to the virtualization extensions available in modern CPUs. On the x86 architecture those are the VMX extensions on Intel processors and the SVM extensions on AMD processors. On ARM there's Hyp mode, and there are also other architectures supported by KVM. QEMU will use KVM for CPU virtualization. That means the code of the virtual machine runs directly on the CPU, and of course it's the fastest mode of operation, which you'll usually want to use. The other option we have with QEMU is TCG. TCG stands for Tiny Code Generator. It comes with QEMU, and it's usually used if you're not running on your native architecture.
For example, if you're emulating an ARM guest on an x86 machine, then of course you can't run directly on the CPU, so TCG will translate your guest instructions into host instructions and run those. It can also be used to run virtual machines on old CPUs without virtualization support. And a somewhat strange use case: sometimes if you're running stuff on KVM, it's too fast, especially if you're running very old guests. There's one funny example, some old Windows version: if you run it on modern hardware on KVM, it's just so fast that the calibration loop runs so quickly that Windows crashes with a division by zero. The easy fix is to turn off KVM and use TCG instead, and then it works fine. Well, next slide. Next one: EDK2, the EFI Development Kit. It's the reference implementation from Intel for UEFI. It lives at tianocore.org, and the source code is available on GitHub. More interesting for QEMU is OVMF, the Open Virtual Machine Firmware. It's a UEFI implementation for QEMU, and it actually lives inside the EDK2 repository. It has drivers for the usual virtual hardware you have in QEMU — for virtio, for the VGA devices. The EDK2 repository also has firmware for ARM, so if you want to play with that, it's all in the same repository. SeaBIOS is the default firmware in QEMU. So if you start QEMU and don't say otherwise — that you want this or that firmware — what you get by default is SeaBIOS. It's a classic BIOS implementation, and the main users are QEMU and also Coreboot. So if you have Coreboot on your machine and you need a BIOS interface, that's also the way to go. And you can also run Coreboot as QEMU firmware. That's about all the terms I want to cover first. So this is where we were two years ago. Secure Boot support has been available in the EDK2 source tree for quite a while already. It uses OpenSSL for crypto support, but doesn't ship it as part of the repository.
So you have to unpack the tarball and apply a patch, because there are some differences in the UEFI environment — there's no standard I/O, for example, so they patch all that out. Then you can just go build it with the Secure Boot enable option, and you have a working firmware with Secure Boot support. The problem is that the firmware itself isn't protected. The memory and the data storage of the firmware are not protected at all, and the flash where the keys and the UEFI variables are stored is not protected either. So the guest can do whatever it wants. That's somewhat useful if you want to develop software with Secure Boot support — you can still use this to verify you're using the interfaces in the correct way. But of course that's not the real thing, so we wanted to change this. When designing QEMU support for something new, there are basically two choices — also some options in between, it depends a bit. One option is you stay close to real hardware: you emulate something which exists in the real world. The other option is you create something new out of thin air. It's all software, it's all virtual, you can do whatever you want. Of course both approaches have their advantages and disadvantages. If you go emulate something which exists, it's easy on the guest side, because the guest will find some hardware — even though it isn't real — which it knows how to handle, so you don't have to deal with drivers. The guest operating system will be able to just use your hardware. It also usually simplifies the management of virtual machines and physical machines if you keep the differences between them small. On the other hand, it can also have disadvantages, depending on the kind of hardware, of course. One example is the old USB host adapters, which are quite difficult to emulate in an efficient manner — it needs quite a bunch of CPU time.
So it has been a problem for a long time that as soon as you connect a virtual USB tablet to a virtual machine, it eats a lot of CPU time. Another problem is that you are limited to what the physical hardware you are emulating is able to do, which is becoming a problem these days with the Cirrus graphics adapter — which is not the default graphics adapter anymore because of this. The Cirrus VGA is a design from the 1990s and just can't keep up with today's needs. The other option is to do something virtual — paravirtual is the usual term for that. You have more freedom in designing things, and you usually get better performance. If you're doing it in a clever way, you can get a lot of code sharing: for example, all the virtio devices — for SCSI, for block, for network, for input, for serial devices — use the very same ring format to send data between host and guest. You can also try to simplify things, because it's a virtual machine, so some things are easier than on physical hardware. The downside is that you, of course, have to maintain the guest drivers. That's not so much a problem for you as a user if you're using Linux — the drivers are part of the kernel, so you don't notice. It's more visible to the normal user on Windows, because you need the special driver image with all the virtio drivers on it. And if you try to simplify, it can also backfire: if you simplify too much, you can create quite a mess long-term. For example, the original virtio specification reasoned: we have a virtual machine, we know which architecture we are on, so we just go with the native endianness and don't have to bother with byte swapping. That turned out to be not such a good idea, because these days IBM is pushing the PowerPC architecture to little endian, where it was historically big endian. And so the virtual machine usually starts in big-endian mode for historical reasons.
The firmware talks to your virtio device in big-endian mode, then loads a little-endian Linux kernel, and the little-endian Linux kernel, of course, will also talk to the virtio device. So the whole concept of native endianness just doesn't make sense anymore, especially on PowerPC. Those are the traps you can fall into if you're trying to simplify too much. So let's see how the firmware is protected on physical hardware. We have something called System Management Mode on all x86 processors since at least 20 years or so. The chipset can assert an interrupt to enter System Management Mode, and the processor will store the complete state of the processor in memory and start execution at a different address. With a special instruction, it can return to normal operation. It was designed for stuff like power management, so that you don't need a special side processor to do your power management. You can just have the chipset raise a system management interrupt, and then it runs this special code, which can check the temperature sensors and maybe tune the fan speed if it's too hot or too cold, stuff like that. And there's some memory which can only be accessed in System Management Mode. That's SMRAM; it's pretty small, 128 kilobytes, at the location where usually the VGA framebuffer sits. Newer chipsets have, in addition to that, a bigger section called TSEG, which can be up to 8 megabytes. It sits at the upper end of the memory mapped below 4 gigabytes — the upper end of the 32-bit address space. The configuration for SMRAM and TSEG can be locked, so once it's initialized it can't be tampered with until the machine is reset. So the question is: are we going to do the same in QEMU, or something else? The advantage, of course, is that we can hope to not have to write too much new code, because if we implement System Management Mode in QEMU, we can use existing lockbox code, for example, from the EDK2 toolkit.
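The native-endian trap just described is easy to reproduce. Below is a small sketch of the failure mode: a big-endian firmware writes a 32-bit field into a shared ring and a little-endian kernel reads it back. The field name is invented for illustration; the fix shown at the end reflects the later virtio 1.0 approach of fixing the byte order on the "wire".

```python
# Sketch of the native-endian trap: a big-endian firmware writes a ring
# descriptor field, then a little-endian kernel on the same machine reads it.
import struct

def write_u32(value, big_endian):
    return struct.pack(">I" if big_endian else "<I", value)

def read_u32(raw, big_endian):
    return struct.unpack(">I" if big_endian else "<I", raw)[0]

desc_len = 0x1000
raw = write_u32(desc_len, big_endian=True)   # firmware side: big endian
misread = read_u32(raw, big_endian=False)    # little-endian kernel's view
assert misread != desc_len                   # 0x1000 became 0x00100000

# virtio 1.0 later fixed this by mandating one byte order for the rings:
assert read_u32(write_u32(desc_len, False), False) == desc_len
```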
Performance shouldn't be that big of an issue. The problem is that System Management Mode was originally designed for something else, not for security, and so it has a pretty bad security track record. You can check the talks from previous Congress years about various attacks on System Management Mode. So we have to be quite careful if we go this route. Also, two years ago we had no System Management Mode support in KVM, and TCG had at least some basic support. Still, we decided to go the System Management Mode route. Let's start looking at how it looks in QEMU — our to-do list. QEMU can emulate two different chipsets. The older one is from the mid-90s, and it is so old that it doesn't have this TSEG memory. So we can't use it for Secure Boot, because the EDK2 implementation needs more memory — it just doesn't fit into SMRAM. So this one is out. We have the newer Q35 chipset emulation; if you start QEMU, you can pick this with -M q35. And we had some SMRAM support from the basic TCG support already present back then. We had to complete it: we had to implement TSEG and also implement lock-bit support, so that locking down the TSEG and SMRAM configuration actually works. And we also had to figure out how we are going to protect the virtual flash. One foundation we are building on is the memory API. It was created by Avi Kivity — the same guy who started the KVM project — and it was merged in QEMU 1.0, and it pretty much turned the memory management inside QEMU upside down. Before, all versions of QEMU had a very simple scheme for registering memory and MMIO regions: you just register it, and it stays there until QEMU exits, and if you register something else at the same address, whatever was there before just goes away. Now we have a hierarchy of memory regions, and you can enable and disable them, move them around, and create aliases.
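The hierarchy of regions with priorities, enable/disable, and overlap resolution can be modeled in a few lines. This is a toy model, not QEMU's actual C API — the class and field names are invented — but it captures the lookup rule described next: among enabled regions covering an address, a higher-priority subregion hides what is underneath.

```python
# Toy model of QEMU's memory API: regions with offset, size, priority and
# an enabled flag; lookup returns the highest-priority enabled region
# covering an address, so overlapping subregions can hide their parent.
class Region:
    def __init__(self, name, offset, size, priority=0, enabled=True):
        self.name, self.offset, self.size = name, offset, size
        self.priority, self.enabled = priority, enabled
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

    def resolve(self, addr):
        if not (self.enabled and self.offset <= addr < self.offset + self.size):
            return None
        # highest priority wins among overlapping subregions
        for child in sorted(self.children, key=lambda r: -r.priority):
            hit = child.resolve(addr)
            if hit:
                return hit
        return self.name

system = Region("system", 0, 1 << 32)
system.add(Region("ram", 0, 128 << 20))
vga = system.add(Region("vga-window", 0xA0000, 0x20000, priority=1, enabled=False))

assert system.resolve(0xA0000) == "ram"   # window disabled: RAM shows through
vga.enabled = True                        # like programming the VGA bank registers
assert system.resolve(0xA0000) == "vga-window"
```

Flipping `enabled` is exactly the mechanism the talk uses later for the per-CPU SMRAM view: the same tree, with one alias region switched on only while that CPU is in SMM.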
These regions are used to build address spaces, which was a big step forward in the correctness of the emulation of various corner cases. For example, if you remap a PCI BAR of your network adapter, it now actually works correctly in that the BAR disappears at the old location — that wasn't the case before we had the memory API. Another interesting one, which caused quite some bugs, is that each PCI device got its own address space for DMA. If you don't turn on the bus-master DMA bit in the PCI config space, we also don't turn on this memory region, with the effect that DMA stops working. And there have been quite a few drivers — especially virtio drivers, which have never been tested on real hardware — which didn't set the bus-master bit, and this change broke those drivers. Implementing System Management Mode would have been pretty much impossible without this memory API. So let's have a look at how this looks. I made it a bit smaller so it fits. This is the system memory region of the simplest machine type we have — an ISA PC without any PCI stuff — but it's nice to see how this looks. Here's the system region, which covers the complete address space. We have the RAM, which is 128 megabytes, I think. Then we have a container, which is a region that just holds the subregions for the VGA memory. You can see that what is visible right now is this one, because the other regions are disabled. You can program the VGA bank registers to map some of the VGA memory into those two places; in that case, the region will be enabled, and it will be visible to the guest because it has a higher priority — priority one here. So this will overlap the RAM if it's enabled. Then there's the ROM area, there's the BIOS, and that's it for a simple ISA machine. So how does it look for our Q35 machine? It's very similar. We have, of course, a Q35 PCI machine.
So we have a lot of PCI BARs, which I left out here to have enough space on the slides. You have the SMRAM region, which was already there in older QEMU versions for the basic SMM support we already had. What's new is this TSEG blackhole. It overlaps the top of the RAM region, which is right here — the topmost piece of the memory — and it has a higher priority. So if your guest tries to access this address range, it will land in the blackhole. The blackhole just discards any writes, and if you read from it, you just get FFs, nothing else. That way we can hide the TSEG memory from the normal system view. Then there's also MMIO, the MMCONFIG address space, the IOAPIC. The Q35 chipset also has an SMRAM alias at a higher address, which is disabled because EDK2 doesn't use it, but the alias is there. Then we have the area for message signaled interrupts and finally two address areas for the flash, where flash 1 is where the variables live and flash 0 is where the code lives — I'll come to that later in detail. To enable access to the SMRAM memory in SMM mode, we have this SMRAM region, which has aliases into the low and high SMRAM locations and also for the TSEG, which refer to actual memory. Then each virtual CPU gets its own address space — this one. It's just these two: the SMRAM region, which we have right here, and the normal memory, which is all the stuff we had on the previous slide. If the CPU enters System Management Mode, this SMRAM alias will be enabled and the CPU can access all these regions. If it leaves System Management Mode, it's disabled again and not visible anymore. And we have one of these address spaces for each virtual CPU, so it's impossible to run SMP attacks, because each CPU has its private space. You can't bring one virtual CPU into System Management Mode and try to trick the system into giving access to those memory regions with the help of another CPU.
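The TSEG blackhole behaviour — writes discarded, reads returning all-ones, unless the accessing CPU is in SMM — can be sketched directly. This is an illustrative model, not QEMU code; in the real implementation the "in SMM" state selects between two per-CPU address spaces rather than being a flag on the device.

```python
# Sketch of the TSEG "blackhole": outside SMM, reads return 0xFF and
# writes are dropped; in SMM, the real TSEG RAM underneath is visible.
class TsegBlackhole:
    def __init__(self, size):
        self.ram = bytearray(size)   # the actual TSEG memory
        self.in_smm = False          # modeled as a flag; really a per-CPU view

    def read(self, addr):
        return self.ram[addr] if self.in_smm else 0xFF

    def write(self, addr, value):
        if self.in_smm:              # writes from outside SMM are discarded
            self.ram[addr] = value

tseg = TsegBlackhole(8 << 20)        # up to 8 MB on Q35
tseg.write(0, 0x42)                  # guest write outside SMM: dropped
assert tseg.read(0) == 0xFF          # guest read outside SMM: FFs only
tseg.in_smm = True                   # CPU enters System Management Mode
tseg.write(0, 0x42)
assert tseg.read(0) == 0x42          # SMM code sees the real memory
```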
It doesn't work, because each CPU has its own private view, and for each CPU it is individually enabled or disabled here. We also have to care about PCI devices. PCI devices get, for bus-master DMA, just the normal system view, which is basically the same view the CPU has when it's not in System Management Mode. So you can't use your network card or a hard drive to access this memory indirectly. We also had to implement the lock-bit support. This is a snippet from the test case. The configuration register has an open bit, which can be used to make SMRAM available even when not in System Management Mode, which is used for initialization. The first part of the test case checks that the open bit can be set — and if you set it, SMRAM is actually open. The second part of the test just sets the lock bit. If you flip the lock bit to true, it will automatically switch the open bit to false, so SMRAM is closed and only available in System Management Mode. It also flips the open bit to be read-only, so you can't open it again until the machine is reset. This lock bit will also lock down the configuration for the TSEG memory. The original Q35 chipset on physical hardware had a bug here: there's a register which basically configures the memory split — how much memory is mapped below 4 gigabytes and how much above. If you set the lock bit to lock down the configuration, the TSEG size and enable bits are locked down, but not this configuration register. So you can use this configuration register to just move away this protection and access the memory nevertheless. That one is easy for QEMU to handle, because we don't configure the memory split this way. This register simply doesn't exist in QEMU, so you can't exploit it. We do the memory split configuration with command line switches instead, and by default you have 2 gigabytes of low memory, and everything else is mapped above 4 gigabytes. Next one is flash protection.
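The open-bit/lock-bit semantics the test case exercises can be written down as a tiny state machine. This is a sketch of the behaviour as described in the talk, not the actual register layout of the Q35 SMRAM control register; the method and field names are invented.

```python
# Sketch of the SMRAM control register behaviour described above: an open
# bit makes SMRAM visible outside SMM (used during initialization); the
# lock bit clears the open bit, makes it read-only, and sticks until reset.
class SmramControl:
    def __init__(self):
        self.open = False
        self.lock = False

    def write(self, open_bit=None, lock_bit=None):
        if open_bit is not None and not self.lock:  # open is read-only once locked
            self.open = open_bit
        if lock_bit:                                # lock is one-way until reset
            self.lock = True
            self.open = False                       # locking forces SMRAM closed

reg = SmramControl()
reg.write(open_bit=True)
assert reg.open                    # part 1: open bit can be set
reg.write(lock_bit=True)
assert reg.lock and not reg.open   # part 2: locking closes SMRAM...
reg.write(open_bit=True)
assert not reg.open                # ...and the open bit is now read-only
```

Only a machine reset (constructing a fresh `SmramControl` here) clears the lock again, mirroring the hardware behaviour.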
That's where all the keys and variables are stored, so we need to protect this one as well. On physical hardware it works this way: if you try to write to flash, it will be put into write mode, and it will also trigger a system management interrupt, and the system management interrupt handler will put it back into read-only mode. It's pretty complicated, and it's also prone to race attacks — there have been successful attacks on that, and because it's physical hardware, you can't fix it. So newer versions of the chipset use even more complicated protocols to handle the races. We tried to do this in a simpler way. First, we have two parts of the firmware: first the code, and second the variables, where all the keys are stored as well. The code is simply read-only — the guest can't write it, and firmware updates simply happen on the host. The firmware is just a normal distro package, and your normal distro upgrades will also upgrade the firmware; QEMU will pick up the new version on the next power cycle of the virtual machine. Then we have the second part of the flash, which stores the variables. There we can simply discard any writes which happen outside System Management Mode. That's a concept we borrowed from ARM: there's a secure flag, which on ARM is used to signal secure or non-secure world accesses, and it actually exists as a bus signal. So the flash chip can see whether an access comes from the secure or the non-secure world and can control access that way. On x86 this doesn't exist as a bus signal, but in QEMU it's all software, so we can just borrow this concept and use it in our virtual machines. It's implemented using memory transaction attributes, where — among other things — it is noted whether the secure flag is set or not, and then the flash code can just decide whether it allows a write or not. That's a lot easier than on physical hardware. This all works just fine with TCG, but we also want to run it fast with KVM.
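The write-gating rule for the variable store is simple enough to sketch: every memory transaction carries a secure attribute, and the flash model honours writes only when it is set. Again, this is an illustrative model with invented names, not QEMU's pflash device.

```python
# Sketch of the borrowed ARM "secure" transaction attribute: the virtual
# variable-store flash accepts writes only when the access carries the
# secure flag, i.e. only when the CPU is in System Management Mode.
class VarStoreFlash:
    def __init__(self, size):
        self.data = bytearray(size)

    def write(self, addr, value, *, secure):
        if not secure:           # write from outside SMM: silently discarded
            return False
        self.data[addr] = value
        return True

flash = VarStoreFlash(64 << 10)
assert flash.write(0, 0xAB, secure=False) is False   # guest OS / malware
assert flash.data[0] == 0x00                         # nothing changed
assert flash.write(0, 0xAB, secure=True) is True     # SMM firmware code
assert flash.data[0] == 0xAB
```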
So we had to look at how we are going to do that. We can't pass the memory region hierarchy we have as-is to the KVM module in the kernel. KVM has to flatten the address spaces into a memory map, which is just a linked list of slots. This is passed down to the kernel module, so the KVM kernel module knows everything it needs to know about the address spaces, and this memory map is used by all VCPUs. Each time the address space changes, it has to be updated. So we have the problem that this becomes quite expensive on SMP machines if we took the same approach we have in TCG, where each virtual CPU gets its own address space — or in this case its own memory map. So it had to be designed in a different way. KVM got support for address space IDs. Now we can pass down two address spaces to KVM: one for System Management Mode and one for outside System Management Mode. Again, these two maps are shared by all VCPUs, so more VCPUs won't make things slower. We also needed a new ioctl to ask KVM to raise a system management interrupt. And the KVM run state structure got a flag for System Management Mode, so QEMU knows, whenever an exit happens, whether the CPU which is currently running is in SMM mode or not — it needs that to set the memory transaction attributes correctly, so the flash chip knows whether a write is coming from System Management Mode or not. So these are the address spaces in KVM mode. They look a bit different. We have two of them to generate those two memory maps. One is the CPU memory, which is just the normal system view — basically, for when you're not in System Management Mode. And for the System Management Mode memory map, we have the other KVM SMRAM address space, which is basically an SMRAM container — very similar to the one in TCG mode — with a higher-priority region and also a lower-priority alias to the normal system memory. So that was it for KVM. And of course we also need some support in OVMF, in our firmware.
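The flattening step — turning the region tree into the non-overlapping slot list KVM consumes — can be sketched as a sweep over the region boundaries. This is a toy algorithm to illustrate the idea, not KVM's or QEMU's actual flattening code; the slot representation is invented.

```python
# Toy flattening pass: turn a list of possibly overlapping regions into the
# flat, non-overlapping slot list that gets passed down to the kernel.
def flatten(regions):
    """regions: list of (name, start, end, priority); returns merged slots."""
    points = sorted({p for _, s, e, _ in regions for p in (s, e)})
    slots = []
    for start, end in zip(points, points[1:]):
        covering = [r for r in regions if r[1] <= start and end <= r[2]]
        if covering:
            winner = max(covering, key=lambda r: r[3])  # highest priority wins
            if slots and slots[-1][0] == winner[0] and slots[-1][2] == start:
                slots[-1] = (winner[0], slots[-1][1], end)  # merge neighbours
            else:
                slots.append((winner[0], start, end))
    return slots

# normal (non-SMM) address space: RAM with the TSEG blackhole punched
# into its topmost megabytes, exactly as on the Q35 slide
regions = [("ram", 0x0, 0x80000000, 0),
           ("tseg-blackhole", 0x7F800000, 0x80000000, 1)]
assert flatten(regions) == [("ram", 0x0, 0x7F800000),
                            ("tseg-blackhole", 0x7F800000, 0x80000000)]
```

With address space IDs, two such flat maps are produced — one with the blackhole (the normal view) and one with real TSEG RAM (the SMM view) — and both are shared by all VCPUs.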
So we thought, oh, that should be easy, because the intention of doing it the SMM way was to share and reuse existing code. But while the lockbox code is in the EDK2 repo, all the code for System Management Mode initialization and also the SMM handler itself wasn't in the repository. So we ran around and asked the Intel guys who are maintaining EDK2, and the first thing we got was a pointer to the Quark toolkit, which is a 32-bit chipset reference design from Intel. And yeah, of course, only the 32-bit code was in there, which was a bit disappointing. Laszlo, a co-worker of mine, went through the code and found a security bug in there, which delayed the whole thing quite a bit. He had a lot of discussions with Intel — with the EDK2 maintainers, also with the Quark team and the security team. The security issue wasn't handled very well by Intel; I'm not aware of any official security advisory for the Quark thing. But finally, half a year later, the code was committed to the EDK2 repository, and we got not only the 32-bit but also the 64-bit code, and the first committed version also had the security bug fixed. The nice thing is that Intel continues to maintain this code in the open repository. We see a constant flow of patches and improvements, so it's not an Android-style code drop — here you have it, and now keep quiet for the next months until we have the next release. The development is actually happening in the repository. And it's also the repository where they cut the development kit releases, now and then, for the OEMs for your laptops. So at some point in the future, you should be able to look at code which is roughly what you are running on your laptop. It's nice that we got this open-sourced as part of the effort. With this in place, it was not as easy as initially expected to implement Secure Boot support — it was still not trivial.
There are a lot of little details you have to get right, but we finally got it merged, one month after the initialization code was merged. So if you want to play with this yourself, you need QEMU version 2.5 and Linux 4.4, which shouldn't be a big problem with a recent distro. Libvirt support took a bit longer; it's in version 2.1, which is pretty recent, so it could be that you don't have it in your distribution. EDK2 doesn't really do releases, so you can just fetch the latest Git snapshot — we are using a snapshot from the middle of the year. And this is how the libvirt configuration looks. You need the Q35 machine type, right here. Oops, that's a bit long. Can I make it fit? Yes, it fits now. If you don't want the default firmware, which is SeaBIOS, you have to specify this loader tag — it's called loader for historical reasons, dating back to Xen times. You say read-only, because this read-only is for the code; that's read-only. Secure yes is for the variable store. Type pflash — it's flash memory. This is a template for the variable store, for an empty variable store. You don't have to explicitly add this one; libvirt will by itself create a copy of the template, stored in /var/lib/libvirt somewhere. And of course you need System Management Mode support turned on, because it's turned off by default — it doesn't work in older versions. If you want to use QEMU directly, it looks like this. Again you need the Q35 machine type, and you have to turn on System Management Mode. This creates the flash devices: the first one for the firmware, for the code, which we set to read-only, and the second one for the variables, which we put into secure mode — that needs to be done with a separate configuration switch. And then, of course, all your other configuration arguments. One nice thing libvirt supports is that you can put command line arguments you want to add to QEMU
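For reference, the QEMU invocation described above can be assembled like this. I've written it as a Python argument list so each piece can be labeled; the firmware file paths are placeholders, and the exact option spellings reflect the QEMU 2.5-era usage as I understand it (`-machine q35,smm=on`, two pflash drives, and a `-global` switch for the secure property) — check your QEMU version's documentation for the current syntax.

```python
# A sketch of the QEMU command line just described; paths are placeholders.
code_fw = "/usr/share/OVMF/OVMF_CODE.secboot.fd"   # flash 0: read-only code
vars_fw = "my-vm-VARS.fd"                          # flash 1: writable var store

argv = [
    "qemu-system-x86_64",
    "-machine", "q35,smm=on",                      # Q35 chipset, SMM turned on
    "-drive", f"if=pflash,format=raw,unit=0,readonly=on,file={code_fw}",
    "-drive", f"if=pflash,format=raw,unit=1,file={vars_fw}",
    # the separate switch mentioned in the talk: variable-flash writes
    # only succeed when the access comes from System Management Mode
    "-global", "driver=cfi.pflash01,property=secure,value=on",
]
print(" ".join(argv))
```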
into this special qemu tag; you have to import the special namespace in your libvirt config, and then you can add this. It's very useful if you have a libvirt which is too old to have System Management Mode support — you can do it this way. It was created to make it easier for developers to work on new QEMU stuff which isn't yet supported by libvirt: you can still run the virtual machines you are using for development and testing with libvirt, because you can just tweak the QEMU command line this way. I'm running a Jenkins instance which does automatic firmware builds. It lives at this address. Each time something is committed to the upstream repository, it kicks off a build, so it's updated quite frequently. It's an easy way to install the EDK2 firmware, OVMF. There are also packages for the ARM firmware, also SeaBIOS and Coreboot, but the most interesting one is OVMF. There are three different variants. The one named "needs SMM" is the one with System Management Mode support — System Management Mode support is a compile time option, so that binary doesn't work without SMM support, and it also requires Q35, because the other chipset just doesn't have a big enough TSEG. Of the other ones, "pure EFI" is just an EFI firmware which supports both QEMU chipsets, and the third one additionally has SeaBIOS compiled in as a compatibility support module. There's little reason to actually use that in a virtual machine, because you can just use SeaBIOS directly, but if you want to play with this or test something which needs a compatibility module, you can use it. Okay, what's the time? Yeah, I think we have enough time for a short demo. Let's use this one. Okay, this — I think it's the device manager, is it? Here's the Secure Boot configuration. You can look at it: it's disabled. And oops, I just noticed I missed a slide, but I can show it in the demo. Let's exit here. OVMF doesn't come with any keys in the default key store.
The way you can add them is with a small ISO — you can see it right here. This one is an ISO image attached as a SCSI CD-ROM, and on it there's an EFI application which can be used to enroll the default keys. So, it adds the keys and it enables Secure Boot support, and then we can reboot — and then it just stops. The reason for that is that the ISO image which enables Secure Boot isn't signed, and because it isn't signed, it isn't booted anymore now that Secure Boot is enabled. Oops, that wasn't it — so I can just use this one. Let's kill this one. So, you can see Secure Boot is enabled. And if you look, you can see here the certificates which are loaded by this enroll application. The first one is the Windows key, which Microsoft uses to sign the Windows operating system. The other one is the key Microsoft uses to sign third-party stuff; that includes Linux EFI applications — the shim boot loader used by various distributions — and also PCI option ROMs and stuff like that. The Red Hat version of the ISO also installs the Red Hat Secure Boot key. Okay, let's close it. Okay, that's the end. I didn't do all of this on my own — actually, mine was quite a small part of it. I did most of the Q35 chipset implementation. Also involved was Paolo Bonzini, who did all the KVM and TCG code; Paolo is the current KVM kernel maintainer. And most of the work inside OVMF, and the talking to the Intel guys about releasing the source code for the SMM initialization, was done by Laszlo — I think he did the most work in the end to get all this going. The slides are online at this link; you can also scan the QR code to get the link to the slides. Cool, that's it — any questions? So if you have any questions, we have four microphones here, two on each side. Do you have any questions yet? Otherwise, I have a very quick question, kind of high level.
Is there going to be support for — or does virtual Secure Boot support things like containerization with cgroups, or is it not even relevant? I think for containers, it isn't relevant. Although there is a need for secure boot in that environment too, I would have thought. I think we would have to take a completely different approach, because you just don't have a virtual machine — you don't have firmware — if you're running containers. Are there any questions from the internet? Unfortunately not. Okay. All right, well, join me in thanking the speaker for such a great talk.