My name is Vitaly and today I'd like to talk about the current state of Hyper-V emulation in KVM. A few words about myself: I'm a long-time KVM contributor, mostly working on Hyper-V emulation in KVM, as well as running KVM on Hyper-V, KVM PV features and nested virtualization in general. Also, we had some changes in KVM maintenance recently: Hyper-V emulation in KVM was given its own subsystem, probably in recognition of its growing size and complexity, and I'm one of the co-maintainers there.

So let's get started with the talk. Why do we even want to emulate someone else's hypervisor in KVM? The answer is fairly simple: we want to run legacy or proprietary operating systems which we cannot change. And we don't just want them to run on KVM, which could probably be achieved by properly emulating hardware; we want them to run fast. To make that happen, we need to emulate something these operating systems already know about, and in the case of Microsoft operating systems like Windows and Hyper-V Server, the choice is fairly obvious: Microsoft's own hypervisor.

How do we do that? Do we disassemble Microsoft Hyper-V to figure out how it works? Luckily, we don't really need to: Microsoft publishes the so-called Top-Level Functional Specification (TLFS). Previously they published a new version with every Windows/Hyper-V release; now it has been converted to an online document, and we get updates to it more frequently.

A 30-minute time slot doesn't allow me to talk about all the Hyper-V enlightenments we support in KVM, because we already have too many. Here are the ones which I'm not going to talk about today. In case you're interested in these, I encourage you to watch one of my previous talks on the same subject: in 2019 I gave talks at both DevConf and FOSDEM, and in 2018, together with Microsoft's Tianyu Lan, we gave a talk about the other side of nesting, running KVM on top of Hyper-V, where KVM acts as a consumer of the exact same enlightenments. One may wonder where to read documentation on these enlightenments: the only document I currently know about lives in the KVM source tree, and it gets updated with every new enlightenment we support.

Let's get to the newly implemented enlightenments. The first one is so-called hv-avic, and I'm going to use the KVM/QEMU names for the features, which you can pass as CPU flags to enable them. hv-avic solves a long-standing issue where Hyper-V synthetic timers, and the Hyper-V synthetic interrupt controller (SynIC) in general, were incompatible with hardware APIC virtualization (APICv on Intel, AVIC on AMD). The reason for the incompatibility is the SynIC's automatic end-of-interrupt (AutoEOI) feature, which is completely incompatible with it. Luckily, there is a feature bit in the Hyper-V specific CPUID which tells Windows not to use AutoEOI. Modern Windows versions obey the recommendation, though some older versions, like Windows Server 2008 and older, may ignore it, and in that case the enlightenment won't give you much.
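To make this concrete, here is a minimal sketch (not the actual QEMU code) of what setting this recommendation looks like on the VMM side through the KVM API. The leaf and bit values follow the TLFS, the macro names mirror Linux's hyperv-tlfs.h, and error handling is omitted.

    /* Minimal sketch: set the "deprecate AutoEOI" recommendation (what
     * QEMU's hv-avic flag results in) in the Hyper-V enlightenment
     * information CPUID leaf before handing the table to KVM. */
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    #define HYPERV_CPUID_ENLIGHTMENT_INFO   0x40000004
    #define HV_DEPRECATING_AEOI_RECOMMENDED (1u << 9)

    static int recommend_no_autoeoi(int vcpu_fd, struct kvm_cpuid2 *cpuid)
    {
        for (uint32_t i = 0; i < cpuid->nent; i++) {
            struct kvm_cpuid_entry2 *e = &cpuid->entries[i];

            if (e->function == HYPERV_CPUID_ENLIGHTMENT_INFO)
                /* Modern Windows honors this and avoids AutoEOI, so
                 * APICv/AVIC stays usable; 2008-era Windows may not. */
                e->eax |= HV_DEPRECATING_AEOI_RECOMMENDED;
        }
        return ioctl(vcpu_fd, KVM_SET_CPUID2, cpuid);
    }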
The next one is Enlightened MSR-Bitmap, a nesting-specific enlightenment which is now supported for both Intel VMX and AMD SVM. The idea is that a PV protocol is established between the Level 1 and Level 0 hypervisors, where the L1 hypervisor lets L0 know every time it updates its MSR bitmap structures in memory. This way, L0 doesn't need to re-analyze L1's MSR bitmaps every time it runs a Level 2 guest. This speeds things up significantly: in my tests I got something like a 700 CPU cycle cut with the feature. It should also be noted that this feature was already supported for KVM on Hyper-V.

The next one is so-called XMM input, and it allows using XMM registers (128 bits each) for Hyper-V hypercall input. This is useful for hypercalls which take a long parameter list or an array of parameters. Without the feature, such long parameter lists have to be passed through memory, and the hypervisor then has to go and read guest memory to fetch them. Now that we support XMM registers, these hypercalls are faster. It should also be noted that Hyper-V supports using XMM registers for hypercall output too, but we don't yet support any of the hypercalls in KVM which would actually make use of that.

Synthetic debugger. Normally, if you want to debug your Windows guest, you need to connect to it via a NIC or a serial port, and this is not super fast. With virtual machines there is a better way: Hyper-V provides a hypervisor-specific transport where the guest uses hypercalls to send data, and we now support this interface for debugging Windows. It's supported on both the KVM and QEMU sides.

Enforced Hyper-V CPUID. This is not a new enlightenment but rather a change to the way we do Hyper-V enlightenments in KVM. It may come as a surprise, but once you pass a single Hyper-V enlightenment to your guest, the guest is actually allowed to use all of the Hyper-V enlightenments currently supported by KVM, regardless of what it sees in the guest-visible CPUID. This feature allows you to change that behavior (see the enforcement sketch below); you may want to do this, for example, to reduce your attack surface, and there is a similar feature for KVM PV features. The only caveat is that you need to be careful with dependencies: QEMU as a VMM tries to do a good job tracking them, but there is no guarantee we do it perfectly for all currently existing, and especially future, Windows versions, and if your guest needs an enlightenment it doesn't have, it may try using it anyway, and with this feature enabled it will likely crash.

Okay, let's get to the current work in progress, what you can see on the mailing list. The first, and probably the biggest, is TLB flush improvements. Initially, we implemented Hyper-V TLB flush hypercalls in a very simplistic way: of all the parameters of the hypercall, we only analyzed the target vCPU list, and we queued a request to flush the whole TLB for each of them. But the hypercall carries more information; in particular, it can specify the exact guest virtual addresses which require flushing (see the layout sketch below). Now, instead of just queuing a request to flush the TLB completely, we've added a per-vCPU queue where we put these individual requests. This is supposed to significantly speed up TLB flush operations for Windows and Hyper-V guests.

The next one is Direct Virtual Flush, as the Hyper-V Top-Level Functional Specification calls it, or L2 TLB flush, as we call it in KVM. The idea is that the Level 1 and Level 0 hypervisors collaborate: L1 gives L0 all the information required to handle TLB flush hypercalls from Level 2 guests directly, without exiting to L1. This is much faster. The series is on the mailing list as part of the bigger TLB flush improvements, and you're encouraged to give it a go.
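For reference, here is roughly the input layout of the flush hypercall that series analyzes; the field names mirror struct hv_tlb_flush in Linux's hyperv-tlfs.h. The gva_list entries are what now land in the per-vCPU queues instead of triggering a full TLB flush.

    /* Input block of HvCallFlushVirtualAddressList (per the TLFS). */
    #include <stdint.h>

    struct hv_tlb_flush {
        uint64_t address_space;   /* requester's address space (CR3)  */
        uint64_t flags;           /* e.g. "flush all address spaces"  */
        uint64_t processor_mask;  /* target vCPUs                     */
        uint64_t gva_list[];      /* encoded GVA ranges: a page number
                                   * plus a count of additional pages */
    };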
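And going back to the enforced Hyper-V CPUID behavior: here is a sketch of the opt-in a VMM performs, using the per-vCPU KVM_CAP_HYPERV_ENFORCE_CPUID capability; error handling is again omitted.

    /* Sketch: switch a vCPU from the legacy "all enlightenments
     * allowed" behavior to strict enforcement of the guest-visible
     * Hyper-V CPUID. */
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    static int enforce_hv_cpuid(int vcpu_fd)
    {
        struct kvm_enable_cap cap = {
            .cap  = KVM_CAP_HYPERV_ENFORCE_CPUID,
            .args = { 1 },   /* 1 = enforce; 0 = legacy behavior */
        };

        /* From now on, enlightenments absent from the guest's CPUID
         * are rejected instead of silently working. */
        return ioctl(vcpu_fd, KVM_ENABLE_CAP, &cap);
    }

QEMU exposes this as the hv-enforce-cpuid CPU flag.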
Enlightened VMCS got updated with the Windows Server 2022 and Hyper-V 2022 releases, and it gained new fields for some features which were previously incompatible with Enlightened VMCS. Supporting this for KVM on Hyper-V is fairly easy: we just need to add the new fields and remove them from the filter we previously had. But supporting them for Hyper-V on KVM is very hard, because we never expected Enlightened VMCS to change, so our enablement is basically all-or-nothing, and this creates issues with migration. A way to make such changes to Enlightened VMCS without changing its version, one that will be supportable in the long term, is currently being discussed on the mailing list.

Invariant TSC. Normally, invariant TSC is an architectural CPUID bit which tells the operating system that the TSC frequency is stable and is never going to change. This bit cannot be set if we want to migrate guests between hosts with different TSC frequencies, but TSC scaling was recently added to Intel CPUs (it has been present in AMD CPUs for a while already), which makes it possible for the hypervisor to keep the guest-visible TSC frequency stable across migrations. Hyper-V enabled this in a slightly different way: they added a PV MSR, and the guest has to opt in by flipping bit zero of that MSR; only then does the architectural bit appear in CPUID (there is a guest-side sketch of the opt-in below). We've checked, and things actually work with modern Windows versions without this PV enablement, if we just pass the architectural bit through directly. Still, we want to be as close as possible to genuine Hyper-V, so we will enable the feature the exact same way Hyper-V does.

Okay, let's get to our to-do list: things which are described in the TLFS but which no one is known to be working on yet. The first one is unhalted synthetic timers. Normal synthetic timers run continuously, regardless of whether the guest vCPU is running or not, halted or not. An unhalted timer only runs while the guest vCPU is not halted, i.e. while it's actually running and doing something. The rest is similar but not the same; for example, these unhalted timers cannot send VMBus messages, they can only inject interrupts.

The next one is so-called non-privileged instruction execution prevention (NPIEP), which allows blocking certain sensitive instructions from user mode in the guest. The feature is very similar to the hardware user-mode instruction prevention (UMIP) feature, but it's done through hypervisor capabilities, so it doesn't require that hardware feature. KVM already knows how to emulate UMIP, so NPIEP should be relatively easy to implement: we would use the same mechanism, just with a slightly different interface for enabling it.

More TLB flush and direct TLB flush improvements. With the series currently on the mailing list, we've started analyzing the guest-virtual-address argument of the hypercalls. But there is one more argument: the so-called address space, i.e. the CR3 of the target vCPU. Normally it matches, but sometimes, while the request sits in the queue, the target vCPU switches from one task to another, so flushing the TLB when we finally process the request is redundant. With the per-vCPU queues we now have, it should be fairly easy to add this parameter, save it, and check it on the target vCPU when we actually process the request; this should eliminate some unneeded TLB flushes, as in the sketch below.
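A rough illustration of that check, with invented names rather than actual KVM internals: each queued entry remembers the requester's address space, and the target vCPU drops entries that no longer match its current CR3.

    /* Illustrative only, not actual KVM code: a queued flush entry keeps
     * the address space (CR3) it was issued for, so the target vCPU can
     * skip entries for address spaces it is no longer running in. */
    #include <stdint.h>

    struct queued_flush {
        uint64_t address_space;   /* CR3 at request time; 0 = all spaces */
        uint64_t gva;             /* page-aligned GVA to flush */
    };

    static int flush_still_needed(const struct queued_flush *req,
                                  uint64_t current_cr3)
    {
        return req->address_space == 0 ||
               req->address_space == current_cr3;
    }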
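And returning to the invariant TSC enlightenment: here is a guest-side sketch of the opt-in sequence described above. It has to run at ring 0 in the guest; the MSR index matches HV_X64_MSR_TSC_INVARIANT_CONTROL in Linux, but treat the exact number as an assumption to cross-check against the TLFS.

    /* Guest-side sketch: flip bit 0 of the Hyper-V PV MSR, then re-check
     * the architectural invariant-TSC bit, CPUID.80000007H:EDX[8]. */
    #include <stdint.h>

    #define HV_X64_MSR_TSC_INVARIANT_CONTROL 0x40000118

    static inline void wrmsr(uint32_t msr, uint64_t val)
    {
        asm volatile("wrmsr" ::
                     "c"(msr), "a"((uint32_t)val), "d"((uint32_t)(val >> 32)));
    }

    static inline uint32_t cpuid_edx(uint32_t leaf)
    {
        uint32_t a, b, c, d;

        asm volatile("cpuid" : "=a"(a), "=b"(b), "=c"(c), "=d"(d)
                             : "a"(leaf), "c"(0));
        return d;
    }

    static int enable_invariant_tsc(void)
    {
        wrmsr(HV_X64_MSR_TSC_INVARIANT_CONTROL, 1);  /* bit 0: opt in */
        return !!(cpuid_edx(0x80000007) & (1u << 8));
    }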
GPA flush. There are hypercalls which allow flushing mappings derived from the second level of translation: EPT on Intel, NPT on AMD. We already support these hypercalls for KVM on Hyper-V, but we haven't implemented them for Hyper-V on KVM yet. The performance advantage comes from the same fact: only the hypervisor actually knows whether the target vCPU is currently running, and thus whether the flush requires immediate action or can be deferred.

Free page reporting. There is a hypercall in Hyper-V with which the guest can tell the host that certain pages are not being used, so those pages can be swapped out or, for example, given to other guests. This is more efficient, because otherwise the hypervisor just has to evict random pages. We already support this for Linux-on-Hyper-V guests, but we don't support it for Hyper-V on KVM, or Windows on KVM in general, because we don't know what to do with this information when we get it from Windows.

And now, the elephant in the room: Virtual Secure Mode. Virtual Secure Mode (VSM) is a huge beast. It's a set of capabilities which allows creating additional security boundaries within your operating system by using hypervisor capabilities. This means, for example, that you can build an enclave and put your secrets there, protecting them from kernel-level exploits; normally, anything running at CPL0 has the highest privilege level, and you cannot hide anything from such malware. Windows uses this as virtualization-based security, for features like Device Guard and Credential Guard. If we don't emulate Virtual Secure Mode, Windows just installs the Hyper-V role underneath, which makes the configuration nested, and then uses Virtual Secure Mode from Hyper-V. You may want to avoid that for performance reasons, but VSM is not easy to implement, simply because of its size and complexity.

Last but not least, one additional idea on how Hyper-V emulation can be used for something besides just running Windows and Hyper-V guests on KVM. Currently, KVM doesn't implement that many PV features for speeding up KVM-on-KVM workloads, but we already have features implemented for both KVM on Hyper-V and Hyper-V on KVM, basically the server and client parts of the same features: Enlightened VMCS, Enlightened MSR-Bitmap, direct TLB flush and others. We could, of course, try to invent something similar for KVM, but what if we just used these features? In fact, this is available today: you can hide the KVM identification from the guest and expose a Hyper-V identification instead, and a nested KVM will happily use these Hyper-V features while running on KVM; your workloads will likely run faster, but at the same time you won't be able to use KVM PV features like asynchronous page fault. So the idea is: what if we add a KVM PV feature telling the guest that certain Hyper-V features are available, so that KVM-on-KVM workloads can go ahead and use those features natively? It's an interesting idea; if you have some spare cycles, give it a try, or I'll try to find some time and maybe do it myself. A purely hypothetical sketch of what such a feature bit could look like follows below.
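As a strawman only, nothing like this exists in KVM today: the KVM features CPUID leaf (0x40000001, EAX) is real, but KVM_FEATURE_HYPERV_EXTS and its bit number are invented here for illustration.

    /* Hypothetical sketch: a nested KVM checks an (invented) PV feature
     * bit before turning on its Hyper-V guest code paths. */
    #include <stdint.h>

    #define KVM_CPUID_FEATURES      0x40000001
    #define KVM_FEATURE_HYPERV_EXTS 24           /* invented bit number */

    static int kvm_guest_may_use_hyperv_exts(void)
    {
        uint32_t a, b, c, d;

        asm volatile("cpuid" : "=a"(a), "=b"(b), "=c"(c), "=d"(d)
                             : "a"(KVM_CPUID_FEATURES), "c"(0));
        /* If set, a nested KVM could enable e.g. Enlightened VMCS while
         * still using KVM PV features like async page fault. */
        return !!(a & (1u << KVM_FEATURE_HYPERV_EXTS));
    }

That was it for today. Thank you very much for listening.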