Hello everyone. Welcome to my talk, which is called "How hard could it be to flip a bit?", and today we're going to talk about KVM PV feature enablement across the virtualization stack. My name is Vitaly, I work for Red Hat. You can usually meet me on the KVM development mailing list. I mostly focus on KVM PV features, Hyper-V enlightenments, and nesting, especially mixed nesting: KVM on Hyper-V and Hyper-V on KVM. I also do KVM and Hyper-V PV feature enablement in QEMU sometimes.

So let's get to the talk: paravirtualized features. What are these features? Well, these are features which are not present in the hardware we're trying to emulate. KVM implements quite a few. First, there are the so-called native KVM PV features, which were developed specifically for KVM. Other than that, there are features from other hypervisors which KVM tries to emulate: for example, there are Hyper-V enlightenments, and there is also a number of features from the Xen and VMware hypervisors. For the purpose of this talk, I'll be mostly focusing on native KVM PV features and Hyper-V enlightenments.

PV features: why do we do them? Well, mainly there are two reasons. First, we are trying to make things work faster, because emulating hardware interfaces is not always as fast as we would want it to be. Second, sometimes we'd like to introduce unique capabilities which are not possible, or don't really make sense, on bare metal. If you are missing some unique capability in your guest, you will certainly notice that, but a performance-related feature is sometimes hard to notice, because you don't really know that your guest could have run faster. So the regular advice for users is to enable all currently supported KVM PV features and give them to your guests. The guests will likely run faster, as they will decide themselves whether they want to use these features or not.

Performance-related features are usually implemented in KVM itself and not in the userspace VMM. But after the feature is implemented, we still need to somehow let the guest know that the feature is available. On x86, the regular interface for that is CPUID feature bits. So your userspace VMM, like QEMU, has to do two things: first, it has to query KVM for the supported enlightenments or PV features; second, it needs to set the corresponding feature bits in guest-visible CPUID.

A good question, and a side topic: if the feature is implemented in KVM but not presented to the guest in guest-visible CPUID, can the guest still use it? What do you think? The answer is, surprisingly, yes: there is nothing in KVM which would stop the guest from using the feature. Recently, two capabilities were introduced in KVM to change this behavior and actually limit the available feature set to what was exposed in guest-visible CPUID: the first is for native KVM PV features, and the second is for Hyper-V enlightenments. So far, QEMU doesn't support either of them.

Okay, let's get back to PV feature enablement. The userspace VMM's job is to query KVM for the supported feature set and then just set the corresponding feature bits in guest-visible CPUID. That sounds really easy, right? And in fact, it is. But making the decision to flip a particular bit is not as easy as it may sound. Let's talk about native KVM PV features first.
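As an aside, here is a minimal C sketch of this query-and-set flow, the way a userspace VMM talks to the KVM API. It is an illustration of the mechanism rather than how QEMU actually structures its code; error handling is omitted, and the enforcement capability name at the end is quoted from memory:

/* Minimal sketch: query KVM for the supported CPUID feature bits
 * (including the KVM PV leaf 0x40000001) and expose them to the guest.
 * Error handling is omitted; a real VMM filters this list first. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#define MAX_CPUID_ENTRIES 100

int main(void)
{
    int kvm = open("/dev/kvm", O_RDWR);
    int vm = ioctl(kvm, KVM_CREATE_VM, 0);
    int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);

    struct kvm_cpuid2 *cpuid =
        calloc(1, sizeof(*cpuid) +
                  MAX_CPUID_ENTRIES * sizeof(struct kvm_cpuid_entry2));
    cpuid->nent = MAX_CPUID_ENTRIES;

    /* Step 1: ask KVM which features it supports on this host. */
    ioctl(kvm, KVM_GET_SUPPORTED_CPUID, cpuid);

    /* A real VMM would filter the entries here according to its policy
     * (machine type, CPU model, user-requested flags).  For the KVM PV
     * features, the bits of interest live in EAX of leaf 0x40000001. */
    for (unsigned i = 0; i < cpuid->nent; i++) {
        if (cpuid->entries[i].function == 0x40000001)
            printf("supported KVM PV feature bits: 0x%x\n",
                   cpuid->entries[i].eax);
    }

    /* Step 2: set the (possibly filtered) guest-visible CPUID. */
    ioctl(vcpu, KVM_SET_CPUID2, cpuid);

    /* Optionally, ask KVM to enforce the exposed PV feature bits so the
     * guest cannot use what we did not advertise.  The capability name
     * below is my recollection of the upstream name; QEMU does not
     * enable it yet. */
    struct kvm_enable_cap cap = {
        .cap = KVM_CAP_ENFORCE_PV_FEATURE_CPUID,
        .args = { 1 },
    };
    ioctl(vcpu, KVM_ENABLE_CAP, &cap);

    return 0;
}

The Hyper-V leaves are handled separately in KVM, but the overall flow is the same: query, filter, set.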
Here is an example of a recently implemented feature which is dear to me: the so-called interrupt-based asynchronous page fault mechanism. This mechanism is known to significantly improve throughput in memory-overcommitted environments, when you're trying to consume more memory with your guests and applications than you actually have on your host. The feature was merged in Linux 5.10 and is supported starting with QEMU 5.2. It is also known to completely replace the legacy asynchronous page fault mechanism, so even if you expose the legacy mechanism to your guests today, with a recent host kernel the guest won't be able to use it; the mechanism simply won't work.

So when you start your guest with QEMU, what do you do? You pick a machine type, you pick a CPU model, then you add some devices, and the guest runs. Where do native KVM PV features fit into this picture? Will this expose any? The answer is yes, it will, but which ones? I have yet to find any documentation about that, but we can use the QEMU source code instead, of course. If you do so, you will likely stumble upon an array which defines six KVM PV features that are going to be added by default for all CPU models. There is a comment which explains why this list is much shorter than the full list of currently implemented native KVM PV features I showed on one of the first slides: the features added here have to be supported by the oldest supported kernel. And what is the oldest kernel QEMU supports? Well, the documentation tells us that it's Linux 4.5, or Red Hat Enterprise Linux 7. Linux 4.5 was released in March 2016; it's been almost five years since. So for a newly implemented KVM PV feature like interrupt-based asynchronous page fault, it's expected to take roughly five years before we'll be able to add it to this list so that the feature gets enabled by default.

What do you do in the meantime? Well, you can always enable the feature manually with the corresponding CPU flag. You can also use the so-called '-cpu host' model, which will enable everything supported by the host; this mode is known to be non-migratable in the general case. If you are running a layered stack, then you have to wait until all your layers support the feature before you are able to enable it, and this may take some time. Also, it will require you to update all your VM configs to include this new feature which was just introduced to KVM, and that's no easy task, of course. So as a result, we are seeing very low adoption of newly introduced KVM PV features in the wild. And a reasonable question is: can we do better?

I tried gathering some ideas around, and here they are. First, let's step back and discuss why we can't enable all currently supported PV features by default. Well, the answer is simple: if we enable all currently supported features by default, then QEMU won't start on anything but the latest kernel, which has the full feature set. This is undesirable; it would break the support promise. Okay, if we can't enable all these new features unconditionally, maybe we can enable them on hosts which support them (newer hosts) and avoid enabling them on old hosts. Well, there is a feature called live migration, which is known to be very important to users. When you create two VMs with the same QEMU command line on two different hosts, it's expected that the VMs will look exactly the same: they will have exactly the same set of features. If certain features which were present on the source host are missing on the destination host, your guest may get very confused upon migration, because the features will suddenly disappear from underneath it.
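A brief aside before we get to possible ways out: remember that it is ultimately the guest which decides whether to use an advertised feature, and it does that by looking at exactly these CPUID bits. Here is a small, illustrative C sketch of how code running inside a guest could check whether the interrupt-based async page fault feature is being advertised; the leaf numbers and the bit position are quoted from my recollection of the kernel's asm/kvm_para.h, so treat it as a sketch, not a reference:

/* Guest-side sketch: is the interrupt-based async page fault feature
 * advertised?  Constants are defined locally and quoted from memory;
 * the authoritative definitions live in asm/kvm_para.h. */
#include <cpuid.h>
#include <stdio.h>
#include <string.h>

#define KVM_CPUID_SIGNATURE       0x40000000u
#define KVM_CPUID_FEATURES        0x40000001u
#define KVM_FEATURE_ASYNC_PF_INT  14          /* bit number in EAX */

int main(void)
{
    unsigned int eax, ebx, ecx, edx;
    char sig[13] = {0};

    /* Leaf 0x40000000 carries the hypervisor signature, which is
     * "KVMKVMKVM\0\0\0" when the KVM PV leaves are exposed. */
    __cpuid(KVM_CPUID_SIGNATURE, eax, ebx, ecx, edx);
    memcpy(sig + 0, &ebx, 4);
    memcpy(sig + 4, &ecx, 4);
    memcpy(sig + 8, &edx, 4);
    if (strcmp(sig, "KVMKVMKVM") != 0) {
        printf("not running on KVM (or the PV leaves are hidden)\n");
        return 0;
    }

    /* Leaf 0x40000001 carries the KVM PV feature bits in EAX. */
    __cpuid(KVM_CPUID_FEATURES, eax, ebx, ecx, edx);
    if (eax & (1u << KVM_FEATURE_ASYNC_PF_INT))
        printf("interrupt-based async PF is advertised\n");
    else
        printf("interrupt-based async PF is NOT advertised\n");

    return 0;
}

Now, back to the migration problem.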
We can certainly support migrating to hosts which are equal or newer, which have a superset of the features of the source host, but we can't support migrating in the other direction.

So, another idea. QEMU is known to introduce a new version of each machine type with every release. For example, if you use the 'q35' machine type with QEMU 6.1, it's an alias for 'pc-q35-6.1'; with 6.2, it's going to be 'pc-q35-6.2'. And we can enable features conditionally based on the machine type. Can we do that? Well, yes, we can probably add new KVM PV features to the newest machine type and keep them disabled for old machine types. Then only the new machine type will require the latest kernel. The problem is, there is an expectation from users that even the newest machine type can be created on the oldest supported kernel; we never said otherwise. If we start requiring very recent kernels for the newest machine types, users will switch away from using, say, plain 'q35', which resolves to the latest version; they will hard-code some older version of the machine type and never change it again. And, surprisingly, this may lower the adoption of not only KVM PV features but of all features in QEMU, because users will get stuck with some older machine type.

There is an idea to introduce another configuration dimension, a so-called 'host platform', and specify the minimum required kernel version there. The advantage is that we would have a clear separation between the machine type, which would mean exactly what it means now, and the required kernel version. Another advantage is that this could be used all across QEMU for everything kernel-related which may require newer kernel versions, for example things like vhost, VFIO, and so on. It's still not ideal, because users will still have to update their configurations, which is hard. We're also going to explode the test matrix: previously it used to be machine type and CPU model; now it's going to be machine type, CPU model, and host platform version. Also, it's unclear what to do with downstream kernels, which are known to backport features. How do we match them to upstream kernel versions? Maybe we'll need a special syntax for them.

There were other ideas expressed. The first was: do we really need to solve the problem at the QEMU level at all? Which is a valid question, but just moving the problem up the stack is not going to magically solve it. It's also known that there are multiple higher-level applications using the KVM/QEMU stack, and we would have to teach all of these applications about every new KVM PV feature. Also, before we are able to flip the bit, or enable the feature, at the highest level, the enablement has to happen all the way down the stack, and that's not as fast as we would want it to be: we cannot send a patch to QEMU to enable a feature before the feature is introduced in KVM and fully accepted; we cannot send a patch to libvirt before the feature is fully accepted in QEMU; and so forth. It takes time. And even if we move the decision to enable a PV feature somewhere up the stack, it's still a hard problem to solve, because users will have to somehow know what the kernel versions of the destination hosts are going to be when they eventually decide to migrate their VMs, not just when they start them, and they need to make this decision in advance, which is certainly hard.
There are some other, hybrid suggestions, like having a separate minimum required kernel version for each machine type and slowly raising this bar for users, which is a valid approach. Also, we said that we cannot enable PV features conditionally because of live migration, because you won't be able to migrate to older hosts; but maybe this is acceptable to certain users, and we can have a specific option for them, like "limit migration to same or newer hosts". In that case, all PV features supported by the source host can be enabled unconditionally. Another, slightly crazy, idea is to introduce a new PV interface to revoke PV features from guests, which would be used upon migration: it would allow us to start a guest with a certain set of features and then revoke some of them because we are migrating to a host which doesn't have them. Or maybe we just need to do a better job documenting the features, because currently there is no documentation at all about native KVM PV features in QEMU, so users don't really know when we introduce new ones.

Now I'd like to say a few words about Hyper-V PV features, or so-called Hyper-V enlightenments. They are very similar to native KVM PV features, but unlike native KVM PV features, nothing is enabled by default. The general advice for Windows guest users is to give their guests all currently implemented Hyper-V enlightenments; the guests will certainly run faster. If you try running a Windows (or Hyper-V) guest without Hyper-V enlightenments, you will quickly realize that some of these enlightenments are not really optional, because without them your guests run really slowly. So what do users usually do in such a case? Well, they Google, and they find some years-old advice to add two or three flags. They add them, their guest starts running faster, and they never update their configuration again; they don't do any research on what other features are available. So the real-world adoption of these features stays low.

I should also mention the existing 'hv-passthrough' CPU flag: it enables everything supported by the current host. It's very similar to '-cpu host' for native KVM PV features, and, similar to '-cpu host', it is generally not migratable. There was an effort last year to introduce a so-called 'hv-default' mode, which would enable a reasonable set of Hyper-V enlightenments so users won't have to specify them all. The problem is very similar to enabling native KVM PV features by default: what do we put into the set? If we only put in the enlightenments which are supported by Linux 4.5, then a lot of features will be left out; if we start requiring newer enlightenments, we'll be raising the required kernel version. The idea can of course be combined with setting a specific required kernel version for every machine type, so we would know which enlightenments can be enabled. Another problem specific to Hyper-V enlightenments is that some of them are vendor-specific. For example, we have Enlightened VMCS, the so-called 'hv-evmcs' flag, which is Intel-specific. Do we add this enlightenment to the set or not? We can add it, but then we'll have to enable it conditionally based on the CPU vendor: enable it on Intel and not on AMD, which is not how other CPU flags in QEMU behave.
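To make the "everything supported by the current host" part a bit more concrete: KVM can report which Hyper-V CPUID leaves it is able to emulate on a given host, and something like hv-passthrough, or a future hv-default computation, ultimately boils down to querying that and mapping the result to individual hv-* flags. Here is a rough C sketch, assuming I remember the ioctl name correctly; it is not how QEMU implements it:

/* Rough sketch: ask KVM which Hyper-V enlightenments the host supports,
 * which is roughly what an hv-passthrough-like mode or an hv-default
 * computation would need to do.  Names are from memory; error handling
 * is omitted. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#define MAX_HV_CPUID_ENTRIES 64

int main(void)
{
    int kvm = open("/dev/kvm", O_RDWR);
    int vm = ioctl(kvm, KVM_CREATE_VM, 0);
    int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);

    struct kvm_cpuid2 *hv_cpuid =
        calloc(1, sizeof(*hv_cpuid) +
                  MAX_HV_CPUID_ENTRIES * sizeof(struct kvm_cpuid_entry2));
    hv_cpuid->nent = MAX_HV_CPUID_ENTRIES;

    /* Returns the Hyper-V CPUID leaves (0x40000000..) which KVM can
     * emulate on this host. */
    ioctl(vcpu, KVM_GET_SUPPORTED_HV_CPUID, hv_cpuid);

    for (unsigned i = 0; i < hv_cpuid->nent; i++) {
        struct kvm_cpuid_entry2 *e = &hv_cpuid->entries[i];
        printf("leaf 0x%08x: eax=0x%08x ebx=0x%08x ecx=0x%08x edx=0x%08x\n",
               e->function, e->eax, e->ebx, e->ecx, e->edx);
    }

    /* A VMM would now map these bits to user-visible hv-* flags and,
     * for vendor-specific ones like Enlightened VMCS, additionally
     * check the host CPU vendor before offering them. */
    return 0;
}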
That was mostly it. Just to summarize, what I wanted to say is that we have a problem with low adoption of KVM PV features and Hyper-V enlightenments. Fundamentally, the problem is caused by the architecture of our stack, which consists of loosely coupled components built to support different versions of each other. The cornerstone of the problem seems to be live migration, which requires knowing in advance which features are going to be available across the cluster.

What do I plan to do? First, I plan to finish this talk and gather some ideas from you. Introducing 'host platform' is certainly worth trying, because it has its advantages compared to just raising the required kernel version for machine types, which can also serve as an alternative approach if 'host platform', for some reason, doesn't work out. I also plan to enable those hardening capabilities in QEMU to limit the available PV features to what was specified on the command line. And if the 'host platform' idea works out, or if we agree upon raising the required kernel version for specific machine types, then we can probably resume the 'hv-default' work.

That was it. Thank you very much. I'd like to take your questions and suggestions.