Have you ever tried upgrading your cluster and got stuck migrating VMs because a certain CPU feature was no longer supported by the libvirt and QEMU stack, requiring you to power-cycle those VMs? Then you're in luck, because I, Soham, and my colleague Shivam from Nutanix are going to talk about exactly such issues. And we hope that we can pique the interest of the community to join us in finding solutions for them. We'll start with a high-level introduction to CPU features and how they're modeled and maintained across the libvirt and QEMU stack. Bear in mind that this is not going to cover how CPU features are communicated to KVM via ioctls, or how CPUID instructions are emulated by the hypervisor. We will be strictly focusing on how CPU features are modeled and maintained across the libvirt and QEMU stack. So whenever libvirt starts up, it looks for a specific file located in the /var/cache/libvirt/qemu/capabilities directory. This file describes the capabilities of the QEMU binary. Typically, the file looks as shown in the slide. It contains the path to the QEMU binary, along with other information about the host that is obtained from QEMU, and it lists the different CPU features available on the host, marked as usable or not usable. In addition to that, it also lists the different CPU models, and their versions, that libvirt can use on this host. In case the file is not present, libvirt will try to create it, and to create the file it takes the help of QMP commands. The first command it uses is the query-cpu-model-expansion QMP command. Using this, libvirt is able to get the list of usable and unusable CPU features from the QEMU binary. Typically, the output of a query-cpu-model-expansion command looks as shown in the slide: it is JSON containing the list of CPU features and, for each one, whether it is usable or not.
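As a rough illustration of how that JSON reply might be consumed, here is a small sketch (this is not libvirt code, and the reply below is an abridged, hypothetical example of the shape of a query-cpu-model-expansion reply, with only a handful of properties):

```python
import json

# Hypothetical, abridged reply to the 'query-cpu-model-expansion' QMP
# command; a real reply from QEMU lists many more properties.
reply = json.loads("""
{
  "return": {
    "model": {
      "name": "max",
      "props": {
        "avx512f": true,
        "mpx": false,
        "pconfig": false,
        "ssbd": true
      }
    }
  }
}
""")

def split_features(reply):
    """Partition the expanded model's boolean properties into the
    features QEMU reports as usable vs. not usable on this host."""
    props = reply["return"]["model"]["props"]
    usable = sorted(f for f, on in props.items() if on is True)
    unusable = sorted(f for f, on in props.items() if on is False)
    return usable, unusable

usable, unusable = split_features(reply)
print(usable)    # features that can be offered to guests
print(unusable)  # features that must be treated as unavailable
```

The feature names here are real CPU feature flags, but which of them are usable on any given host is, of course, host-specific.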
Next, it uses yet another QMP command, query-cpu-definitions. Using this, libvirt gets the list of usable and unusable CPU models from QEMU. These CPU models are the ones libvirt can use for describing the CPU of any guest VM. Once the file is generated, libvirt loads its contents and keeps them in memory. Using this content, libvirt is able to produce the output for several commands, one of them being virsh domcapabilities. Typically, the output of domcapabilities looks as shown in the slide. It contains the path to the QEMU binary and, again, the different CPU models that can be used for describing VMs on this host. For example, here we see that VMs can be run in host-passthrough mode, where the VM's CPU exactly mirrors the host's CPU.
Alternatively, we can provide a custom model that best describes the host (in this example, Icelake-Server), along with extra features that can be enabled or disabled on top of it for the VM.
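For reference, the relevant part of a virsh domcapabilities output might look roughly like this (an abridged, illustrative sketch; real output lists many more models and features, and which models are usable depends on the host):

```xml
<domainCapabilities>
  <path>/usr/bin/qemu-system-x86_64</path>
  <cpu>
    <mode name='host-passthrough' supported='yes'/>
    <mode name='custom' supported='yes'>
      <model usable='yes'>Icelake-Server</model>
      <model usable='no'>EPYC</model>
    </mode>
  </cpu>
</domainCapabilities>
```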
In addition to this, the contents of the file are also used by libvirt to perform some validation checks during QEMU process launch. So we have been talking about how libvirt leverages QMP commands to fetch all this information. Now, to close the loop, let's also look at how QEMU produces the output for those QMP commands. For that, QEMU maintains its CPU definitions in a file called cpu.c, placed in the target/i386 directory. This file contains the definitions for every CPU model that is supported, along with several versions of those CPU models, based on which CPU features they contain or which CPU features have been dropped from them. Now, with this background on how CPU features are modeled across the libvirt and QEMU stack, we can start looking at a few issues that we have identified at Nutanix in our production environment. The first major issue that we want to discuss is feature deprecation, and the example we chose here is the deprecation of the pconfig feature. pconfig was a non-virtualizable feature that got accidentally added in QEMU 3.1, and correspondingly it got added to the libvirt codebase as well, for the Icelake-Server CPU model, in libvirt 4.8. But soon enough the mistake was caught: the pconfig feature was removed from the QEMU binary in QEMU 4.0, and subsequently it was also removed from libvirt as part of the libvirt 5.10 release. But by then the damage was done, and we could do nothing about the broken migration path that is exposed whenever we try to migrate a VM from a source host to a destination host that has a libvirt version lower than 5.10 but a QEMU version greater than 3.1. This is a well-known bug and has been reported in the upstream community as well. Moreover, the libvirt-side fix that was added, as you can see, simply checks whether the custom model provided is Icelake-Server or not.
If it is Icelake-Server, it will drop the pconfig flag and not pass it to the QEMU command line. But the catch here is that some users might be using even lower baseline CPU models, like kvm64, for maximum migratability of their VMs. In such cases this fix doesn't apply: the pconfig feature isn't hidden, it is passed on to the QEMU command line, and the launch eventually fails. We will continue with feature deprecations with yet another example, this time the MPX flag. MPX was a CPU feature introduced by Intel in the Skylake architecture, and correspondingly it got added through the entire virtualization stack of KVM, QEMU, and libvirt. But it was later identified as not very useful, and hence it was eventually removed from the kernel in 5.6. Subsequently, it was also removed in the QEMU 4.0 release. But on the libvirt side, the MPX definition continued to stay behind in the Icelake-Server CPU definition, and to date it is still there. Now, we see two issues here. First, even with the QEMU-side fix that drops the MPX flag, it can happen that there are guest applications within the VM that were compiled with GCC's libmpx support. These applications will now suddenly start complaining, because the MPX flag is dropped after the migration. The second issue happens due to libvirt maintaining its own CPU definitions. On a physical Icelake-Server host, if we run virsh capabilities, libvirt uses its CPU definitions to identify the physical host CPU. In this case, since the Icelake-Server CPU model definition inside libvirt contains the MPX flag, which is no longer usable, libvirt decides that the Icelake-Server CPU definition cannot be used to classify the physical host. As a result, virsh capabilities identifies the physical host with an Intel model from an earlier generation. This is a known issue and has been reported upstream.
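The host-model fallback just described can be sketched as a toy example (this is an illustration of the idea, not libvirt's actual algorithm or data; the model names are real but the feature lists are simplified): a model only "fits" the host if every feature its definition requires is present, so one stale feature in a definition pushes the reported host model back a generation.

```python
# Simplified model definitions, newest first. The stale 'mpx' entry in
# the Icelake-Server definition is the point of the example.
MODEL_DEFS = {
    "Icelake-Server": {"avx512f", "pku", "mpx"},
    "Skylake-Server": {"avx512f", "mpx"},
    "Broadwell": {"rdseed"},
}

def pick_host_model(host_features, model_defs):
    """Return the newest model whose required features the host has."""
    for name, required in model_defs.items():
        if required <= host_features:
            return name
    return None

# An Icelake host today: MPX was removed from silicon and the kernel,
# so the host's feature set no longer contains it.
host = {"avx512f", "pku", "rdseed"}
print(pick_host_model(host, MODEL_DEFS))  # falls back to an older model
```

With the stale mpx entry removed from the definition, the same host would match Icelake-Server directly.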
The next issue that we want to highlight is the tight coupling between libvirt and QEMU versions. The example we chose here centers around the Spectre/Meltdown timeframe, and that too in a nested-virtualization context. We have highlighted a kernel commit here: whenever the Intel SPEC_CTRL and Intel STIBP flags are set, a kernel containing this commit will expand the SPEC_CTRL bit to also use the AMD-specific bits for IBRS and IBPB, and the Intel STIBP bit to also use the AMD-specific STIBP bit. We can see this in the example. Say that on a bare-metal Icelake-Server host, which has libvirt 7.2 running on it and QEMU 4.2 installed, we power on a VM with the CPU in host-passthrough mode. This can be seen in the domain XML and also in the QEMU command line. Now, once the VM is powered on, the kernel inside the L1 guest has the commit we discussed before. As a result, if we run lscpu, we will see that the flags IBRS, IBPB, and STIBP are exposed. Now say we want to enable nested virtualization within this L1 VM, and for that we have libvirt 7.2 running inside the L1 guest. We run virsh capabilities, and we see that the AMD bits for IBRS, IBPB, and STIBP show up as extra features in the virsh capabilities output. But we have a slightly older QEMU installed inside the L1 VM, QEMU 2.12. If we pass these extra flags that we got from the virsh capabilities output to this QEMU, we'll see that QEMU starts reporting errors saying that ibrs is not a known property. This clearly shows there is a coupling between the versions of libvirt and QEMU that we can use on a particular system. Next, we would like to talk about another point of concern, which is libvirt's validation checks for host and guest CPU compatibility.
Initially, libvirt used to check the host CPU features by probing the host with the CPUID instruction, and it stored that host model information. It then used that host model information to validate the guest CPU configuration that had been requested, to find out whether the guest and host CPU configurations were compatible with each other. But then, later on, as part of the patch set mentioned here, it was decided that libvirt should move away from probing the host. It should instead rely on the underlying hypervisor, QEMU, to tell libvirt what the host CPU features are, and libvirt could then use those host CPU features to validate the requested guest CPU configuration. Now, we then saw that this introduced a regression: there was a bug that threw an error whenever a guest CPU configuration was requested that, according to QEMU, was not supported by the host. This got fixed in libvirt in the commit that has been pointed out, and what libvirt does now is use a union of CPU information, both from QEMU and from probing the host with the CPUID instruction. Together, that forms the CPU information for the host, and libvirt uses this information to validate the requested guest CPU configurations for compatibility. But ideally, libvirt shouldn't be using the host-probing approach to get the host information at all, and should instead rely entirely on the QEMU-side information about the host. We'll see in upcoming slides how we can tackle this problem, and whether there is a solution for it or not. So, all the issues that we have been talking about so far can be root-caused to one major point: there are multiple sources of CPU definitions across the libvirt and QEMU stack.
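The union-based validation just described can be sketched in a few lines (again, an illustration of the behaviour, not libvirt's actual code; the feature names are ordinary CPU flags chosen for the example):

```python
def host_model_features(qemu_reported, cpuid_probed):
    """The host feature set used for validation is, after the fix,
    the union of what QEMU reports and what CPUID probing finds."""
    return qemu_reported | cpuid_probed

def validate_guest(guest_features, host_features):
    """Return the guest features the host cannot satisfy (empty = OK)."""
    return guest_features - host_features

qemu_view = {"sse4.2", "avx2"}
cpuid_view = {"sse4.2", "avx2", "invtsc"}  # probing may see extra bits
host = host_model_features(qemu_view, cpuid_view)

# A guest asking for 'invtsc' validates only because the CPUID-probed
# half of the union supplies it; relying on QEMU alone would reject it.
print(validate_guest({"avx2", "invtsc"}, host))
```

This also makes the talk's concern concrete: as long as the union includes the CPUID-probed view, libvirt is still carrying its own, second opinion about the host.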
And because there are multiple sources of definitions, the definitions across the stack can mismatch, which has led to the multiple issues that we have seen. Another issue we see here is that QEMU itself supports versioning of CPU models, but libvirt doesn't give users the freedom to choose the QEMU models and versions. And secondly, we see that maintaining multiple CPU definitions across the stack has increased the development time and the complexity of maintaining CPU features across the stack. Meaning that if a new CPU feature is added at the QEMU layer, then the same has to be added to libvirt as well, and this needs to be monitored continuously. The reverse is also true: if a CPU feature is suddenly removed from QEMU, then someone from the libvirt community needs to notice that and remove it from the libvirt codebase as well. This increases the complexity of the developers' workflow. So, having discussed these issues, let's go on to discuss whether any fix or solution is available for them. Before moving on to the fix, let's reiterate the problem statement. QEMU supports multiple versions of a single CPU model; for example, Icelake-Server alone has six different versions. But libvirt's virsh domcapabilities does not expose those to the users; it simply gives us the base CPU model name. Hence, libvirt also doesn't allow its users to choose any particular version of the CPU definitions, and only the base model names can be used for defining CPUs. This base model name is then passed to the QEMU command line, and QEMU makes the decision of choosing the version based on the machine type. This again leads to a coupling between CPU model versions and the machine type, which is something we would like to simplify. So what do we propose? We propose that libvirt should rely completely on the QEMU CPU models.
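The machine-type coupling described above can be pictured with a toy resolver (purely illustrative: the version numbers and machine-type defaults below are invented for the example and do not reflect QEMU's real tables):

```python
# Hypothetical versioned-model table and machine-type defaults.
VERSIONED_MODELS = {
    "Icelake-Server": {
        1: "Icelake-Server-v1",
        2: "Icelake-Server-v2",
    },
}
MACHINE_DEFAULT_VERSION = {
    "pc-q35-4.0": 1,
    "pc-q35-4.2": 2,
}

def resolve_model(base_name, machine_type):
    """When the user names only the base model, the concrete version
    is silently picked from the machine type's default."""
    version = MACHINE_DEFAULT_VERSION.get(machine_type, 1)
    return VERSIONED_MODELS[base_name][version]

# The same domain XML model name resolves differently depending on the
# machine type, which is exactly the coupling the talk wants to remove.
print(resolve_model("Icelake-Server", "pc-q35-4.0"))
print(resolve_model("Icelake-Server", "pc-q35-4.2"))
```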
So the virsh domcapabilities command in libvirt should now be able to expose to users the different usable versions that are defined in QEMU. As in this example, if we now run virsh domcapabilities, we can see which Icelake-Server versions can be used for defining CPUs for any VM on this particular host. In turn, users can look at the virsh domcapabilities output, choose the intended CPU model version, and put it in the domain XML. libvirt then takes the CPU model version from the domain XML and passes it to the QEMU command line. Another thing that we want to change here is to get rid of the host and guest CPU compatibility validations; we want to remove those checks from the libvirt side. This removes the unnecessary burden on libvirt of probing the host for the host details and then comparing them with the guest CPU models for compatibility checks. Instead, what we want to do is add a new check value in the libvirt layer, called 'hypervisor'. This is somewhat similar to having the 'none' check, which basically skips the validation, but in addition it passes the enforce flag to the QEMU command line, as can be seen in the example. What this does is delegate the CPU model validation and compatibility checks to QEMU. So how does the fix work? First, using the fix, we are able to have a common CPU model and version across both libvirt and QEMU. With this, we will see that issues like the pconfig deprecation can be avoided. libvirt can now directly use QEMU models along with their versions, so the onus is on the QEMU-side versioning of the models to deprecate any older CPU features, and libvirt can then use those model versions to gracefully deprecate CPU features.
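Under this proposal, the CPU element of a domain XML might look roughly like this (a hypothetical sketch: the check='hypervisor' value is the new check being proposed in this talk, not an existing libvirt option, and the specific versioned model name is just an example):

```xml
<!-- The user picks a concrete QEMU model version from the
     virsh domcapabilities output; the proposed 'hypervisor' check
     skips libvirt's own validation and instead appends 'enforce'
     to the -cpu argument, delegating validation to QEMU. -->
<cpu mode='custom' match='exact' check='hypervisor'>
  <model fallback='forbid'>Icelake-Server-v2</model>
</cpu>
```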
Secondly, libvirt no longer needs to carry the responsibility of the guest and host CPU compatibility check. So what are the limitations of this approach? First, this approach only works for the libvirt-on-QEMU stack; it doesn't cover the other libvirt drivers like LXC, Xen, and so on. And secondly, this is not a magical answer that will fix all the CPU feature issues that we have discussed and the many more that are still present in the stack. For example, the tight coupling of libvirt and QEMU versions is still not resolved by this approach; that is something that will need further discussion and further opinions from the community to fix. So, in conclusion, we would like to say that the main purpose of this talk was to pique the interest of the community in looking at the issues and the design around the maintenance of CPU models across the libvirt and QEMU stack. As part of this talk, we have tried to come up with a solution that simplifies CPU model handling and its workflow across the libvirt and QEMU stack. We look forward to sending patches with our solution to the community mailing list, and we hope to get some interesting feedback and improvements on our proposal. Thank you, and with that, I'm open to questions from the audience.