Hello, everyone. I'm Liang and I'm from Alibaba Cloud. Today, I want to share with you the traps of using Hyper-V features in a KVM environment. I hope our practical experience can help you learn more about the Hyper-V related features and avoid some issues in practice. Let's get started.

The content I want to introduce includes four parts. The first is the background; then the performance issue we encountered when using the Hyper-V features will be described. After that, the root-cause analysis and the solutions for the issue will be discussed. The last part is the conclusion.

In the public cloud, Windows guests occupy a considerable proportion, especially in the cloud desktop and cloud gaming scenarios. Hyper-V is the hypervisor from Microsoft, which supports Windows guests very well. To make Windows guests work better, KVM emulates many features provided by Hyper-V, such as hv-time, hv-stimer, hv-vapic, hv-ipi, hv-spinlocks, and so on. Most of these exist to optimize the performance of Windows guests running in a KVM environment. The common usage of these features is to turn all of them on.

Many games currently on the market run on Windows, and the same is true in cloud gaming scenarios. Cloud gaming workloads have some typical characteristics. For example, many game programs are multithreaded and use the GPU to render game scenes, so they take up more CPU and GPU resources when running. Through the virtual machine monitor, you can observe that this workload generates a lot of IPIs in the guest, reaching up to 35,000 per second. When we connect to this guest through the Microsoft Remote Desktop client from a Mac, we can additionally observe many one-to-many IPIs, at a frequency of about 100 per second. A one-to-many IPI is sent to several vCPUs of the virtual machine, and the hv-ipi feature can be used to optimize the efficiency of its virtualization.

We found that the performance of the same workload running in the virtual machine is much worse than running directly on bare metal. With the Hyper-V related features enabled, the average frame rate of the game running in the virtual machine drops by one frame, and the proportion of frames with an FPS above 55 decreased by 10%. This performance difference drove us to analyze the reasons behind the degradation.

The first step was to profile the virtualization overhead incurred by running the same workload in the virtual machine with different configurations. In the test, we found that after enabling the Hyper-V related features, the virtualization overhead was significantly reduced. However, the reduction is not as large as with another configuration method: hiding the hypervisor CPUID feature. From the following two figures, we can observe the virtualization overhead under the different configurations; they respectively show the total number of VM-Exits and the total time spent on virtualization processing. The results show that enabling the Hyper-V related features is not the best choice. At first, we were rather surprised by this. Through in-depth analysis, we found the reason.

Through experiments, we can deduce the strategy a Windows guest uses to choose its system timer. When we expose the hypervisor CPUID feature, the synthetic timer (stimer) is preferred as the system timer, followed by HPET and RTC. When all Hyper-V features are enabled, stimer will be selected. If the Hyper-V related features are not enabled, Windows will choose HPET or RTC as the system timer.
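To make the comparison concrete, here is a rough sketch of the three guest configurations we compared, written as QEMU command lines. This is my illustration rather than the exact commands from our tests; the flag names follow recent QEMU, where hv-stimer depends on hv-synic and hv-time, hv-synic depends on hv-vpindex, and '...' stands for the rest of the machine definition:

    # 1) All Hyper-V enlightenments on: Windows selects the synthetic timer (stimer)
    qemu-system-x86_64 -enable-kvm \
        -cpu host,hv-vpindex,hv-time,hv-synic,hv-stimer,hv-vapic,hv-ipi,hv-relaxed,hv-spinlocks=0x1fff ...

    # 2) No Hyper-V enlightenments, hypervisor CPUID bit still visible:
    #    Windows falls back to HPET or RTC
    qemu-system-x86_64 -enable-kvm -cpu host ...

    # 3) Hypervisor CPUID bit hidden: Windows prefers the local APIC timer,
    #    but the Hyper-V enlightenments become unavailable
    qemu-system-x86_64 -enable-kvm -cpu host,hypervisor=off ...

    # The virtualization overhead of each configuration can then be compared with:
    perf kvm stat record -p <qemu-pid>
    perf kvm stat report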
When hiding the hypervisor CPUID feature, Windows gives priority to the local APIC timer as the system timer, while HPET and RTC have low priority. In addition, enabling the Hyper-V features depends on the hypervisor CPUID feature: hiding it causes the Hyper-V related features to become unavailable.

Let's take a look at the virtualization overhead of the different timers. First, look at HPET and RTC. RTC is emulated by intercepting port I/O operations, while HPET is emulated by intercepting memory-mapped I/O operations. Both of them are emulated in user space, in our case in QEMU. stimer is emulated by intercepting MSR accesses, and the processing of these accesses happens in KVM. The local APIC timer is virtualized by intercepting the related local APIC accesses; it is also emulated in kernel space. Comparing the virtualization costs of the four types of timers, we find that the cost of the local APIC timer and stimer is relatively close, and lower than that of HPET and RTC. There are two reasons for the high virtualization overhead of HPET and RTC: one is the higher number of VM-Exits, and the second is the context-switch overhead between kernel mode and user mode.

With the above information, we can see why the virtualization overhead is lower when the Hyper-V features are turned on: the virtualization overhead of stimer is much lower than that of HPET and RTC. On the other hand, why is the virtualization overhead of hiding the hypervisor CPUID feature the lowest? This is because stimer also has some side effects which increase the virtualization overhead elsewhere. For stimer, you need to know the following facts. First, the implementation of stimer depends on hv-synic. Second, the AutoEOI (automatic end-of-interrupt) function of hv-synic conflicts with the APICv hardware function. Therefore, when stimer is configured, the APICv hardware function is disabled, which leads to an increase in the overhead of interrupt injection. For IPIs between vCPUs, the IPI is virtualized by intercepting ICR register accesses, and when the APICv function is turned off, injecting the interrupt into the target vCPU causes additional VM-Exits. Therefore, in a business scenario with intensive IPIs, using stimer greatly increases the virtualization overhead of IPIs, thereby affecting business performance. When the guest uses the local APIC timer as the system timer, the APICv function works normally, so there is no such issue.

There are several options for avoiding the defects of stimer in the production environment. The first is to disable the hypervisor CPUID feature in scenarios where IPIs are intensive. The advantage of this approach is its simplicity, but the disadvantage is that it also disables the other Hyper-V features, so you cannot enjoy the benefits they bring. The second solution is to change the way Windows selects its system timer, so that when a Windows guest detects the hypervisor CPUID feature, the local APIC timer can still be selected as the system timer. This method requires the support of Microsoft. The third solution is to resolve the conflict between hv-stimer and the APICv hardware function. This can be achieved by improving the following three key points. The first is to disable the AutoEOI function of hv-synic, which can be achieved by setting the HV_DEPRECATING_AEOI_RECOMMENDED flag, as the community code already supports.
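As an aside, you can check from inside a guest whether the hypervisor actually advertises this recommendation by dumping the Hyper-V "implementation recommendations" CPUID leaf. The sketch below assumes the common cpuid(1) tool is available in the guest; per the Hyper-V TLFS (mirrored in Linux's hyperv-tlfs.h), HV_DEPRECATING_AEOI_RECOMMENDED is bit 9 of EAX in leaf 0x40000004:

    # Dump raw CPUID leaf 0x40000004 on one CPU; if bit 9 (0x200) of EAX is set,
    # the hypervisor is recommending that the guest stop using AutoEOI.
    cpuid -r -1 -l 0x40000004

A Windows guest is expected to honor this hint by itself, so nothing needs to be configured inside the guest.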
The second point is to allow the guest to access the local APIC through the x2APIC hardware MSRs instead of the Hyper-V synthetic MSRs, because local APIC accesses made through the synthetic MSRs cannot be accelerated by the APICv hardware function. The last point is to optimize the end-of-interrupt induced VM-Exits caused by stimer. By making the above changes, the stimer virtualization overhead becomes close to that of the local APIC timer, so we can always configure stimer for Hyper-V guests without worrying that it will affect performance.

This page shows the actual effect of the third solution above. All the Hyper-V features are turned on, and the virtualization overhead is minimized. This is because, after the optimization, the stimer virtualization overhead is equivalent to that of the local APIC timer. In addition, the hv-ipi feature can significantly reduce the virtualization overhead of one-to-many IPIs. A one-to-many IPI is different from an all-but-self IPI: it cannot be accelerated directly by the APICv hardware features. Therefore, for the guest to send such an IPI requires multiple writes to the ICR register, which generates multiple VM-Exits, while hv-ipi uses a PV hypercall, which generates only one VM-Exit.

Finally, let's make a summary. First of all, we need to know that the Hyper-V related features in the KVM environment have some defects. To make them easier to use, we should avoid making users configure which Hyper-V features to use according to the workload, and newly added features should not cause performance degradation; otherwise, a new feature becomes less useful and increases the complexity of configuration. Secondly, it is necessary to pay attention to the features and the defects of the Hyper-V support, especially when using old versions of QEMU and KVM, which need to be evaluated according to the features they support. Blindly turning on all the Hyper-V features is not necessarily the best way. Before the defects are solved, it is better to do a performance evaluation to find the best configuration. For IPI-intensive workloads, special attention is required.

Okay, this is all I want to introduce today. Thank you for attending. If you have any questions, you can send an email to the address on this page. Bye.