Hello everyone. I'm Huaitong Han from the Alibaba Cloud virtualization team. What I will introduce today is KVM memory cost optimization in Alibaba Cloud. Here is the agenda: first, background; second, the proportion of KVM memory cost; third, KVM GFN page-track optimization; fourth, KVM rmap optimization; and last, evaluation.

First, the background. During a VM's life cycle, KVM uses a lot of memory. Take Alibaba Cloud's large-memory instance with 3 TB of memory as an example: even with 2M huge pages enabled, KVM will use nearly 8 GB of memory (roughly 8 bytes of rmap plus 2 bytes of page-track bitmap for every 4K guest page). For a large number of smaller VMs, KVM still consumes a lot of memory in total. Most of this KVM memory is associated with the memory slot. This issue must be addressed.

This picture shows the proportion of KVM memory cost, measured on an Alibaba Cloud test VM with 2M huge pages. It can be seen from the picture that rmap occupies 79% of the memory cost, while everything other than rmap and the page-track bitmap occupies only 2%. So the objects of our optimization are rmap and the GFN page track.

Let me talk about the GFN page track first, because it is a relatively easy problem. What is the GFN page track used for? It is used to track guest accesses to pages containing guest page tables, for shadow paging. Currently only write access is tracked. The number of tracking requests for each page is recorded in the page-track bitmap. But the page-track feature is not needed for VMs using EPT, so the memory allocation can be deferred until it is actually used.

This is the normal workflow of the GFN page track: first, a memory slot is created, and at that point the page-track memory is allocated. While the VM is running, a shadow page fault occurs, and at that time the page-track bitmap is updated. So our optimization is to not allocate the page-track memory in the memory slot creation function, but to allocate it only when page tracking is actually used for guest MMU pages. For EPT VMs, this part of the memory will never be allocated. The following is the KVM rmap optimization.
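The lazy-allocation idea for the page-track memory can be sketched as follows. This is a minimal, self-contained model, not the actual KVM code: the names (`memslot`, `track_add_page`, `gfn_write_track`) are illustrative, and real KVM keeps this state in `struct kvm_arch_memory_slot`.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified model of a memory slot's page-track state. */
struct memslot {
    unsigned long npages;
    unsigned short *gfn_write_track; /* one write-track counter per gfn */
};

/* Slot creation no longer allocates the page-track memory eagerly;
   the pointer stays NULL, so an EPT VM never pays for it. */
static int slot_create(struct memslot *s, unsigned long npages)
{
    s->npages = npages;
    s->gfn_write_track = NULL; /* deferred allocation */
    return 0;
}

/* Called only on the shadow-paging path, when a guest page-table page
   must be write-tracked; allocates the counters on first use. */
static int track_add_page(struct memslot *s, unsigned long gfn)
{
    if (!s->gfn_write_track) {
        s->gfn_write_track =
            calloc(s->npages, sizeof(*s->gfn_write_track));
        if (!s->gfn_write_track)
            return -1;
    }
    s->gfn_write_track[gfn]++;
    return 0;
}
```

The point of the sketch is only the control flow: the allocation moves from `slot_create` into the first `track_add_page` call, which EPT VMs never reach.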
The rmap is widely used in the KVM MMU to accelerate reverse access to shadow page table entries. The rmap array comes from the KVM memory slot structure, and it has three elements: PG_LEVEL_4K, PG_LEVEL_2M, and PG_LEVEL_1G. The following is the MMU page table diagram. From the page table, the pointer to the PTE of a 4K page is stored in the PG_LEVEL_4K element, the pointer to the PMD entry of a 2M page is stored in the PG_LEVEL_2M element, and the pointer to the PUD entry of a 1G page is stored in the PG_LEVEL_1G element. So for a given GFN, only one element is used.

The structure of the rmap array looks like this. For a VM using 2M huge pages, the PG_LEVEL_1G element is not used, most of the PG_LEVEL_2M element is used, and only a very small part of the PG_LEVEL_4K element is used. The reason PG_LEVEL_4K is used at all is that some memory is not aligned to 2M, and those huge pages fall back to 4K pages.

So our optimization plan is to remove the unused memory. We transform the rmap array into the following: the PG_LEVEL_1G and PG_LEVEL_4K elements are not allocated; only the PG_LEVEL_2M element is allocated. When 4K pages appear in the EPT MMU, we allocate a sub-page for the rmap to record their shadow page table entry pointers, and the virtual address of the sub-page is recorded in the corresponding PG_LEVEL_2M slot.

So in the PG_LEVEL_2M element there are two different types of values: one is the page table entry pointer of a 2M page, and the other is the virtual address of a sub-page. We use the second bit as the sub-page bit to determine which kind of address a value is. Because an rmap value is a shadow page table entry pointer, which is 8-byte aligned, the last 3 bits of the rmap value can be used for flags.

The following is the traverse process for the rmap at PG_LEVEL_4K: read the value in the PG_LEVEL_2M element, check the sub-page bit, and if it is set, follow the sub-page address and find the shadow page table entry pointer for the 4K page inside the sub-page.
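The sub-page bit trick can be sketched in C as follows, under the assumption stated above that rmap values are 8-byte-aligned pointers, so the low 3 bits are always zero and free to carry flags. All names here (`rmap_encode_subpage`, `struct rmap_subpage`, `rmap_get_4k`) are illustrative, not from the actual patch.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 1 ("the second bit") marks a PG_LEVEL_2M rmap value as the
   virtual address of a sub-page rather than a 2M sptep. */
#define RMAP_SUBPAGE_BIT (1UL << 1)

static unsigned long rmap_encode_subpage(void *subpage)
{
    return (unsigned long)subpage | RMAP_SUBPAGE_BIT;
}

static int rmap_is_subpage(unsigned long val)
{
    return (val & RMAP_SUBPAGE_BIT) != 0;
}

static void *rmap_decode(unsigned long val)
{
    return (void *)(val & ~7UL); /* clear the 3 flag bits */
}

/* Sub-page holding rmap values for the 512 4K pages inside one
   2M region (layout is illustrative). */
struct rmap_subpage {
    unsigned long rmap_4k[512];
};

/* Traverse at PG_LEVEL_4K: given the PG_LEVEL_2M rmap value for a
   gfn, return the 4K rmap value if a sub-page exists, else 0. */
static unsigned long rmap_get_4k(unsigned long pml2_val, unsigned idx)
{
    if (!rmap_is_subpage(pml2_val))
        return 0; /* region still mapped by a single 2M page */
    struct rmap_subpage *sp = rmap_decode(pml2_val);
    return sp->rmap_4k[idx & 511];
}
```

Tagging pointers through their guaranteed-zero alignment bits is a common kernel idiom; the same reasoning the talk gives (8-byte alignment of the sptep) is what makes `rmap_decode` lossless here.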
The process is the same for PG_LEVEL_2M and PG_LEVEL_1G. The following is the evaluation of the memory optimization: KVM memory cost has been reduced by 98%, and the time to traverse all the shadow page table entries has been reduced by 99.2%. As we know, the TDP MMU feature has removed the rmap upstream; that feature accesses shadow page table entries by directly walking the MMU page table. Here is the time comparison of the two approaches when traversing all shadow page table entries: our optimized rmap reduced the time by 64.5%.

The current problem is that when a VM is being live-migrated, all the 2M huge pages will be split into 4K pages (dirty logging needs 4K granularity), which will increase the KVM memory cost. But when the migration is complete, this part of the memory is freed, and since VMs on a host are migrated one by one, the issue looks acceptable. Thank you.