Hello, everyone. This is Jason Chen from the Intel virtualization team. Thank you all for joining my presentation about supporting TEEs on x86 client platforms with pKVM.

First, I would like to show you a general use case of TEEs on client platforms. As you can see in this picture, this scenario is becoming more and more popular on client platforms. It places one primary OS together with one or a few trusted execution environments (TEEs). The primary OS expects an identical user experience as native, and at the same time it expects to manage all platform resources. The TEE is targeted at confidential computing, and it does not trust the primary OS or the other TEEs. It needs to be isolated, either by specific hardware like TDX or SEV, or by a hypervisor on platforms without such hardware support; and specific hardware support is not always there. If we talk about the hypervisor solution, we expect a hypervisor with a small TCB, which makes it hard to attack and keeps it safe enough. At the same time, it should pass through most platform resources. Currently, the most popular hypervisor in the open-source world is KVM. But KVM works together with the Linux kernel, which has over 20 million lines of code, so it has a very big TCB, and at the same time it is complicated for KVM to pass through all platform resources to one VM. So we need to do something to build a suitable hypervisor for this scenario.

Google introduced protected KVM (pKVM) on ARM. It is used, for example, to support a primary OS plus TEEs on Android platforms, which is the typical use case from the previous slide. It is designed to support TEEs as VMs in the ARM non-secure world; it tries to isolate trusted applications which may not be trustworthy enough to put together in the secure world. pKVM on ARM makes use of ARM's nVHE mode: it splits the hypervisor part out of the Linux kernel, moving it from exception level 1 to exception level 2. It is a good try. We saw pKVM's progress in the community, and it made us think about a similar approach for Intel platforms. So we did the PoC and are bringing the solution here.

This page gives you an overview of the pKVM running flow; I will also elaborate on some basic functional parts of pKVM in the following pages. First, pKVM is supposed to be launched as a binary built into the KVM module, so it is verifiable together with the kernel image. It splits from the kernel when the KVM module gets loaded. This splitting is also called deprivileging, as it splits KVM into two parts at different privilege levels. One is KVM-high: it stays with the primary OS kernel at the low privilege level, keeps running like native, and still acts in its VMM role. The other one is KVM-low, which is the pKVM hypervisor; it runs at the high privilege level with a super small TCB. With the pKVM hypervisor's support, we can then support trusted guests for confidential computing at runtime. The trusted guest here is also called the TEE VM; its memory and its state are protected by the pKVM hypervisor. Another thing I want to highlight here is that DoS attacks from the primary OS are not in the scope of pKVM's protection, as the primary OS owns vCPU scheduling, VM management, and so on. This is all because we want the pKVM hypervisor to be very lightweight, and we do not want it to do any complicated system management that would increase its TCB.
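To make this split concrete, here is a minimal sketch of how KVM-high might call into the pKVM hypervisor. The hypercall numbers and names are illustrative assumptions for this talk, not our actual interface:

```c
/* Illustrative only -- not the actual pKVM ABI.
 * KVM-high (deprivileged, inside the primary OS) asks the pKVM
 * hypervisor (privileged, small TCB) to perform the few
 * security-sensitive operations it is no longer allowed to do itself.
 */
enum pkvm_hcall {
	PKVM_HC_DONATE_PAGE,   /* give a page to a TEE VM            */
	PKVM_HC_RECLAIM_PAGE,  /* take a page back after VM teardown */
	PKVM_HC_VCPU_RUN,      /* enter a protected guest vCPU       */
};

/* On x86 this is a VMCALL from VMX non-root mode into VMX root mode;
 * on ARM the equivalent is an HVC from EL1 to EL2. */
static inline long pkvm_hcall(enum pkvm_hcall nr,
			      unsigned long a0, unsigned long a1)
{
	long ret;

	asm volatile("vmcall"
		     : "=a"(ret)
		     : "a"((long)nr), "b"(a0), "c"(a1)
		     : "memory");
	return ret;
}
```

The point is that KVM-high keeps all of the complicated management, while only a handful of narrow, auditable entry points cross into the small-TCB side.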
Okay, here come some details. This page is talking about the deprivileging. For ARM and x86, the deprivileging step is similar, but with architecture-specific solutions. ARM uses nVHE mode, as I mentioned, so the primary OS kernel runs at EL1 from the very beginning, and during the deprivileging it just installs the pKVM hypervisor binary at EL2. For x86, the virtualization technology is based on the VMX operation modes. Originally, the primary OS first runs in VMX root mode, and during the deprivileging we need to do a real deprivilege of the primary kernel into VMX non-root mode, which means we turn it into a primary VM. At the same time, we keep the pKVM hypervisor running in VMX root mode.

Okay, it's important for pKVM to pass through all platform resources to the primary VM, as I mentioned in the previous slides. We have a common mechanism for ARM and x86. We do identity memory mapping for the primary VM: we pass through system memory to the primary VM except pKVM's code and data; we pass through MMIO to the primary VM except our own MMIO, which is maintained and managed in the pKVM hypervisor; and we also pass through interrupts to the primary VM. Based on this, the primary VM can directly manage all system resources through native device drivers and native ACPI drivers, and it can also make use of native system services like OSPM to manage platform power.

Okay, the TEE VM's memory needs to be protected; as we know, that is very important. So we maintain each page's ownership in the pKVM hypervisor during memory transitions. Currently, x86 uses the same mechanism as ARM, which is to maintain each page's ownership, or page state, alongside the EPT page tables; ARM maintains page ownership alongside the stage-2 translation tables. But there is some difference in the implementation between ARM and x86. ARM does memory transitions through HVC, that is, hypercalls, which are checked by the KVM MMU, which means it changes part of the KVM MMU code. On x86, we do memory transitions through EPT shadowing within the pKVM hypervisor. The benefit is that it avoids changing KVM MMU code, reducing the possibility of creating bugs in the KVM MMU. But at the same time, we also see a similar operation to ARM's in the Intel TDX solution, which changes KVM MMU code to make a similar call for memory transitions. So we may need a discussion in the future about whether pKVM on x86 should also move to hypercalls for memory transitions.

Okay, interrupt handling is straightforward, and we have a common mechanism for both ARM and x86. As you know, we pass through all physical interrupts to the primary VM, so the primary VM manages all external interrupts. It then checks whether an interrupt should be injected into one of its guests; if yes, it does virtual interrupt injection through the virtual interrupt controller, which is emulated by KVM-high. So we can say virtual interrupts are fully managed by KVM-high in the primary VM.
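To give you a concrete picture of the page-ownership tracking I just described, here is a minimal sketch; the states and names are illustrative assumptions, not our actual code:

```c
/* A minimal sketch of the page-ownership bookkeeping. In a real
 * implementation this state lives alongside the EPT entries (x86)
 * or stage-2 entries (ARM) for each physical page. */
enum pkvm_page_owner {
	OWNER_HYP,     /* pKVM hypervisor code/data: never mapped out  */
	OWNER_HOST,    /* primary VM: the default for all memory       */
	OWNER_GUEST,   /* donated to a TEE VM: unmapped from the host  */
	OWNER_SHARED,  /* explicitly shared host<->guest, e.g. virtio  */
};

struct pkvm_page {
	enum pkvm_page_owner owner;
};

/* Donation is only legal host -> guest; every other transition is
 * rejected, so a compromised primary VM cannot grab a TEE VM's pages
 * back while the TEE VM is alive. */
static int pkvm_donate_to_guest(struct pkvm_page *pg)
{
	if (pg->owner != OWNER_HOST)
		return -1;  /* would be -EPERM in real code */
	pg->owner = OWNER_GUEST;
	/* ...then zap the page from the primary VM's EPT and map it
	 * into the TEE VM's EPT... */
	return 0;
}
```

Whether this transition is triggered by an explicit hypercall (the ARM way) or detected through EPT shadowing (our current x86 way), the ownership check in the small-TCB hypervisor is what actually enforces the isolation.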
Okay, MMIO handling also has a common mechanism between ARM and x86. Basically, general MMIO emulation is still done by the VMM in the primary OS, just like a legacy VMM. To support virtio emulation, we need the TEE VM to explicitly share memory with the primary VM, so that the backend can access the virtqueue buffers. One specific thing for x86 is instruction emulation: normally, x86 needs to do instruction decoding and emulation in the host after an MMIO EPT violation, to figure out what needs to be done for an I/O request. But a TEE VM forbids this, as its instruction memory is isolated from the host, that is, the primary VM. The solution here is to leverage the approach from the TDX software stack by moving instruction emulation into the TEE VM: after the TEE VM figures out the valid I/O request, it explicitly issues a hypercall with this I/O request to the VMM in the primary VM for the final I/O emulation.

Okay, DMA protection is also very important, to protect the TEE VM's memory from DMA attacks by a compromised device. This protection is done by comprehensive use of the IOMMU. We see similar requirements on ARM and x86: we need a virtual IOMMU in the primary VM to support untrusted-device isolation, that is, device isolation for its launched guests; and we need the pKVM hypervisor to own the physical IOMMU to ultimately ensure device isolation for the TEE VM. As we mentioned in the previous slides, the pKVM hypervisor records page states, or memory ownership, in the EPT page tables, while the IOMMU page tables also express the ownership needed to access a page. So we should have a good mechanism to align the ownership of a specific page between these two different page tables. To meet these requirements, we provide a solution for x86 based on VT-d scalable mode. The primary VM sees a virtual IOMMU with only a first-level page table, and it fully owns it. The pKVM hypervisor owns the physical IOMMU, which can work in nested mode by directly using the first-level page table from the primary VM together with its own second-level page table; this is typically used for a normal device, that is, the IOMMU page table for a normal VM. For a TEE device assigned to a TEE VM, the IOMMU page table in the pKVM hypervisor can work in second-level-only mode: it shadows the first-level page table in the primary VM into the second-level page table. For all the second-level page tables in the pKVM hypervisor, we unify them with the EPT page tables to simplify the page-ownership management. As for ARM, we currently see a solution based on a very simple hardware module, the S2MPU, whose full name is Stage-2 Memory Protection Unit, but it may not be a general IP; a solution based on ARM's general IOMMU, the SMMU, is still unknown to me.

Okay, this is a summary of the architecture details from the previous slides. It's a key architecture comparison between ARM and x86, but I will not detail each item, since I already covered them. One thing I do want to highlight here is guest attestation: for x86 it is still work in progress, but we will try to follow a similar solution to ARM's, as described by Will Deacon at KVM Forum 2020, which simply reuses a similar approach in the loader.

Okay, this page gives you an overview of the pKVM x86 architecture. Basically, we have a thin hypervisor, pKVM, which only owns the necessary hardware modules, like the IOMMU, the VMCS, and the EPT; all these modules are used to help isolate the TEE VM. The primary VM, here at the top left, owns all the remaining resources and directly manages them, just like native. The primary VM also still plays the VMM role: it runs its guests based on the virtual IOMMU, virtual EPT, and virtual VMCS. Both the normal VMs and the TEE VMs, here at the top right, run as guests of the primary VM. A VM exit from a normal VM first exits to the pKVM hypervisor and is then directly forwarded to the primary VM for handling. For a TEE VM, the VM exit also exits to the pKVM hypervisor and is then forwarded to the primary VM for handling.
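As a rough illustration of that exit path, here is a minimal sketch of the dispatch in the pKVM hypervisor; the helper names are hypothetical, and the security enforcement it refers to is described next:

```c
/* Illustrative sketch of pKVM's exit dispatch; sanitize_guest_state()
 * and forward_exit_to_primary_vm() are hypothetical helpers. */
struct vcpu {
	int is_protected;  /* does this vCPU belong to a TEE VM? */
	/* ... saved guest register state ... */
};

static void sanitize_guest_state(struct vcpu *vcpu)
{
	/* Scrub guest GPRs so nothing sensitive leaks; expose only
	 * what the guest explicitly shares, e.g. hypercall args. */
}

static void forward_exit_to_primary_vm(struct vcpu *vcpu)
{
	/* Resume KVM-high in the primary VM to handle the exit. */
}

static void pkvm_handle_exit(struct vcpu *vcpu)
{
	if (vcpu->is_protected)
		sanitize_guest_state(vcpu);

	/* KVM-high in the primary VM does the real work:
	 * instruction emulation, scheduling, virtio backends, etc. */
	forward_exit_to_primary_vm(vcpu);
}
```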
But we add security enforcement in the pKVM hypervisor to ensure there is no sensitive data leakage during this kind of operation.

Okay, this page and the next page show you a basic performance evaluation for the primary VM and a normal VM on pKVM. The tests covered I/O, CPU, and memory. As you know, for the primary VM we pass through almost everything, including interrupts, which means VM exits from the primary VM are largely reduced. From the results, you can see they are very close to native. And since the pKVM hypervisor is very thin, and a normal VM's exits are just directly forwarded to the primary VM, we can say the penalty of the VM-exit cost added by the pKVM hypervisor is small. Based on that, the test results for passthrough I/O, virtio block I/O, and CPU/memory are also very close to a VM running on top of native KVM.

Okay, this is the last page I want to show you in this presentation. It's about the status update for pKVM on x86 and our next steps. Currently, we can already deprivilege the primary OS and run a normal VM; we support emulated and passthrough I/O, based on the VT-d IOMMU; and we can run a TEE VM with memory protection in the pKVM hypervisor. In the future, first, we will publish our pKVM-IA code on GitHub, and we will start discussions in the community to align on a common framework for pKVM, for both x86 and ARM, and possibly other platforms. We will also continue to enable TEE VM features: we will support passthrough I/O based on the virtual IOMMU; we will support virtio based on shared memory and I/O requests for the TEE VM; we will add the security enforcement to ensure there is no sensitive data leakage from the TEE VM during VM exits; and we will support TEE VMs with guest attestation. Okay, the last thing I want to highlight is that our target for the lines of code of pKVM-IA is less than 25,000, so we will keep it small and make its TCB as small as possible.

Okay, that's all. Thank you, everyone. We can start our Q&A.