Hello, I'm Isaku Yamahata. I work for Intel on KVM. In this session, I'd like to give an overview-level status update on TDX KVM support. For some topics there are other sessions covering the technical details, so for those points, please attend the other sessions.

First, let's recall what TDX is and why it exists. TDX stands for Trust Domain Extensions, and it provides hardware-protected VMs: the VMM and the host firmware cannot see guest state, neither memory contents nor CPU state. This is a basic building block for confidential computing. Demand for confidential computing is rising; recently, the Linux Foundation created the Confidential Computing Consortium for it. In a cloud environment, users want to protect their data not only from other guests but also from the host side, even from the cloud service provider, since there can be multiple ways for a CSP to attack a VM. TDX is composed of hardware extensions and software extensions. On the hardware side, there are CPU ISA extensions and memory encryption. On the software side, there is new firmware, called the TDX module.

Let's move on to what has to be touched to support TDX in KVM. Basically, all related components need to be touched. The Linux host has to load and initialize the TDX module, the firmware; KVM needs to be touched, of course, to use those extensions; and on the QEMU side, QEMU also needs to change, especially in how it creates the guest's initial memory image and runs it. The last component is the guest kernel. Let's look at them one by one.

The first touch point is the Linux host kernel. TDX requires firmware, known as the TDX module. This firmware is part of the TCB, and it requires a special loading procedure. The TDX module also tracks which pages are protected, that is, which pages are used for protected guests and which are not, and for that it requires large, physically contiguous memory.
That metadata is something similar to Linux's struct page, and due to NUMA issues and others, allocating that physically contiguous memory requires complex calculation. The version information of the TDX module is exported via sysfs. The upstream plan is also to allow runtime update of the firmware without reboot, and kexec and kdump support is planned for upstream as well. Kdump in particular is very important for crash analysis. After a kdump reboot, the state of the TDX module is basically unknown, so to bring the OS back into an operating state without an actual power cycle or firmware reboot, we'd like to reboot by kexec, and we need to bring the TDX module back into its known initial state across kexec. That's a very tricky point.

So let's move on to KVM. The KVM MMU also requires big changes; there is a dedicated slide on that later. In the normal VMX case, there are VMX instructions to operate on VMX guests. In the TDX case, the operation is done through SEAMCALL, a firmware call into the TDX module, so we replace the VMX operations with SEAMCALLs to the TDX module. Luckily, x86 KVM already has an operation table, so it mostly works without big changes; we only add several new operations for TDX.

Another change point is debugger support. QEMU supports a GDB stub, and in the current implementation QEMU directly reads and writes guest memory state and CPU state. For TDX that doesn't work, so we need to introduce new KVM operations to snoop guest state and then make QEMU use them. Debugger support is a big feature, so it is split out from the first-phase merge.

OK, let's move on to the KVM MMU. The operation is a bit tricky. In TDX, the EPT is also protected; it is called Secure-EPT, and KVM cannot directly access it. KVM has to use SEAMCALLs to operate on the Secure-EPT, and those calls are expensive. So, for performance, we keep the existing conventional EPT as a mirror of the Secure-EPT,
so that the existing EPT code can be reused. But the CPU doesn't look at the mirror; the CPU uses the Secure-EPT.

OK, the last big change is unmapping private memory from user space. For security reasons, we'd like to remove the mapping of guest memory in QEMU, that is, in user space. The current basic KVM design requires a user-space mapping for guest memory: when guest memory is needed, meaning an EPT violation happened, the guest physical address is converted to a host virtual address, and then get_user_pages() or one of its variants is used to get the corresponding host pages. So there are two proposals in the community. One proposal is to keep the mapping and reuse the existing page-resolving code; the trick is to make the protection of the mapping PROT_NONE, so that the CPU doesn't actually use those page tables. There is a working POC for this. The other proposal is to introduce a new object with a file descriptor and remove the user-space mapping completely; it requires changing KVM to get host pages based on the guest physical address. Discussion is ongoing, we are working on the actual implementation, and we hope we can share our results.

The last changes are guest changes. For some emulation, KVM has to access guest memory. For example, in the MMIO case, KVM parses guest instructions and works out the address and the data to be written or read. But in TDX, KVM cannot access guest memory. Instead, the guest has to use a hypercall for MMIO, so that all the necessary data is provided to KVM. For that, the guest can either be directly paravirtualized, or the guest can use the virtualization exception: if the guest issues an instruction that requires a hypercall, a #VE is injected back into the guest, the guest then determines which hypercall is needed, and issues the hypercall on behalf of the original instruction.
This is introduced to mitigate the paravirtualization effort. Also, for device DMA, the guest has to bounce the actual data through shared pages, so that KVM, or any device back end, can access it to emulate the DMA. The final thing is the device filter. The basic assumption is that the guest doesn't trust the hypervisor, which means the guest doesn't necessarily trust advertised features. For example, KVM can advertise kvmclock, but the guest should not use it; it should use the TSC as its clock source, because the TSC is fully virtualized by the TDX module and can therefore be trusted.

OK, let's move on to the user-space QEMU changes. The changes are actually very limited. The first touch point is creating the guest's initial image; in particular, the guest requires special initialization so that the necessary information is built for the guest firmware. The second touch point is disabling some features, because TDX doesn't support some features like SMM or reboot; such features need to be disabled and must not be advertised. Such advertisement is done via CPUID or ACPI tables. Also, interrupts, for example, are only supported as edge-triggered; level-triggered interrupts are not supported, so some logic in the interrupt controllers, the PIC and the IOAPIC, needs to be modified. Typically in QEMU, features unsupported by the underlying hypervisor are silently ignored; for example, SMI is silently ignored if the underlying hypervisor doesn't support it. But that is not desired here, so we'd rather make it explicit: KVM returns an error to QEMU, and QEMU logs it or aborts.

Another desired feature is GDB support. This requires a medium-sized change; the patches will be separated from the first series.

OK, just for completeness, this slide covers the other software components. There are four big change points. One big one is the guest BIOS: EDK2 requires some special handling to support TDX.
Also, the bootloader requires some changes, especially GRUB2, the most popular one. Another area is the VM management layer: libvirt and OpenStack need to know about TDX. For libvirt, we'd like to just say that this guest VM is a TDX guest, and then the necessary QEMU command line will be built for TDX. OK, this slide summarizes the current status of TDX support. Thank you for listening.