Hello everyone. Today, my colleague Jun Nakajima and I, Ravi Sahita, will talk to you about the architectural extensions we are working on to advance confidential computing for public cloud environments: specifically, Intel TDX, or Intel Trust Domain Extensions. We'll cover the following items: the cloud threat vectors we're trying to address with Intel TDX, a quick look at the evolution of cloud workload isolation, and then a dive into Intel TDX itself, covering the key architectural building blocks, the threat model, and the threat coverage. Finally, Jun Nakajima will cover the software implications of Intel TDX.

Let's look at the cloud threat vectors quickly. The first area we're trying to address is cloud tenants that are worried about exploits that may lead to loss of their data; at the same time, providers are worried about privilege escalation that may lead to loss of control of the infrastructure. The second key tenet we're trying to address is that cloud providers want to have no visibility into customer workloads: they want to make sure that data in use is always encrypted, which addresses key privacy and security requirements. And the third, in terms of usage models, is to support isolation and confidentiality models that span applications, virtual machines, and eventually containers and microservices.

Let's look at how cloud workload isolation has evolved. Today, we already have Intel Virtualization Technology, which is used by virtual machine monitors to isolate hardware virtual machines, and Intel TXT, or Trusted Execution Technology, which can be used to measure the VMM on launch so that its measurements can be reported. We also have Total Memory Encryption and Multi-Key Total Memory Encryption, which can be used to encrypt either all of the physical memory on the system or, under the VMM's control, selected memory at page granularity. These can be enabled through the BIOS alone for Total Memory Encryption, or additionally through the VMM for Multi-Key Total Memory Encryption. In both of these cases, however, the VMM remains in the Trusted Computing Base, or TCB, for both the cloud providers and the cloud tenants.

Looking forward, what capabilities has Intel provided for removing the VMM from the TCB? First, we have Intel Software Guard Extensions, or SGX, which removes the OS and VMM from the TCB; it requires enabling of the application, or some library OS, along with the OS and the VMM. Today, we are introducing Intel Trust Domain Extensions, or Intel TDX, which essentially operates at a new VMX-root level and can be used to remove the CSP-provided software and the VMM from the TCB, while keeping the virtual machine as the isolation boundary. We call such a VM that is isolated through Intel TDX a Trust Domain, or TD. The important aspect of Intel TDX is that, with the right enabling done for the VMM and the operating systems, the goal is to require no changes to applications in order to protect them in a confidential manner.

So let's start diving into the Intel TDX architecture with some key goals, or scope. The first is to provide confidentiality, access control, and integrity protection for VMs that run as TDs, or Trust Domains, on a platform whose VMM has been enabled for TDX.
We want to maintain the resource-management role of the VMM, and that also means ensuring that neither legacy VMs nor the new TDs can launch a privilege-escalation attack on the platform. Some attacks, though, are out of scope: since the VMM retains the resource-management role on the platform, any VMM-induced denial of service is out of scope, and some hardware adversary attacks, such as memory replay, are also out of scope.

Let's start diving into the building blocks for Intel TDX, starting with the CPU ISA. The first key capability is a CPU mode of operation called Secure Arbitration Mode, or SEAM, which is used to host the Intel TDX module. The Intel TDX module is software that runs in the CPU's SEAM mode of operation and is protected through capabilities that the CPU provides, including a specific set of instructions that are used to enable guest-host interactions. The TDX module uses this CPU mode to host the security functions that are exposed to the VMM, and in order to protect the TDX module, this mode restricts the use of certain ISA so that only the TDX module can use those instructions. The CPU also provides range registers to protect the TDX module from other host software on the platform. The TDX module is loaded into this range-register-protected region through an Intel authenticated code module, also called the SEAM loader. Further, the TDX module is protected against physical attacks by using the total memory encryption engine to ensure the confidentiality and, more importantly, the integrity of the contents of the TDX module while it is executing.

So let's look at how Secure Arbitration Mode interacts with the legacy state machine. What I'm showing here is the traditional VMX operation, where the CPU transitions between the VMX root and non-root modes of operation through VM exits and VM entries. Secure Arbitration Mode comes in, as shown on the right-hand side, where software can enter the SEAM VMX root mode of operation through a SEAMCALL and can exit SEAM VMX root mode back into ordinary legacy VMX root mode through SEAMRET. VM exits and VM entries work almost the same in SEAM mode as in ordinary VMX, with some subtle differences: while the CPU is in SEAM VMX root mode, interrupts like SMIs are kept pending; they may be taken when the CPU is in SEAM VMX non-root operation, or are otherwise inhibited until SEAMRET returns to the legacy VMX root mode of operation, where they are un-pended. The picture on the left-hand side shows that most of the interactions with opt-out (legacy) SMM operation or opt-in SMM operation remain unmodified with the introduction of SEAM mode.

So those are three key capabilities: the Intel TDX module, the SEAM loader ACM used to load the TDX module, and obviously the CPU hardware. The fourth key capability that sits in the trusted computing base for TDs is a quoting enclave, and as I'll describe on a later slide, we leverage the quoting and attestation infrastructure that Intel SGX offers, to make it simpler for folks using Intel TDX to leverage the same attestation infrastructure as Intel SGX. As this picture shows, all of the other remaining capabilities on the platform, including devices, all other host software, platform firmware, BIOS, SMM, the host OS, and the VMM, are not trusted by the TD and are outside the TCB for Intel TDX.
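To make the SEAMCALL transition concrete, here is a minimal sketch of how host-side software might wrap the SEAMCALL instruction to invoke a TDX module function. The register convention (leaf number in RAX, operands in RCX/RDX/R8, status back in RAX via SEAMRET) follows the TDX module specification, but treat the wrapper itself as illustrative rather than the actual kernel implementation.

```c
/* Illustrative sketch of a VMM-side SEAMCALL wrapper. SEAMCALL transfers
 * control to the TDX module in SEAM VMX root mode; the module returns
 * here via SEAMRET with a status code in RAX. Not the real kernel code. */
#include <stdint.h>

static inline uint64_t seamcall(uint64_t leaf, uint64_t rcx,
                                uint64_t rdx, uint64_t r8)
{
    uint64_t status;
    register uint64_t r8_reg asm("r8") = r8;

    /* SEAMCALL is encoded as 66 0F 01 CF; emit raw bytes in case the
     * assembler does not yet know the mnemonic. */
    asm volatile(".byte 0x66, 0x0f, 0x01, 0xcf"
                 : "=a"(status)
                 : "a"(leaf), "c"(rcx), "d"(rdx), "r"(r8_reg)
                 : "memory");
    return status;   /* 0 on success, an error code otherwise */
}
```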
So let's dive into the next building block, which is memory confidentiality. I'm showing a simple picture here of an SoC with memory attached. The MKTME, or multi-key total memory encryption engine, is reused as a building block for Intel TDX, and it is programmed through an instruction called PCONFIG, which programs the key IDs in the memory controller and the MKTME engine. For Intel TDX, the key ID space that is used to program the MKTME engine is effectively partitioned into a TD-private and a shared key ID space. This allows the VMM to select which key IDs to use for TDs and which key IDs to retain for legacy VMs, and this partition is configured at system initialization time through the BIOS and is verified and locked down by Intel MCHECK.

The key mechanism here, as I was referring to on the previous slide, is that some ISA is restricted so that certain operations are reserved for the Intel TDX module running in the SEAM mode of the CPU, and PCONFIG is one specific example. The partitioning of the key ID namespace is enforced to allow the TDX module to use PCONFIG to program the keys for TDs, while the VMM can assign key IDs, as shown in this picture, but cannot actually access or program those keys through PCONFIG. A related point: the VMM may optionally retain some key IDs for legacy VMs; that's really an opt-in capability for the VMM.

Let's look at the next building block, which is memory integrity. As shown by various researchers, it is not sufficient to just provide confidentiality through encryption for the correct operation of TDs; it is also important to provide some level of tamper resistance against attacks that untrusted software may perform. That is enabled through the memory integrity capability in the MKTME engine, which, as I said, protects against various forms of memory modification: tamper, relocation, splicing, and cross-TD corruption. Memory integrity is enabled in the MKTME engine through a SHA-3-based MAC, used in a truncated fashion with 28 bits of the integrity MAC retained, and managed through the DDR5 metadata. Further, the metadata also contains a TD owner bit, which ensures that untrusted software does not have any access even to the ciphertext while memory is in use by a TD: any attempt by untrusted software to read memory that belongs to a TD will return zero data, and depending on the type of access performed, we also poison the cache line. At the same time, we also want to protect the platform from the TD, so the VMM can ensure, through the TDX module, that any memory used by the TD is correctly initialized first using direct stores, and this protects the VMM. Further, as I said, any untrusted software or hardware write to that memory corrupts the MAC and therefore protects the TD. So we achieve the dual goal of protecting both the VMM and the TD.

Next, let's look at how the private keys are managed by the TDX module and the CPU as a TD is operating. If you recall, I described on a previous slide that the key ID space is partitioned into private key IDs and shared key IDs. The private key IDs are managed by the TDX module, which tracks their allocation to ensure that the VMM does not misuse a key ID that has been assigned to a particular TD. That key ID is loaded by the CPU during VM entry as the virtual CPUs of the TD execute on a logical processor.
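As a rough sketch of the key ID partitioning just described, the VMM might track the boundary between shared and TD-private key IDs as below. The structure and helper names are hypothetical; the real split is configured by BIOS, verified by MCHECK, and enumerated to software.

```c
/* Hypothetical sketch of the MKTME key ID partition: low key IDs stay
 * shared (programmable by the VMM via PCONFIG for legacy VMs), high key
 * IDs are TD-private and can only be programmed by the TDX module. */
#include <stdbool.h>
#include <stdint.h>

struct keyid_partition {
    uint32_t num_keyids;          /* total hardware key IDs enumerated   */
    uint32_t first_private_keyid; /* start of the TDX-private range      */
};

static bool keyid_is_td_private(const struct keyid_partition *p,
                                uint32_t keyid)
{
    return keyid >= p->first_private_keyid && keyid < p->num_keyids;
}

/* The VMM may assign a private key ID to a TD, but any attempt to
 * program it with PCONFIG outside SEAM mode would fail. */
```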
The TD expresses whether data is accessed as private or as shared through a new architectural convention in the guest physical address: a control bit called the shared bit, which is bit 51 or bit 47 depending on the guest physical address width available on the system. If the shared bit is cleared, the TD wishes to access private data through those guest physical addresses, and for that the CPU automatically uses the private key ID assigned by the VMM and tracked by the TDX module. If the shared bit is set, however, the CPU will use the VMM-specified MKTME key ID to access the shared data; an example usage of that may be paravirtualized IO data.

Now, while this convention is in place, it's also important to enforce that a malicious VMM cannot modify the address translation structures, specifically the extended page table (second-level address translation) structures, such that the TD would access incorrect data. The Intel TDX module and the CPU work in conjunction to mitigate this attack, specifically when TD-private GPAs are being translated. TD VMs, unlike legacy VMs, have a second EPTP, or extended page table pointer, called the secure EPTP, which references a Secure EPT that is used by the CPU when the TD is accessing private memory. So private GPAs always get walked through the Secure EPT, whereas shared GPAs, expressed through the shared bit being one, get translated through the shared EPT structure. Both of these walks are the same as legacy walks; however, the security properties are enforced by the TDX module on the Secure EPT. The CPU further enforces, through appropriate tags in the TLB, that any combined translations cached through the Secure EPT are tagged as being derived from the Secure EPT in SEAM non-root mode of operation. An important point to note here is that whenever the CPU is performing address translation for TDs, it enforces that code fetches from the TD, and the address translation structures themselves, always reside in TD-private memory; if that's not the case, the CPU raises a fault.

Now let's look at the next building block, physical memory management, or specifically guest physical memory management, and how the TDX module enforces that memory assigned to a TD is tracked appropriately. All the physical memory that a VMM may assign to a TD is tracked by the TDX module through a data structure called the PAMT, or physical address metadata table. Through this structure, the TDX module enforces that a page can be owned by only one particular TD and that it is in the right state for the memory management operation being performed by the VMM. This structure is simply a bookkeeping structure and is not walked by the CPU: the properties bookkept through it are enforced through the Secure EPT mappings for TD-private memory, and thereby we ensure that there is no additional latency on a TD page walk while enforcing the security properties.
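To make the private/shared convention described above concrete, here is a small sketch of how a guest might form private and shared GPAs. Whether the shared bit is GPA bit 51 or bit 47 depends on the guest physical address width the TD runs with; the helper names are made up for this example.

```c
/* Sketch of the GPA "shared bit" convention. Clearing the bit selects a
 * private access (walked via the Secure EPT with the TD's private key);
 * setting it selects a shared access (walked via the shared EPT with a
 * VMM-chosen MKTME key ID). Names are illustrative. */
#include <stdbool.h>
#include <stdint.h>

static inline uint64_t td_shared_mask(bool gpaw_52bit)
{
    return gpaw_52bit ? (1ULL << 51) : (1ULL << 47);
}

static inline uint64_t td_gpa_to_shared(uint64_t gpa, bool gpaw_52bit)
{
    return gpa | td_shared_mask(gpaw_52bit);   /* e.g. paravirtualized IO */
}

static inline uint64_t td_gpa_to_private(uint64_t gpa, bool gpaw_52bit)
{
    return gpa & ~td_shared_mask(gpaw_52bit);  /* default for TD memory   */
}
```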
The VMM assigns memory to TDs dynamically, and that all comes from VMM-managed memory through the SEAMCALL APIs that the TDX module exposes. The TDX module's use of these APIs ensures that the TD ephemeral key is used to protect the integrity of the Secure EPT structures, so that they cannot be tampered with while guest-private memory is assigned and mapped through them. Because the PAMT structure maintains information at the 1-gigabyte, 2-megabyte, and 4-kilobyte levels, TD-private memory can be mapped at any of these sizes. Also, TD-private memory may be relocated by the VMM to support the various NUMA optimization strategies the VMM may already use.

Last but not least, let's talk about the building block supporting attestation. As is clear for most confidential computing use cases, without a relying party being able to attest that a workload is running inside a TD, it's kind of pointless to do any confidential computing. So we leverage the same attestation model and infrastructure that we have for Intel SGX; this flow may be familiar to a lot of you, but I'll summarize it quickly here. The process starts with a challenger requesting a TD to prove its authenticity: to prove that it's running as a valid TD, with the expected measurements, on an authentic Intel TDX platform. The TD can add its own data and, as shown in step two, request a local report of its state through a TDCALL. That local request into the TDX module is translated by the TDX module into a SEAMOPS invocation; the SEAMOPS leaf used for this purpose is called SEAMREPORT, and it generates a locally MACed report which contains both the measurement of the TDX module, as measured and recorded during load time, and the measurement of the TD, which is managed by the TDX module. That locally MACed report is then passed back to the TD, which in step six shares it with the VMM. Since this report is locally MACed, it can be verified on the platform by a quoting enclave and converted into a quote signed with the attestation key. That quote is then returned to the TD, and may then be returned to the challenger, which can share it with a relying party that can validate that the quote is coming from a valid Intel TDX platform, with an expected Intel TDX module version and the measurement of the TD that it expects. The relying party can then proceed to provision any secrets onto the TD and allow the workload to proceed.
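As a guest-side sketch of the report-request step in this flow, a TD could ask for the locally MACed report with a TDCALL. The leaf (TDG.MR.REPORT) and register convention follow the TDX module specification, but the constants and buffer handling here are illustrative.

```c
/* Sketch: a TD requesting a locally MACed TDREPORT from the TDX module.
 * RAX selects the TDCALL leaf; RCX points at the report buffer and RDX
 * at the caller-supplied REPORTDATA, both in TD-private memory. */
#include <stdint.h>

#define TDG_MR_REPORT 4ULL   /* TDCALL leaf number per the spec */

static inline uint64_t tdcall_report(uint64_t report_gpa,
                                     uint64_t reportdata_gpa)
{
    uint64_t status;

    /* TDCALL is encoded as 66 0F 01 CC. */
    asm volatile(".byte 0x66, 0x0f, 0x01, 0xcc"
                 : "=a"(status)
                 : "a"(TDG_MR_REPORT), "c"(report_gpa), "d"(reportdata_gpa)
                 : "memory");
    return status;   /* on success, the report lands in the buffer at report_gpa */
}
```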
With that understanding of the base building blocks for Intel TDX, let's walk through the threat coverage based on the threat model we presented earlier. For these different scenarios we will walk through the software adversary attacks, the hardware adversary attacks, and so on, and in each case we will note what we consider the attacking entities to be. For the software adversary attacks, we consider any of the CSP software, which includes the VMM, any other colluding TDs on the platform, the system operators, and so on, to be in scope as attackers. The first attack class we look at is software attempting direct access to TD-private memory, whether to read the contents, to access the ciphertext, or to inject content into it; that's mitigated through the combination of the access control properties of the TD owner bit and, additionally, the ephemeral-key-based memory encryption and integrity.

The next attack vectors are secondary attack vectors: Rowhammer, and system address map aliases created by a malicious BIOS. Both of those are addressed through the memory integrity mechanisms, with the malicious aliases additionally prevented through an alias check enforced by authenticated code modules from Intel. The next class of software attacks falls into two buckets. Address translation attacks, through software-based EPT remapping, are addressed through the Secure EPT architecture I described. Malicious interrupt or exception injection by the VMM, intended to cause invalid execution inside the TD, is mitigated by the TDX module: it protects the virtual APIC page, thereby preventing any violation of TPR levels or virtual NMI blocking by the VMM, and it obviously also protects the TD control structures through the same confidentiality- and integrity-protected memory. Further, the CPU also disallows external interrupts with vectors that are reserved for exceptions; that's a change over the previous VMX architecture.

The next area of threat coverage is hardware adversary attacks, and in this case we add to the attacker set the ability of the VMM to induce devices to perform attacks. First, DMA attacks using the private key IDs: effectively, the VMM using DMA from a rogue device as a conduit. The mitigation in the platform is to prevent DMA accesses that carry private key IDs in the host physical addresses. The next set of online hardware attacks is the VMM trying to use devices to either inject data or move data from one location to another across TDs; that's mitigated through the ephemeral-key-based memory encryption and integrity model. However, hardware replay of memory that targets the same physical address for a specific TD is not protected against in the current generation. Finally, the offline DRAM attack methods are mitigated by the memory encryption using the TD ephemeral keys.

Let's look at the next class of attacks: attacks on the attestation infrastructure or on the TDX module itself, for example by rolling back the TDX module to a prior version which may have a vulnerability. That's mitigated by the SEAM loader ACM, which will prevent such downgrades through verification of the security version number (SVN) of the Intel TDX module. Further, the relying party that verifies the attestation can also use that mechanism to verify that the correct TDX module is in use on the platform. In the other attack case, where a tampered TDX module is loaded by the VMM, the attack is partially addressed by the SEAM loader ACM verifying the integrity of the module at launch time and recording those measurements in the CPU SVN. But it's also important to ensure the runtime integrity of the module, and that is enforced by the CPU: in terms of software access control, using the SEAM range register, and against hardware attacks, as I described on the previous slide, through the memory integrity mechanisms.
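Going back to the downgrade case for a moment: the anti-rollback decision is conceptually simple. As a sketch, under the assumption that the loader compares the module's SVN against a platform-recorded minimum, it reduces to something like this.

```c
/* Sketch of the SEAM loader's anti-rollback decision: refuse any TDX
 * module whose security version number (SVN) is below the minimum the
 * platform has already accepted. Purely illustrative. */
#include <stdbool.h>
#include <stdint.h>

static bool tdx_module_svn_acceptable(uint8_t module_svn,
                                      uint8_t platform_min_svn)
{
    /* Equal or newer SVN is fine; anything older is a downgrade. */
    return module_svn >= platform_min_svn;
}
```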
Last but not least, since we are considering system software that's outside the TCB, we also have to consider side-channel attacks that may be attempted, and we look at these different classes. The first is poisoning of the branch prediction units to extract side-channel information through the cache; that is prevented through mechanisms we have already talked about, such as IBPB and IBRS, the branch prediction barrier methods, which are enforced by the TDX module during the SEAMCALL and SEAMRET transitions. Further, the TDX module also enforces isolation of the speculation control MSR for TDs, to ensure that the properties selected through that MSR are enforced. The next capabilities that can potentially be used as side channels are performance monitoring and debug. Both of these capabilities are optionally available to TDs that choose to turn on perfmon and debug, and if those capabilities are enabled, then the perfmon and debug information is isolated by the TDX module for the TD. Further, the fact that the TD has opted in to debug and perfmon is reported through the TD attestation mechanisms, so that these modes cannot be maliciously enabled by untrusted software. The last couple of side channels are EPT fault information that may be extracted as a TD is executing, as well as broad cache-based side-channel attacks like Prime+Probe; those are not mitigated in the current generation. With that, I'll pause here and hand it over to my colleague Jun Nakajima, who will walk through a summary of the architecture building blocks and continue discussing the software touchpoints for TDX. Thank you for listening.

Thank you, Ravi. Now we're going to talk about the software that is required to enable TDX. Before jumping into the details, I'd like to recap what Ravi shared so far. If you look at the data structures here, for example the CPU state, the virtual APIC page, the Secure EPT (extended page table), and the various VM control structures, they are maintained in private memory, which is protected by the CPU and invisible to non-TD system software, including the VMM itself. And if you look at the TD, the memory itself is in private memory, and TDX uses access controls such as the TD owner bit, along with the per-TD private key, to prevent the VMM from modifying or observing the tenant's memory, whether in a cache or in DDR.

Let's take a look at the software implications, or the software touchpoints, of Intel TDX. First of all, the TD. A TD starts from a guest firmware we call TDVF, the TD virtual firmware. It needs to accommodate private and shared memory, it needs to enumerate the TD capabilities, and we simplify the CPU initialization at boot; those changes are required in the guest firmware. Then the TD guest OS: the changes are mostly in kernel areas. It has to handle the virtualization exception, #VE, which is generated by the hardware in various cases, for example for instructions that are not supported in a TD; executing those generates a #VE. Also, the OS needs to explicitly declare shared memory, because by default the entire memory is private. And since private memory cannot be accessed by the host VMM, the TD guest needs to use bounce buffers for data movement: the VMM writes data to, or reads data from, shared memory first, and the TD then copies between shared and private memory. To that end, we need DMA API changes in the TD.
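Here is a minimal sketch of that bounce-buffer pattern, assuming hypothetical helpers for shared-memory allocation and device DMA; in Linux this is what the swiotlb-backed DMA API changes provide.

```c
/* Sketch of the bounce-buffer pattern for TD guest IO: DMA-visible data
 * lives in shared memory, and the guest copies across the private/shared
 * boundary around each transfer. Helper names are hypothetical. */
#include <stddef.h>
#include <string.h>

void *shared_bounce_alloc(size_t len);            /* shared (VMM-visible) */
void  device_dma_to(void *shared, size_t len);    /* guest -> device      */
void  device_dma_from(void *shared, size_t len);  /* device -> guest      */

void td_dma_write(const void *private_buf, size_t len)
{
    void *bounce = shared_bounce_alloc(len);
    memcpy(bounce, private_buf, len);   /* private -> shared               */
    device_dma_to(bounce, len);         /* VMM/device only see shared data */
}

void td_dma_read(void *private_buf, size_t len)
{
    void *bounce = shared_bounce_alloc(len);
    device_dma_from(bounce, len);       /* data arrives in shared memory */
    memcpy(private_buf, bounce, len);   /* shared -> private             */
}
```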
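And going back to the #VE handling mentioned above, a guest kernel's handler skeleton might look roughly like the following: it fetches exit information from the TDX module (via the TDG.VP.VEINFO.GET TDCALL leaf) and dispatches on the VMX-style exit reason. The structure layout and dispatch here are illustrative, not the exact ABI.

```c
/* Skeleton sketch of a TD guest #VE handler. The exit information comes
 * from the TDX module via TDCALL leaf TDG.VP.VEINFO.GET; the handler
 * then emulates the operation locally or forwards it to the VMM over
 * GHCI. Case values are standard VMX exit reasons; layout is illustrative. */
#include <stdint.h>

struct ve_info {
    uint32_t exit_reason;  /* VMX exit reason (CPUID, MSR, EPT violation) */
    uint64_t exit_qual;    /* exit qualification                          */
    uint64_t gla;          /* guest linear address, when applicable       */
    uint64_t gpa;          /* guest physical address, when applicable     */
    uint32_t instr_len;    /* length of the instruction to skip           */
};

void handle_virtualization_exception(struct ve_info *ve)
{
    switch (ve->exit_reason) {
    case 10:  /* CPUID: emulate, or ask the VMM via a GHCI TDVMCALL */
        break;
    case 31:  /* RDMSR */
    case 32:  /* WRMSR: filter, emulate, or forward */
        break;
    case 48:  /* EPT violation: typically emulated MMIO on shared GPAs */
        break;
    default:  /* unexpected reason: treat as fatal in the guest */
        break;
    }
    /* On return, the guest advances RIP by ve->instr_len where the
     * emulation completed the faulting instruction. */
}
```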
So there are many touchpoints in the VMM. The VMM needs to use the TDX module interface, its APIs, to manage the TD life cycle: for example, assigning key IDs, allocating and setting up control structures, and also setting up the Secure EPT and modifying Secure EPT entries. And if you look at the platform boot loader, the BIOS, it needs to enable MKTME with integrity, partition the key ID space between MKTME and TDX, and load MCHECK and the SEAM loader.

Now let's take a look at the software deployment model. Although TD memory is encrypted, so it's kind of special, we can use TDs for existing models without modifying the upper layers like applications, as long as we have TD enlightenment, basically enhancements or modifications in support of TDX, in the OS, especially, again, in the kernel. For example, in the typical, most conventional case, you have a full operating system within a TD, with containers running in that operating system; if you have a guest OS with TD enlightenment, then you can run an unmodified container inside a TD, or even smaller configurations. And in the future we may be able to support unmodified legacy OSes and more IO interactions.

Now switching gears to KVM, I will briefly mention the KVM touchpoints. This will give a more specific example of what we need to do. Starting with TDX initialization on the host: on the BSP we launch the SEAM loader ACM, and then at boot time we configure the TDX module on all CPUs. In KVM proper, one thing we need to do is TDX and VMX coexistence; that's a bit complex, but I think we can achieve it. Also, we need some modifications to the VMX code to support interrupt handling, and, like I said before, TDX requires the VMM to use the TDX module API for various operations, so we need to add code to use the TDX module API; when doing so, we want to reuse the existing AMD SEV ioctl code as much as possible.

On the MMU side, we need to add shared and private memory handling. Also, it's good to unmap guest private memory from the kernel and the user-space VMM, because if the kernel or the user-space VMM accidentally modifies the guest memory, that would be caught by the integrity detection and could cause a machine check. We have more on the MMU side: we added the Secure EPT, and, like I said, the Secure EPT page tables themselves are in private memory, so the VMM cannot modify the Secure EPT entries directly; for such cases the VMM needs to use the SEAMCALL API to modify the EPT entries. Also, the VMM needs to set up the EPT page tables to generate a #VE, especially to support MMIO: an emulated MMIO access in the guest generates a #VE in the guest, and the guest handles the MMIO emulation from there. And then there is private memory management: various data structures need to be allocated in private memory, and the private memory must reside in a Trust Domain Memory Region, a TDMR. We have more details at the KVM Forum on Friday, tomorrow, where we'll be presenting TDX enabling for KVM.
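As a sketch of what "use the SEAMCALL API instead of writing EPT entries" means in practice, mapping one private page might look like the following two-step flow. The leaf names follow the spec's TDH.MEM.* naming, but the numeric values and operand order here are placeholders, and seamcall() is the wrapper sketched earlier.

```c
/* Sketch: VMM-side flow for mapping a 4KB private page into a TD. The
 * VMM cannot write the Secure EPT directly; it asks the TDX module to
 * do it. Leaf numbers and operand order are placeholders. */
#include <stdint.h>

extern uint64_t seamcall(uint64_t leaf, uint64_t rcx,
                         uint64_t rdx, uint64_t r8);  /* sketched earlier */

#define TDH_MEM_SEPT_ADD  3ULL   /* placeholder leaf numbers */
#define TDH_MEM_PAGE_ADD  2ULL

uint64_t td_map_private_page(uint64_t tdr_pa, uint64_t gpa,
                             uint64_t sept_page_pa, uint64_t src_page_pa)
{
    /* 1. Donate a page to extend the Secure EPT at this GPA, if needed. */
    uint64_t status = seamcall(TDH_MEM_SEPT_ADD, gpa, tdr_pa, sept_page_pa);
    if (status)
        return status;

    /* 2. Add the source page; the TDX module encrypts it with the TD's
     * ephemeral key and records its ownership and state in the PAMT. */
    return seamcall(TDH_MEM_PAGE_ADD, gpa, tdr_pa, src_page_pa);
}
```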
Now switching gears to the Linux guest, starting from the guest BIOS. The guest BIOS is not part of Linux, but we need to add TD support to the guest firmware here, the TDVF, and the changes required there are basically a subset of the changes required for TD Linux. One important role of the guest BIOS is measurement and attestation of the Linux guest itself; I won't cover the details of measurement and attestation here, but I'll talk about the Linux side. In Linux, booting is simplified, and the initial state is different in a TD, so we need modifications to support that. Also, a TD gets its execution environment from the TDX module, so Linux needs some changes to accommodate that. More importantly, Linux needs some TD-specific modifications, which we call TDX enlightenment: basically, Linux needs to know whether it's running in a TD or not, and if it is running in a TD, it needs to take a different code path. For that we encourage the use of GHCI, which we propose as a spec; I'll talk about it more on the next page. GHCI is the guest-hypervisor communication interface. It covers the changes for booting due to TDX support and TDX enlightenment in the guest, and it defines the APIs, or services, provided by the VMM. We have a working group, the GHCI working group, and the goal is to ensure that the Linux kernel can be built into a single Linux TD binary that can boot and operate on the major VMMs today; a common interface, GHCI, and a consistent implementation across the VMMs are the key to achieving that goal, and we are almost there.

This is probably the final page, and I just want to recap what we presented today. We have the hardware, and we have the firmware; the firmware runs MCHECK, and then the boot loader loads the SEAM loader ACM. Eventually the VMM comes up, and it can run legacy VMs with the usual VM exits and VM entries. Now, for TDs, we need the TDX module, which uses the new instructions SEAMCALL and SEAMRET. To create a TD, we need private memory, mapped by and protected within a TDMR, and we need to create the TD-related data structures, like the SEPT or TDCS, and also place the TD memory itself in private memory. We can then create more TDs; again, the key data structures need to be in private memory, and whenever the VMM needs to operate on TD-related data structures, it needs to use a SEAMCALL, basically the API. We also talked about the TDX GHCI, the guest-hypervisor communication interface. And finally, we didn't cover this one, but we have ISA leaves for attestation purposes.

Thanks, Jun. So, in summary, we are developing Intel TDX to scale confidential computing capabilities for the cloud while trying to reduce developer friction due to recompiling, refactoring, and so on. We really look forward to continuing to work with this community, and we appreciate all the feedback that we've already started getting. The links at the bottom are the references to the documentation for Intel TDX, as well as the source trees for the KVM changes and the Linux guest changes that we're proposing. Thanks for listening again, and talk to you soon online.