Hello everyone. My name is Ravi Sahita, and my co-presenter today is Jun Nakajima. We will be talking about the live migration architecture for confidential VMs implemented using Intel TDX. So let's get started. I'll start with a brief recap of Intel TDX, just to remind people of the base architecture. We gave a talk at the Linux Security Summit last year where we introduced Intel TDX and some of its security properties. Then we will dive into the live migration architecture: I'll cover the goals and the high-level security and functional requirements, touch upon the new components introduced in the TDX architecture to support live migration as well as the new interfaces supported by those components, and then describe how host software uses those interfaces through the lifecycle of the live migration process. We will then touch upon the threat model and the mitigations of those threats as supported by the architecture. Jun will then dive into the important software implications for host software, specifically KVM.

So, a quick recap on Intel TDX. The TDX architecture is targeted at removing a large part of the cloud service provider from the TCB and protecting cloud tenants running as virtual machines, in terms of confidentiality and integrity for the workloads supported by those cloud VMs, or TD VMs. It builds on the existing Intel ISA for virtualization and adds platform capabilities for memory encryption and integrity through a multi-key total memory encryption engine. The expected enabling requirements for Intel TDX are targeted at the platform and guest firmware as well as the host software, specifically the VMM, and the TD OS that runs inside the TD VM. We do not expect to have to modify applications, drivers, etc. that run inside the TD OS to support the use cases outlined for TDX.
As shown in the picture on the right-hand side, with Intel TDX the host software that hosts TD VMs may also run legacy VMs on the same platform. As I mentioned earlier, we covered the baseline architecture in a past talk, so I won't dive into any more detail on TDX here. We will dive into live migration next, and I'll touch upon some of the baseline properties as needed in the threat mitigation discussion.

The goal of the live migration architecture for TDX is to move a running TD between different physical machines that are TDX compatible. Because this is live migration, the execution of the workloads in the TD VMs being migrated should not be affected, which means meeting requirements such as keeping network connections alive. A related requirement is that for Intel TDX live migration, the source and destination platforms are both active and running during the handoff. The expected use-case scenarios in cloud environments are meeting service-level agreements, performing updates to the platform's firmware and host software, and various capacity- and load-balancing operations. Some things are explicitly out of scope for the live migration architecture, such as storing and capturing TD images where the destination platform for that snapshot restoration is unknown.

With that, let's look at some of the key security properties first, and then we'll touch upon the functional properties. The first security property is that a CSP performing a live migration of a TD should only be able to migrate it to a TDX platform that meets the minimum requirements expressed by the tenant, and that is expressed through an explicitly measured migration policy.
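To make the policy idea concrete, here is a toy sketch, mine rather than the real protocol, of the gating described in more detail later in the talk: the migration TD evaluates each platform's quote against the measured migration policy and only produces a migration session key if both checks pass. The `verify_quote` logic and the policy and quote field names are illustrative assumptions, not the TDX interface.

```python
import secrets

def verify_quote(quote: dict, policy: dict) -> bool:
    """Toy policy check: the platform's TCB level must meet the policy minimum."""
    return quote.get("tcb_level", 0) >= policy.get("min_tcb_level", 0)

def establish_session(src_quote: dict, dst_quote: dict, policy: dict):
    """Return a fresh 256-bit migration session key, or None if policy fails."""
    if not (verify_quote(src_quote, policy) and verify_quote(dst_quote, policy)):
        return None  # policy evaluation failed: no key is ever programmed
    return secrets.token_bytes(32)  # e.g. a key for AES-256-GCM streams
```

The point of the sketch is that the key material only comes into existence after the measured policy has been evaluated inside the trusted component, so the host never sees or influences it.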
Another important property is that whether a TD is migratable should be evident to the tenant, and that is supported through the attributes of the TD reported in the attestation. Third, as part of live migration we still want to meet the base security objectives of TDX and keep the CSP host software outside the TCB. So a TD that is being migrated must not be cloneable, in the sense that only the source or the destination TD, never both, may be executing after the migration process completes. Also, a CSP performing such a live migration must not be able to get the destination TD VM executing with invalid or incomplete state from the source TD VM; the integrity of the transferred TD contents is the responsibility of the TDX live migration architecture. Last but not least, during transport of the TD VM data, the confidentiality, integrity, and replay protection of that data is a function of the TDX live migration architecture and is not reliant on the CSP infrastructure.

With those security requirements covered, let's also look at some of the key functional requirements. The first is that live migration does not require the TD software or workload to be involved in any way; the tenant effectively opts in through attestation, but there is no explicit tenant software involvement in the live migration process. We want to make sure there is no performance impact when TD migration is not active on a particular target TD. We also want to support techniques such as post-copy of TD memory during live migration, so that once the required assets of the TD have been migrated, the destination TD can start executing and post-copy the remaining data in.
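The no-cloning and complete-transfer properties just described can be modelled with a tiny state machine. This is an illustrative invariant check with hypothetical names, not the TDX module's actual logic: the destination may only become runnable once every exported page has been imported, and committing simultaneously makes the source non-runnable, so at most one of the two TDs is ever runnable.

```python
class MigrationSession:
    """Toy model of the no-clone / complete-import invariant."""

    def __init__(self, pages_to_import: int):
        self.pending = pages_to_import  # modified pages not yet imported
        self.source_runnable = True
        self.dest_runnable = False

    def import_page(self) -> None:
        self.pending -= 1

    def accept_start_token(self) -> bool:
        """Destination-side check before the migrated TD may run."""
        if self.pending > 0:
            return False              # incomplete state: token rejected
        self.source_runnable = False  # commit: source TD non-runnable
        self.dest_runnable = True
        return True

    def abort(self) -> None:
        """Abort consumed on the source: source TD becomes runnable again."""
        self.dest_runnable = False
        self.source_runnable = True
```

At no point in this model are both `source_runnable` and `dest_runnable` true, which is the cloning property the architecture enforces through its token exchange.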
From a functional and performance standpoint, we want to support concurrent memory migration, so that all physical cores available to the host can be used for live migration. There are a couple of things that are out of scope. The first is migration of shared memory, which remains the responsibility of the host VMM, since the VMM is the one that manages shared memory for TD VMs and is considered untrusted. And, as in the TDX baseline architecture, any denial of service attempted by the VMM or the host against TDs or TD live migration is out of scope and is not prevented; the VMM remains the main resource manager on the platform.

With that, let's look at some of the new components, and then we'll look at the interfaces. On the left-hand side you can see a source platform with an example TD VM running. The first new component is a migration TD. Once the orchestration entity in the cloud has selected the source and destination platforms, the TCB of the TD remains the Intel TDX module but now also includes a migration TD, which is effectively a service TD. It is purpose-built to support TDX live migration and includes very minimal services, specifically the ability to set up the migration session and perform policy evaluation. The migration TD itself is not migratable. Once the migration TD has verified the destination platform and has set up a secure session between the source and destination TDX modules, the host software is responsible for physically migrating the TD content, but the content that is migrated is protected by the TDX TCB components.

We'll dive into the interface points a little more, but before we do that I wanted to expand on the migration TD a little. The migration TD is a newly introduced TCB component: it is in the TCB of the TDs that are migratable, just like the TDX module. It performs functions that are
specific to live migration, like evaluating the migration policy, performing quote verification to mutually authenticate the source and destination platforms, and setting up the migration session key that is programmed to protect the data transfer between the source and destination platforms. A migration TD is attached to a migratable TD, the tenant TD, through an explicit bind operation, and that allows the migration TD access to certain metadata of the target TDs that are migratable. The measurement of that bound migration TD is evident to the relying party that looks at the quote for a particular tenant TD, and the same migration TD can be rebound to a target TD to re-establish a session if required.

To support the migration TD's interactions with the target TD, the VMM can invoke a new set of interfaces provided by the Intel TDX module to bind the migration TD to the target TD. The bind operation essentially records the measurement of the migration TD into the TD metadata maintained by the TDX module for the target TD, so that in any report and quote eventually generated for the migratable tenant TD, the fact that this particular migration TD is bound to that target TD is evident in the quote generated by the platform.

There are a set of additional interfaces that are exercised during the lifecycle of live migration; let me quickly introduce these instructions, or interface methods, here. The first is a migration-stream create operation, used to create a set of migration streams: AES-GCM-protected streams between the source and destination platforms, supported by the migration session created through the migration TD. Once those migration streams have been instantiated, a set of export and import intrinsics are provided. The export intrinsics are to be used by the VMM on the source
platform to export the data of the target TD being live-migrated, and the import intrinsics are to be used on the destination platform to import that TD's data into the destination TD template that has been created to support live migration. A set of new interfaces are also provided to the migration TD that is bound to a target TD, to read and write metadata associated with that target TD; this includes, for example, reading the migration policy, as well as writing the migration session key negotiated by the migration TD. As I mentioned earlier, the migration TD bound to the target TD has its measurements included as part of the target TD's report and quote, and that is achieved through a modification of the report intrinsic exposed by the TDX module; this also covers the new attribute that marks the target TD as migratable.

Before we dive into the lifecycle of the intrinsics and how they're used on a specific platform, let's look at the cross-platform view between the source and destination platforms to understand how the migration session keys that support the rest of the intrinsics are set up. On the source platform I'm showing a TD that has been chosen to be migratable and that is running on that source platform. I'm also showing a Quoting Enclave, which is used to support the attestation functions for Intel TDX, and a similar destination platform chosen by the orchestration entity; I'm showing a placeholder TD here, but the TD has not yet been migrated. The initial setup begins with the host invoking SEAMCALL functions to bind the migration TD to the target TD, and similarly to bind a migration TD to the template of the destination target TD. The next step, once orchestration has introduced these two platforms, is that the migration TDs on both platforms can request
quotes for their posture, for their attestation, proving their authenticity to the corresponding platform; that is achieved by generating the quotes through the existing TDX-based quote mechanism. I'm also showing a quote verification library in each of the migration TDs, which is used to verify the quotes from the corresponding platform they are interacting with. That process is used to create a secure channel, and I'm showing some examples of protocols that can support those kinds of operations. This creates a transport channel so that the migration TDs can communicate securely over an untrusted fabric provided by the cloud provider. Once that transport session is in place, the migration TD can evaluate the migration policy included as part of the migration TD, and if those checks pass, it can generate a migration session key and pass it over to the destination platform. Lastly, the migration TD can use some of the interfaces I showed earlier to program that migration session key into the corresponding TD control state, and that initializes the session such that the host software can now start invoking the requisite export and import intrinsics, which use the migration session key to protect the data as it is transferred from the source platform to the destination platform.

With those pieces in place, let's look at the lifecycle of the overall operation. I'll move faster through some of the phases, since I've described the baseline mechanisms already. The initial step I'm showing here is on the source platform, where a TD is being initialized; a service TD, in this case a migration TD, may be bound to it as part of initialization. Once that source TD has been fully initialized, and note here that it has also been initialized with the migratable
attribute set to 1, that TD may start running on the source platform. Similarly, on the destination platform, the existing TD intrinsics are used to instantiate a TD template, and then, as on the source platform, a migration TD is attached to that TD template; note that on the destination platform the TD is not yet executing. The migration TDs can start executing on the source and destination platforms and create a migration transport session, as I described on the previous slide, and then, based on that transport session, verify the policy and create a migration session key that is provided to the destination platform. Using those migration session keys, the host may now create one or more migration streams backed by the migration session key, and once that is complete, the export intrinsics may start on the source platform.

The host first initiates an export of the immutable state of the TD, and through the corresponding import intrinsics, that non-modifiable configuration for the TD is pulled in on the destination platform. Once that is accepted, the host VMM can proceed to start pulling the memory contents while the source TD is still executing; this is called the in-order, or pre-copy, stage of live migration. This flow of exporting memory from the source system through a migration stream and importing it on the destination platform may be repeated a number of times. If memory on the source TD gets modified, the host is provided with additional intrinsics to allow those writes to occur and to issue additional epochs of memory migration, for which there are supporting intrinsics called export track and import track, with additional restrictions enforced such that the most recent copies of the pages for the source TD
have been imported into the destination TD before the migration is deemed successfully complete on the destination platform. The blackout period for live migration starts with the host executing a TDH.EXPORT.PAUSE intrinsic, at which point the source TD is paused; the last few pieces of memory state can then be exported and imported on the destination platform, along with the last pieces of virtual CPU state and the mutable state for the TD, such as the runtime measurement registers, etc. The host can then signal completion of that epoch through the generation of a start token that is protected through the migration session. Each of these pieces of information transferred to the destination platform, whether tokens, memory state, or CPU state, is carried in the form of migration data bundles. The destination platform can accept that token, and the TDX module on the destination platform accepts it only after certain security conditions are checked, such as whether all the modified memory has actually been imported successfully on the destination platform. Once those enforcement checks have passed and the destination system's TDX module accepts the start token, the destination TD can start executing, and additional memory may be transported to the destination platform through a post-copy intrinsic supported by the TDX module on the destination platform. On the other hand, it may happen that the destination platform aborts the transaction because of some failure condition. It can do that by returning an abort token to the source platform, which can be consumed by the source platform through an export-abort intrinsic; if that succeeds, the source TD can become runnable again. Similarly, an abort can also be initiated by the source
platform by an explicit TDH.EXPORT.ABORT intrinsic, to make the source TD runnable again. In any case, the TDX module enforces through these token exchanges that the security objectives we described earlier are met, in the sense that all the modified state must have been moved over securely to the destination platform, and only after that can the destination TD start running, with the source TD becoming non-runnable.

With that, let's look at some of the threats and the mitigations; recall the security objectives that we covered earlier. The first attack we want to protect against is against the confidentiality and integrity of the control state managed by the TDX module. The new state we're covering here includes the migration session and the context state for the migration streams, counters, etc., maintained by the TDX module, and those are all protected through the confidentiality and integrity properties of the TDX module itself. If you refer to our past presentations on the TDX module, you will recall that the TDX module operates in the SEAM mode of the CPU, within a SEAM range register region that is protected against software accesses as well as against physical tamper through the use of its own ephemeral key, and that same protection model is used to protect this additional control state maintained by the TDX module for the migration session.

The second attack is on the confidentiality and integrity of the exported state itself, as the host executes the intrinsics to export memory state or virtual CPU state; we also want to protect against spoofed state being used for imports on the destination platform. The main mitigations here derive from the fact that, on the export side, the migration bundles that are created
are confidentiality-, integrity-, and replay-protected using the migration session key, which is managed completely by the TDX module; additionally, the bundles that are exchanged are both type- and direction-enforced by the TDX module. Lastly, since the bundles are type-specific, depending on the kind of information being transferred, additional metadata is also integrity-protected; for memory state, for example, this includes the expected guest-physical to host-physical mappings in the Secure EPT and the attributes of those mappings. Similarly, on the import side, the TDX module enforces that the decryption and integrity verification of the migration bundles happens in TD private memory, after which the state can be installed in the destination TD. In addition, the counter mechanisms used in the migration streams ensure that the destination TD for the migrated TD cannot execute until all the modified data has been imported.

The third security objective is access control of the migration TD's own assets: protecting against tamper of the migration TD's memory, its measurement, or how it's bound to the target TD. As I mentioned in the lifecycle flows, the migration TD may be pre-bound to the target TD, before the target TD is finalized, and once the migration TD is bound to the target TD, its measurements are included in the quote for the target TD so that the binding is evident in the attestation mechanism. Furthermore, the migration TD is protected in its execution just like any other TD, so it has its own ephemeral key for execution, etc. Also, the migration TD does not directly interface with the target TD; the target TD is not expected to be enlightened at all for migration. And lastly, the TDX module enforces the access control that only the bound migration TD has access to the target TD's metadata
through the read and write operations, and that happens completely within the TDX module, preventing any interference from host software. The last security objective is the integrity of the TD migration policy, and there are two aspects here: protecting the migration policy against tamper, and protecting the migration TD code itself. Quote verification is used to mutually authenticate the migration TDs across the platforms to create the protected transport session between them, and that relies on the security of the quote mechanism that we covered in our past discussions. The migration policy itself is measured as part of the migration TD, and may additionally be read as TD metadata via the interface provided by the TDX module. The migration TD's evaluation of the migration policy happens within the migration TD's memory, which is confidentiality- and integrity-protected via its own ephemeral key. With that, I would like to hand it over to Jun Nakajima to cover the interactions with host software like KVM.

Thank you, Ravi. I'd like to talk about the software implications for KVM, including QEMU. We believe it is straightforward to add TDX live migration to those VMMs, because live migration is mostly driven by the user-space VMM, and the architecture allows us to reuse the existing live-migration implementation as much as possible; of course, we can also use the existing code for shared memory. The reason I'm talking about shared memory is that even a TDX guest needs to have shared memory, for example for I/O; typically a TDX guest has both private memory and shared memory. At a high level, what we need to do is add new ioctl operations for the user-space VMM to transfer the state and private memory. This is a summary of what KVM needs to do when enabling TDX live migration; I'll show more details on the next slide. Also,
a previous presentation at the KVM Forum has other details, so please look at that presentation if you're interested. Next I'll talk about how KVM and QEMU use these SEAMCALLs. We showed this flow before, and I know it's busy and complex, but I want to point out that this flow is driven by the user-space VMM, and in fact, logically, this model is equivalent to live migration of a legacy VM, except for the pre-migration phase. For example, in the reservation step we need to make sure the destination has sufficient resources, like memory and CPU, to run the new TD; we also need to make the reservation and send the immutable state in advance. Then we start the iterative pre-copy phase; during this period the guest TD is running on the source, modifying memory. I have more details on the next slide, but basically we send the modified pages from the source to the destination. As we repeat, we come to the point where we can stop the TD on the source side, so that we can copy the rest of the state in one shot to minimize the downtime; then we commit. Some live migration implementations use what is called post-copy, but I won't discuss that today; TDX live migration was designed with post-copy in mind as well.

Now let's look at more details of the iterative pre-copy phase; I use QEMU as the example of a user-space VMM here. In this phase, like I said, the TD guest is modifying memory as it runs on the source side, and KVM needs to keep track of the dirty pages. For legacy VMs, KVM write-protects the guest pages to that end, for example using EPT; for private pages it uses TDH.EXPORT.BLOCKW, and to undo that, to stop blocking, there is a corresponding unblock operation. By doing this, KVM can do the same thing for private memory, basically write-protecting the guest pages. In each iteration, QEMU first gets the dirty log, which is basically the bitmap of the dirty pages, then constructs a request for exporting memory, then does the ioctl, and then KVM does a SEAMCALL into the TDX module. Then
the TDX module copies the encrypted pages into a migration bundle. Note that for legacy VMs, QEMU maps the guest memory, so it doesn't need an extra step; for private memory, what QEMU needs to do is just replace that direct read with the export-memory ioctl operation. Once the migration bundle is ready, QEMU on the source transports the package to the destination, or target, side. On the target side, the QEMU code does basically the opposite, importing from the migration bundle: it does an ioctl against KVM, and KVM again uses a SEAMCALL to do the import. At the end of an epoch iteration, the source and the destination do the track operation, which basically ends the in-order export/import epoch; it means you either start a new epoch or start the out-of-order export/import phase.

Here I'll talk about scalability and performance considerations. The architecture allows the VMM to implement scalable live migration; what I mean is that we can use more CPUs and streams to transfer migration bundles as needed. If you look at the TDH.EXPORT.MEM and TDH.IMPORT.MEM instructions, or SEAMCALLs, they take the TD's TDR (Trust Domain Root) as an input parameter, so other CPUs can participate in transferring migration bundles; that way we can implement scalable live migration for TDX. Also, we should avoid copying migration bundles between the kernel and user space. Let me go back to the previous page to show what I mean: when QEMU wants the migration bundles and does the export-memory ioctl, the migrated data should be placed into an area that QEMU can access; otherwise we need to copy from kernel to user space. The same should be the case on the destination side; that way we can minimize the copying. With that, I'd like to take questions. Thank you.
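As a rough illustration of the flow Jun describes, here is a sketch, in Python rather than the C that KVM and QEMU are written in, of one user-space pre-copy iteration: read the dirty bitmap, shard the dirty pages across several migration streams so that multiple CPUs could issue export SEAMCALLs concurrently against the same TD, and ship each resulting encrypted bundle to the destination. The `kvm` and `transports` objects and all their method names are placeholders, not the real KVM UAPI.

```python
def shard_pages(gfns, num_streams):
    """Round-robin guest frame numbers across migration streams."""
    streams = [[] for _ in range(num_streams)]
    for i, gfn in enumerate(gfns):
        streams[i % num_streams].append(gfn)
    return streams

def precopy_iteration(kvm, transports):
    """One iteration: export every dirty private page and ship the bundles."""
    bitmap = kvm.get_dirty_log()                     # dirty-log-style ioctl
    dirty = [gfn for gfn, bit in enumerate(bitmap) if bit]
    for stream_id, gfns in enumerate(shard_pages(dirty, len(transports))):
        for gfn in gfns:                             # each stream could run on its own CPU
            bundle = kvm.export_mem(gfn, stream_id)  # kernel issues the export SEAMCALL
            transports[stream_id].send(bundle)       # encrypted migration bundle out
    return len(dirty)
```

In a real implementation, the zero-copy point Jun makes would correspond to `export_mem` placing the bundle directly into memory that user space can access, rather than returning data that must be copied out of the kernel.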