My name is Brijesh Singh. I work at AMD on the Linux team. Today I'm here to talk about AMD's new confidential computing technology called Secure Nested Paging, or SNP. SNP is the latest generation of AMD SEV technology designed for confidential computing. If you have been following AMD's work, this is not our first foray: in the past we introduced SEV, and then SEV-ES, which added encrypted register state support. SNP builds upon the existing SEV and SEV-ES features and provides much stronger security guarantees. SEV and SEV-ES support is available in the first and second generations of the AMD EPYC processor, whereas SNP support starts with the third generation of EPYC, which was announced this year. SNP is designed to protect the VM from a malicious hypervisor in a very specific way. It is especially useful in cloud settings, where a user may not trust the hosting environment, or the hosting environment does not want to have any ability to look into the user's guest data. So just to recap: SEV provides memory confidentiality, SEV-ES added register state confidentiality, and SNP adds a new layer of protection through integrity. From a threat model point of view, confidentiality is the first property. It prevents the hypervisor from reading guest data, in that all guest private data is encrypted with AES-128. In addition to confidentiality, the new thing in SNP is integrity: it prevents the hypervisor from modifying or replaying guest data. The way I look at integrity is that it implicitly says that if a virtual machine is able to read a piece of data, then it should read whatever it last wrote there. That's the guarantee which SNP starts providing. SNP also provides protection against offline attacks, such as a cold boot attack, and support against malicious interrupt injection.
In SNP, there is also a new protection against lying about hardware capabilities through CPUID. The idea here is that the guest owner can decide which CPUID values they want exposed to the guest, and those capabilities get filtered through the PSP firmware so that the hypervisor cannot lie about them. There are certain things which SNP does not cover. First is availability: the hypervisor still retains full control over scheduling and resource allocation. What the availability guarantee essentially means is that a malicious guest will not be able to mount a denial-of-service attack on the system as a whole. Some advanced physical attacks are also out of scope, such as voltage glitching, as are certain side channel attacks such as Prime and Probe. The security guarantees of SNP are enforced through a combination of hardware and guest software. So how do we enforce integrity? We have created a new large structure called the Reverse Map Table, or RMP. There is one RMP for the entire system. It is created by software during boot. The basic property of the RMP is that it contains one entry for every 4K page of assignable memory. The RMP is indexed using the system physical address, and entries may be manipulated by special new x86 instructions such as RMPUPDATE, which I'll cover a little later. An RMP entry basically records page ownership, that is, who has write capability to the page. For example, if a page is assigned to a guest, then only that guest has write access to it. Similarly, if a page is assigned to the hypervisor, then that page cannot be used as a guest private page. And if a page is given to the firmware, then x86 software will not be able to make any modification or write to it. The RMP checks are enforced during the CPU page table walk.
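To make the RMP idea concrete, here is a minimal C sketch. The entry fields (`assigned`, `gpa`, `validated`) and the table size are illustrative assumptions, not the real hardware layout, which is only manipulated through instructions like RMPUPDATE:

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12     /* the RMP tracks memory at 4 KB granularity */
#define NUM_PAGES  1024   /* toy system: 4 MB of assignable memory */

/* Illustrative RMP entry; the real layout is hardware-defined. */
struct rmp_entry {
    bool     assigned;   /* page is guest-private (owned by a guest) */
    uint64_t gpa;        /* GPA this page must appear at, if assigned */
    bool     validated;  /* guest has issued PVALIDATE on it */
};

static struct rmp_entry rmp[NUM_PAGES]; /* one entry per 4 KB page */

/* The RMP is indexed by system physical address. */
static struct rmp_entry *rmp_lookup(uint64_t spa)
{
    return &rmp[spa >> PAGE_SHIFT];
}

/* A hypervisor write to a guest-private page must be refused. */
static bool host_write_allowed(uint64_t spa)
{
    return !rmp_lookup(spa)->assigned;
}
```

The key design point is that the table is indexed by system physical address, so the check can happen at the very end of any translation, regardless of who initiated the access.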
In the case of a native page table walk, once the virtual address is translated to a physical address, that physical address is used as an index into the RMP table, and the hardware reads the RMP entry to determine whether the access is permitted for that particular software. If the RMP entry says that the page should not be writable, the hardware generates a page fault (#PF). Things get a little trickier in the case of a virtual machine, because there are two page tables: the guest page table and the nested page table. In this case the hardware first walks the guest page table, then the NPT, and once it reaches the final physical address by walking the nested page table, it uses that physical address to find the RMP entry for the page. In addition to the typical writability check, it also checks the GPA: one of the fields inside the RMP entry is the GPA at which this page should appear from the guest's point of view. If these checks do not pass, the hardware raises a nested page fault (#NPF). We will look at those NPFs in a bit more detail in a later slide. Next, how do we deal with an RMP fault if it happens on the host? The strategy here is to avoid RMP violations as much as possible. The first thing we do is that when a page gets added to the RMP table as guest-owned, it is unmapped from the kernel's direct map, so kernel addresses should never actually reach that page. And if a user space application on the host attempts to write to guest private memory, it will get a SIGBUS. Things are a little more interesting when it comes to backing page support.
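The writability and GPA checks at the end of the nested walk can be sketched like this, using an illustrative entry layout (the fields here are assumptions for the sketch, not the hardware format):

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define NUM_PAGES  1024

/* Illustrative RMP entry, as in the earlier sketch. */
struct rmp_entry {
    bool     assigned;   /* page is guest-private */
    uint64_t gpa;        /* GPA the page must be mapped at, if assigned */
};

static struct rmp_entry rmp[NUM_PAGES];

static struct rmp_entry *rmp_lookup(uint64_t spa)
{
    return &rmp[spa >> PAGE_SHIFT];
}

enum access_result { ACCESS_OK, FAULT_RMP };

/* Check performed conceptually at the end of the nested walk:
 * `spa` is the translated system physical address, `gpa` the guest
 * physical address it was reached through, `enc` the guest's C-bit. */
static enum access_result rmp_check(uint64_t spa, uint64_t gpa, bool enc)
{
    const struct rmp_entry *e = rmp_lookup(spa);

    if (enc) {
        /* Private access: the page must be assigned, and recorded
         * under the same GPA it is being accessed through. */
        if (!e->assigned || e->gpa != (gpa & ~0xfffULL))
            return FAULT_RMP;
    } else {
        /* Shared access: the page must not be guest-private. */
        if (e->assigned)
            return FAULT_RMP;
    }
    return ACCESS_OK;
}
```

The GPA comparison is what defeats remapping attacks: the hypervisor cannot silently point the same guest-private physical page at two different guest addresses.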
So for example, suppose the VMM allocates guest memory backed by a 2 MB large page. While the RMP checks are performed, one of the things the hardware does is check whether all the 4K pages within that mapping are in the same state. What that essentially means is that if memory is accessed through a large page, then all the sub-pages within that large page must have exactly the same page state. As an example, say we create a mapping for a particular virtual address using a 2 MB page, and one of the 4K pages within that 2 MB range is private: if we try to access that page, the hardware will raise a page fault. The way we resolve that fault is by splitting the 2 MB page into multiple 4K pages and then resuming the guest. In the case of a virtual machine, if the hardware encounters an RMP check failure, it raises an NPF, and there are a few new bits added to the NPF error code to help us resolve the fault. One is the RMP bit, which tells us it is an RMP fault. Then there is a new ENC bit, which tells us what C-bit the guest was using when it accessed the page. And there is a new bit for size mismatch: for example, if the nested page table maps the memory as 2 MB but the guest OS validated that memory as 4K, the hardware generates a nested page fault with the size-mismatch error set. In summary, the resolution of the NPF is pretty straightforward. If we see a data write, and the write is with C-bit equal to one, we check the RMP table; if there is no entry in the RMP table, we add that page as a guest private page. Just because we added the page as a guest private page does not mean the guest is immediately able to use it as private.
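The host-side resolution rule just summarized might look like the following sketch (a simplified model, not the actual KVM code; the entry fields are illustrative):

```c
#include <stdbool.h>

/* Simplified host-side view of one RMP entry. */
struct rmp_entry {
    bool assigned;    /* page is currently guest-private */
    bool validated;   /* guest has PVALIDATE'd it */
};

/* Host-side resolution of an RMP nested page fault on a data write.
 * `enc` is the C-bit the guest used, reported via the new NPF bits. */
static void handle_rmp_npf(struct rmp_entry *e, bool enc)
{
    if (enc && !e->assigned) {
        /* Guest touched the page as private but the RMP has no entry:
         * assign it as guest-private. The guest still cannot use it
         * until it completes the transition with PVALIDATE. */
        e->assigned  = true;
        e->validated = false;
    } else if (!enc && e->assigned) {
        /* Guest touched the page as shared while the RMP marks it
         * private: convert it back to a shared (hypervisor) page. */
        e->assigned  = false;
        e->validated = false;
    }
}
```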
There is another step which the guest needs to take in order to complete the page state transition. Similarly, if we see the guest attempting to access a page with C-bit equal to zero, and the page is marked private in the RMP table, then we go and make it a shared page. During SEV-ES development, one of the things we worked on was the GHCB specification, the guest-hypervisor communication block specification. The idea behind this specification was that all hypervisors would implement this standard and would then be able to run any guest which follows it. The reason the specification was developed is that with SEV-ES the register state is encrypted. So if the guest executes an intercepted instruction, say CPUID, then some register state needs to be passed to the hypervisor in order for it to assist with the emulation. The registers to be exposed for each intercept are defined in the GHCB specification: the guest follows the specification to place the data in the GHCB page, and the hypervisor reads the data from the GHCB page and acts on it. Since SNP builds on top of SEV-ES, it makes use of this specification, but there are a few things SNP adds that were not present before, so we enhanced the specification with a few new VMGEXIT codes. One of the most important is the page state change: the idea is that if the guest wants to use a page as private, it issues this page state change (PSC) VMGEXIT to make the request. There are other new VMGEXITs added as well; you can take a look at them on the slide, I'm not going to go through each of them in detail. Another important one is the guest message request. One of the new things SNP adds is the guest's ability to talk to the PSP, and in order for it to talk to the PSP there is a new VMGEXIT called guest message request.
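The base GHCB protocol described above, with CPUID as the example intercept, can be sketched roughly like this. The `ghcb` layout here is a toy with only the fields this sketch needs, and the hypervisor's reply values are illustrative; only the `SVM_EXIT_CPUID` code and the "AuthenticAMD" vendor bytes are real:

```c
#include <stdint.h>

/* Toy GHCB: the real layout is defined by the GHCB specification;
 * only the fields needed for this sketch are shown. */
struct ghcb {
    uint64_t exit_code;          /* which intercepted operation */
    uint64_t rax, rbx, rcx, rdx; /* register state shared explicitly */
};

#define SVM_EXIT_CPUID 0x72      /* CPUID intercept exit code */

/* Guest side: expose only the registers CPUID needs, then VMGEXIT. */
static void guest_request_cpuid(struct ghcb *g, uint32_t leaf)
{
    g->exit_code = SVM_EXIT_CPUID;
    g->rax = leaf;
    /* vmgexit();  -- world switch to the hypervisor */
}

/* Hypervisor side: read the request, emulate, write the results back. */
static void host_handle_cpuid(struct ghcb *g)
{
    if (g->exit_code == SVM_EXIT_CPUID && g->rax == 0) {
        g->rax = 0xd;            /* illustrative max-leaf value */
        g->rbx = 0x68747541;     /* "Auth" */
        g->rdx = 0x69746e65;     /* "enti" */
        g->rcx = 0x444d4163;     /* "cAMD" */
    }
}
```

The point of the protocol is that nothing leaks implicitly: the guest chooses exactly which register values to copy into the shared GHCB page before handing control to the hypervisor.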
A guest can issue this guest message request VMGEXIT, the hypervisor passes the information it received from the guest down to the PSP, lets the PSP execute it, and once the PSP execution completes the response comes back. Another important one is AP creation. Unlike SEV-ES, in SNP the guest VM can create APs, and in order to facilitate that you use the AP creation VMGEXIT. So, as I was saying, one important part of this entire flow is page validation. We get a fault saying the page is not in the RMP table, and we add it to the RMP table; now the guest needs to take ownership, and the way it does that is by page validation. Page validation is a two-step process. First, the guest issues a page state change request to the hypervisor so that the page gets added to the RMP table. Once the page is in the RMP table, the guest can issue a PVALIDATE instruction to validate that page. When the guest issues the page state change, it can batch multiple page state requests into one operation. As you can see on the right-hand side, there is a structure which lets us batch up to 253 entries in one VMGEXIT operation, which minimizes the number of VMGEXITs we have to take to do the page state changes. In the PSC entry there is a field called page size: zero means a 4K page, one means a 2 MB page. So with 253 entries we can cover roughly 500 megabytes of memory in one VMGEXIT. As you can see, page validation is a two-step process: it requires the VMGEXIT and then the PVALIDATE instruction. It can take a while, depending on the size of the guest. There are multiple approaches we can take to validate the pages, basically to validate all of system RAM. One approach is to validate all the memory in the guest BIOS.
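The batching arithmetic can be sketched as follows. The struct is a simplified stand-in for the PSC request in the GHCB specification (field widths and packing are not the real wire format), but the 253-entry limit is the one from the talk:

```c
#include <stdint.h>

#define VMGEXIT_PSC_MAX_ENTRY 253   /* entries per page state change request */

/* Illustrative PSC entry; the exact bitfield packing in the GHCB
 * specification is simplified here. */
struct psc_entry {
    uint64_t gfn;        /* guest frame to transition */
    uint8_t  operation;  /* e.g. shared -> private or private -> shared */
    uint8_t  pagesize;   /* 0 = 4 KB page, 1 = 2 MB page */
};

struct psc_request {
    uint16_t cur_entry;  /* first entry still to process */
    uint16_t end_entry;  /* last valid entry */
    struct psc_entry entries[VMGEXIT_PSC_MAX_ENTRY];
};

/* Memory covered by one fully batched request when every entry
 * uses the 2 MB page size: 253 * 2 MiB = 506 MiB. */
static uint64_t psc_max_coverage_bytes(void)
{
    return (uint64_t)VMGEXIT_PSC_MAX_ENTRY * (2ULL << 20);
}
```

This is where the "roughly 500 megabytes per VMGEXIT" figure comes from: 253 entries at 2 MiB each is 506 MiB.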
So, before the virtual machine gets launched, certain pages are pre-validated. What I mean by pre-validated is that, as part of guest launch, there is a command called launch update which places the OVMF, or guest BIOS, into the guest memory space, and as the data is placed into guest memory those pages get validated. That's what this picture is trying to show: before the first VMRUN, certain pages in guest memory are already validated. Once the system boots and OVMF takes over, the very first thing OVMF does in the PEI phase is detect the entire memory map and go through its validation cycle, validating all the pages. When validating, it requests the largest possible page size, which is 2 MB, so that we can minimize the number of VMGEXITs and PVALIDATE instructions we have to issue. Another way to do this validation is lazy validation, meaning validation on demand. In this approach OVMF, our guest BIOS, validates only the pages it needs during its own execution; there is a new memory type called unaccepted memory introduced in the EFI specification, and OVMF can mark the remaining ranges with that EFI memory type and pass the map down to the guest OS, which then validates only the pages that were not previously validated by the BIOS. This is more forward-looking; we will be looking at adding this support after the base SNP support is enabled.
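The lazy approach can be sketched with a simple acceptance bitmap. This is a toy model of the guest-side bookkeeping, not the real implementation; the two commented-out steps stand in for the PSC VMGEXIT and PVALIDATE described earlier:

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_PAGES 1024
static uint8_t validated[NUM_PAGES / 8];  /* one bit per 4 KB page */

static bool page_is_validated(uint64_t pfn)
{
    return validated[pfn / 8] & (1u << (pfn % 8));
}

/* Accept a page on first touch. Tracking the bitmap is what prevents
 * double validation: issuing PVALIDATE on an already-validated page
 * would be a bug (and could mask a replay by the hypervisor). */
static void accept_page(uint64_t pfn)
{
    if (page_is_validated(pfn))
        return;                       /* already accepted, nothing to do */
    /* 1. page state change VMGEXIT  -- hypervisor adds RMP entry  */
    /* 2. PVALIDATE                  -- guest takes ownership      */
    validated[pfn / 8] |= 1u << (pfn % 8);
}
```

This same "which pages have I already validated?" tracking is what later features like kexec inside the guest will depend on.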
So now let's come back to the launch process and how a typical launch works. The very first thing the host OS does is initialize the AMD SEV firmware; as part of initialization a new random VM encryption key is created, and that key gets loaded into the memory controller. Then the host OS calls a set of commands to put the initial image, in this particular case the OVMF or guest BIOS image, into the guest memory space, and starts running the guest. Mapping this onto the PSP command sequence, there are commands for guest context creation, activation, launch start, launch update, and launch finish. Launch update is the command used for adding pages into the guest memory space; I will cover it a little more on the next slide. There are multiple page types you can pass when calling launch update. The first one is the page type normal: it is just a normal data or instruction page which the hypervisor wants to put into the guest memory space, and once it is added, the entire contents of that page get measured. Then there is the VMSA type, which is the typical register state page. I don't want to go through all of these, but one important one I do want to cover is the page type CPUID. This is a very special page: it contains the CPUID values which the hypervisor wants to give to the guest OS. The idea is that the PSP reads the values from the CPUID page and filters out anything it sees as a security issue: if the hypervisor is lying about a capability which is not actually available, the PSP will filter it out, and if there is an error, the PSP will come back saying this entry doesn't look right.

To support the VM launch flow, a few new commands have been added to the KVM ioctl interface. I don't want to go through all of them in detail, but here are the typical commands which are added. For KVM we have added a new object to create an SNP guest; this object is part of our RFC v2 and will evolve based on community feedback. There are also a few new commands on the host side. One of them is fairly standard, SNP platform status: the host OS can call this command to query the SEV firmware version and related information. Then there is SNP get/set config; these commands can be used to set the system-wide TCB version, which will be reported during the attestation flow.

One of the big differences between SEV and SNP is when the attestation report is queried. If you remember, in the case of SEV or SEV-ES the attestation report was generated before the VM booted; there was a command called launch measure which was used to get the measurement. With SNP that model has changed: the guest OS can request an attestation report at any time, and multiple times. There is a new driver, the SEV guest driver, which provides a few ioctls. One of these ioctls is get report, which fetches the report, and while getting the report you can provide some user-supplied data; you can see all the details in the specification.

So where are we right now? This is just the status of the SEV support: SEV landed in 4.15, SEV-ES landed in 5.10, so both of those are already accepted upstream. The thing we are working on right now is live migration support. There are two types of live migration currently being discussed for SEV. One is slow migration: the pages get migrated through the PSP, and the PSP is not one of the fastest processors, so the encryption and decryption are a little slow, which is why we call it slow migration. Then there is another approach we call fast migration, where a migration helper runs inside the virtual machine and assists the migration. Some of this discussion is already happening upstream, so if you have any feedback, please provide it or participate in those discussions.

On SEV-SNP itself, the guest and host kernel patches have gone through multiple reviews; right now the guest patches and hypervisor patches are both at version 5 and posted upstream. For OVMF we are at version 6, and for QEMU we recently posted the RFC v2. All these patches are also available in our staging branch, so if you have a third-generation EPYC processor you can fetch everything from our staging area and try it yourself.

What is supported in the current patch set? It supports a basic SEV-SNP guest launch and provides most of the features offered by the hardware, except a few which I will cover a little later. It provides the attestation report driver, so you can query the attestation report; it has support for the filtered CPUID values, where the PSP firmware can filter out bad values; and it supports allocating the backing pages either from a 4K zone or from large pages, which is basically THP. As I said, in the current implementation the guest BIOS is the one which validates the entire guest RAM, so the guest OS doesn't need to do any validation; it only needs to use PSC and PVALIDATE if it wants to make a page shared, which means unvalidating it. It also has support for creating virtual CPUs, so we can create any number of virtual CPUs.

What are we working on right now? There is a lot of work already going on, some of it being done by Google and SUSE, on KVM unit test enhancements: we are working on adding KVM unit tests for SEV, SEV-ES, and SNP, and this work is already in progress. We are also looking at enhancing the KVM selftests to have SEV, SEV-ES, and SEV-SNP support, and additionally we are looking into adding SEV support to the Avocado test framework.

What is not supported? Restricted interrupt injection is not yet supported in the current SNP patch set; we will be working on it after the base support is enabled. Lazy validation is not supported; we will be looking into adding it in the near future. Kexec support inside the guest is not supported; this is mainly because kexec will require the lazy validation work first. One of the things we want to avoid during the validation cycle is double-validating a page, so we need some method of tracking which pages have been validated before we start implementing kexec support. Live migration support is also not yet implemented; we will start implementing it very soon. And one limitation right now is that the SNP patches do not support backing guest memory from hugetlbfs. This is mainly because there is no way to split huge pages coming from hugetlbfs, and in SNP there are pages which need to be split: for example, if a guest makes one page within a large page shared, the only way to handle that is to split the page, and right now hugetlbfs does not support splitting. We will be looking into that area a little later. Another thing not yet supported is virtual TPM. There is a specification posted by Microsoft on the SNP mailing list called SVSM, in which Microsoft proposes to make use of the VMPL levels offered by SNP in order to support a virtual TPM. These are the areas we will be looking into after the base SNP support lands.

That's all I have. Thank you so much for your time.