Ok, hi, I'm Jarkko Sakkinen. I'm a software engineer at Profian, and I used to work for Intel for quite many years, mainly on SGX and all kinds of TEE stuff, and I'm also the maintainer for the SGX part of the x86 tree, so I thought that it's now a good time to give some kind of status of how that work has been going since it got into upstream during the Covid period. SGX is a different kind of beast than SNP and TDX, because those are virtual-machine-based TEEs, whereas SGX is in a league of its own, because it's just based on a virtual address range. So what is confidential computing? I found an article that had a really good definition for it. First of all, it's about isolation, so that anything outside the container cannot tamper with or read the data, and on the other hand it needs to be attested by a third party, usually the CPU vendor, which in the SGX case is Intel. How things have been going is that in the 90s, when the web came, we had confidential computing, because you were running your own server, and when we moved to the cloud we kind of moved away from confidential computing. Especially in the public sector, in government and in the military, you still need to build secure enclaves by building separate data centers because of requirements for processing certain data. So I think confidential computing was born from this need: we need something in between building a separate physical data center and putting things straight into the cloud. It's of course not as secure as physical security, but it's still a lot more secure than just trusting the cloud vendor, so in lots of cases you could consider it instead of actual physical security. In SGX, you basically define an address range for the enclave, or the container, and it has its own database for page permissions and such, called the Enclave Page Cache Map (EPCM).
Because a confidential container cannot trust the host in anything, it has to have its own database of permissions. With normal PTE permissions you can of course restrict further, but you cannot in any way extend the permissions defined by the Enclave Page Cache Map. During boot, the firmware reserves ranges of physical memory for enclave pages, and these pages are encrypted when they leave the CPU package. The CPU also does access checks for those pages: unless you are inside the enclave, the CPU will raise a general protection fault if you try to access any of those pages. An enclave has one page called the SGX Enclave Control Structure (SECS) that defines all the properties of the enclave, such as its address range, its measurement checksum and so on. And it has Thread Control Structure (TCS) pages that define the entry points where you can enter the enclave using the EENTER instruction. Only through TCS entry points can you enter enclave mode, and before you can enter, the data is measured, and the final checksum is used as part of the identity when the enclave is running. Then for each TCS there's also a separate save stack called the State Save Area (SSA) that is used to store the register state when the CPU leaves the enclave. Basically anything that can cause a far jump out of the enclave, any instruction such as syscall, causes an asynchronous exit, and the state is stored to the State Save Area. So an enclave basically cannot do anything except calculations, and for anything that requires a syscall it has to consult the host operating system. This picture tries to describe the current attestation scheme. There's an instruction called EREPORT that basically creates a cryptographic report with a CMAC checksum of the enclave properties.
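As a rough mental model of the EPCM rule just described (a toy sketch in plain Python, with made-up names, not real SGX code): the effective access rights on an enclave page are the intersection of the EPCM permissions and the host-controlled page-table permissions, so the host can restrict but never extend what the EPCM grants.

```python
# Toy model of the EPCM rule: effective permissions are the intersection
# of the enclave's EPCM entry and the host-controlled PTE. The host can
# drop bits via the PTE but can never add bits beyond the EPCM entry.
READ, WRITE, EXEC = 1, 2, 4

def effective_perms(epcm_perms: int, pte_perms: int) -> int:
    """Host PTE can only restrict, never extend, EPCM permissions."""
    return epcm_perms & pte_perms

# EPCM grants read+write; host maps the page RWX in the PTE: still only RW.
assert effective_perms(READ | WRITE, READ | WRITE | EXEC) == READ | WRITE
# Host can restrict to read-only via the PTE.
assert effective_perms(READ | WRITE, READ) == READ
```

This is why the kernel never needs to be trusted with the enclave's permission database: the worst the host can do through page tables is deny access, never widen it.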
Then the application can ask the Intel-signed Quoting Enclave to sign that with the attestation key, which is included inside the Quoting Enclave and is changed for each Quoting Enclave release. And the Quoting Enclave asks the Provisioning Certification Enclave (PCE) to sign the attestation key with the Provisioning Certification Key (PCK), which is generated from a fused seed in each CPU. Intel has a cloud service which has a certificate for the PCK of every single CPU, so that way we can build a full chain from the enclave to the CPU, so that you can prove that the application is running inside a genuine Intel CPU. The reason for splitting the Provisioning Certification Enclave and the Quoting Enclave into separate enclaves is that you want a minimal attack surface for the functionality that uses the fused data. So if the attestation key is compromised, it can just be revoked and Intel can release a new Quoting Enclave. Here's the sequence in practice; these all expand into a bunch of protobuf messages. The first thing the application does is ask for the attestation key ID for the ECDSA key, because the AESMD daemon does have support for multiple quoting key algorithms, but at the moment only ECDSA is used. Then it asks for the target info of the Quoting Enclave, which cryptographically identifies the Quoting Enclave, and the application enclave can pass this to EREPORT, so that EREPORT, based on this target info, will sign the report with the report key of the Quoting Enclave. Then the application can pass the attestation key ID and the report back to the Quoting Enclave, and the Quoting Enclave can check (sorry, it's not encrypted, it's signed) the signature of the report with its report key. So the PCK is, for practical purposes, generated from these two last keys that are derived from fused CPU data and various enclave-specific properties.
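The signing chain above can be sketched as a toy simulation (illustrative only: real SGX uses AES-CMAC for the report and ECDSA for the quote; HMAC-SHA256 stands in for both here, and all key material is made up).

```python
# Toy simulation of the SGX DCAP-style signing chain. HMAC stands in for
# both the CPU's CMAC (reports) and ECDSA (quotes); keys are illustrative.
import hashlib
import hmac

def mac(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()

# Per-CPU fused secret from which the Provisioning Certification Key (PCK)
# is derived; Intel's cloud service holds a certificate for each CPU's PCK.
fused_seed = b"per-cpu-fused-seed"
pck = mac(fused_seed, b"provisioning-certification-key")

# The Quoting Enclave (QE) holds an attestation key; the Provisioning
# Certification Enclave certifies it with the PCK.
attestation_key = b"quoting-enclave-attestation-key"
attestation_key_cert = mac(pck, attestation_key)

# EREPORT: the CPU MACs the application enclave's properties with the QE's
# report key, which only the CPU and the targeted QE can derive.
report_key = mac(fused_seed, b"report-key-for-quoting-enclave")
enclave_properties = b"mrenclave|mrsigner|attributes"
report = (enclave_properties, mac(report_key, enclave_properties))

# The QE re-derives the report key (EGETKEY), verifies the report MAC, then
# signs a quote with the attestation key. A verifier walks the chain:
# quote -> attestation key -> PCK -> Intel's per-CPU PCK certificate.
props, tag = report
assert hmac.compare_digest(tag, mac(report_key, props))
quote = (props, mac(attestation_key, props), attestation_key_cert)
```

The point of the split is visible in the structure: only the last link (`pck`) touches fused data, so compromising the attestation key only requires shipping a new Quoting Enclave, not new hardware.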
So only the PCE can generate a PCK that is valid for the certificate provided by Intel, and the report key is used so that in the EREPORT instruction the CPU will derive the report key for the Quoting Enclave based on the target info; internally the Quoting Enclave will use the EGETKEY instruction to check the CMAC checksum. My work with SGX started with Skylake, and it was a complete failure, because at Skylake time SGX didn't have the current attestation scheme yet. It had an EINIT-token-based attestation scheme where, if you wanted to build an enclave, you would have to buy a license from Intel for each enclave, and it wasn't a great success in the open source community. Gemini Lake was the first platform that changed how the whole thing works and provided Flexible Launch Control, so that you can basically sign your own enclaves, and that was also the initiation for the patch series. It took quite a long time, over three years, but finally at the end of 2020 we got the whole thing upstream, and as of today, even though we gave a lot of time for competitors to catch up, it's still the only confidential computing technology that is fully in the mainline kernel, both the guest and host side. Well, it doesn't really have a guest and host side, because it's not based on the VM concept in the first place. And this is the post-upstream timeline. The first release was in February 2021, and in June 2021 we got KVM and NUMA support. One benefit of using SGX compared to both SNP and TDX is that, because it's not based on the VM concept, you can also use it in nested virtualization, so it gives some flexibility in that sense.
In 5.17 machine check exception recovery was included, so that poisoned pages are removed from the usable pages for enclaves, and the enclaves using those poisoned pages are killed. In 6.0 we will have, for the first time, support for some kind of accounting: internally SGX has the capability to swap enclave pages to regular memory, and we use a private shmem file for that, but so far it hasn't been accounted to the processes. So in 6.0 the pages reserved from that private shmem file will be accounted to the memory control group where the process lives. 6.0 will also have support for SGX2, or Enclave Dynamic Memory Management (EDMM), which basically means that before 6.0 you could only add pages to the enclave before initialization, and there was no capability to dynamically allocate pages after that, so it has been a bit limited for workloads so far. The only future feature that I am aware of currently is AEX Notify, which is basically a countermeasure for the vector of attacks where you reprogram the APIC timer.
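The SGX2/EDMM support just mentioned is built around an add-then-accept protocol, which can be sketched as a toy model (plain Python, all names invented, nothing like the real kernel code): the host can stage a new page, but it only becomes usable once the enclave itself acknowledges it.

```python
# Toy model of the SGX2 (EDMM) handshake: the kernel stages a new page
# with EAUG, but the page becomes usable only after the enclave itself
# acknowledges it with EACCEPT, so the host cannot silently change the
# enclave's memory layout. All names and types here are illustrative.
class Page:
    def __init__(self):
        self.pending = True      # set by EAUG, cleared by EACCEPT

class Enclave:
    def __init__(self):
        self.pages = {}

    def eaug(self, addr):        # host/kernel side: stage a pending page
        self.pages[addr] = Page()

    def eaccept(self, addr):     # enclave side: acknowledge the pending page
        page = self.pages[addr]
        if not page.pending:
            raise RuntimeError("nothing to accept")
        page.pending = False

    def access(self, addr):      # faults until the handshake completes
        page = self.pages.get(addr)
        if page is None or page.pending:
            raise MemoryError("page fault")
        return "ok"

e = Enclave()
e.eaug(0x1000)
try:
    e.access(0x1000)             # still pending: not usable yet
except MemoryError:
    pass
e.eaccept(0x1000)
assert e.access(0x1000) == "ok"
```

In the real kernel the EAUG half is even implicit: the page fault handler performs it when the enclave touches an unmapped address, and the enclave's EACCEPT completes the handshake, as described below.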
The attacker programs the timer to generate interrupts and times how long instructions take, and so you can predict what the enclave is calculating, but more on that later. These are the ioctls that you use before you run the enclave. I actually started to think, when I saw the SNP presentation, that maybe once SNP is upstream and TDX is upstream we could possibly have a common interface just for the part that builds the initial digest for whatever confidential workload we have. The create ioctl creates the SGX Enclave Control Structure internally, the add-pages ioctl basically adds pages to the enclave memory space and updates the digest, and the init ioctl finalizes the digest, and after that you can enter the enclave, so it's pretty similar to the other technologies. The feature that comes in 6.0 is called SGX2 in the SDM, but in some other contexts it's called Enclave Dynamic Memory Management, and it provides tools to add new pages, change the page type of existing pages, and so forth. For addition there's the privileged instruction ENCLS, which is used for all the privileged stuff, and it has a leaf function called EAUG that basically adds a new page to the enclave. In the current implementation, if you dereference an address inside the enclave address space that doesn't have a page table entry, the page fault handler will call EAUG for you, so we don't have a separate ioctl for that. For each of these ioctls, and for EAUG, how the game goes is that the kernel puts in a request, and the enclave has to acknowledge the request for each page with either the EACCEPT or the EACCEPTCOPY instruction, so that the host cannot make a modification to even a single page without the enclave acknowledging it. In, let's say, the EAUG case: if the kernel calls EAUG and the enclave calls EACCEPT, the page will receive read-write permissions; with EACCEPTCOPY the enclave provides some other page inside the enclave whose data will be copied to the augmented page, and with EACCEPTCOPY you can also specify the permissions for that page. For permissions there is a privileged instruction, EMODPR, for restricting permissions, and then there is (which is missing from this slide) the EMODPE instruction that the enclave can call to extend permissions. The reason why there's not simply one instruction for both restricting and extending is that extending is safe even with stale TLB entries, whereas restricting could leave stale TLB entries with the old, broader permissions, which would lead to a security issue. So when you restrict permissions, you first have to use the EBLOCK instruction, which denies creating new TLB entries for the page, and then the ETRACK instruction, which starts a TLB shootdown sequence: it creates a counter for the existing hardware threads inside the enclave and increments an epoch number. The pages that have had EBLOCK called on them carry the previous epoch number, and each time a thread leaves the enclave this counter is decreased, so once the counter is zero we know that it's safe to modify the page properties, such as restricting permissions or changing the page type. The way we do it in the kernel is that we first call EBLOCK, then ETRACK, and then we just try, since the threads might have left by themselves, before calling, let's say, EMODPR. EMODPR will return an error code if there are still threads from the previous epoch on a CPU, and at that point we send an IPI to those cores that still have threads from previous epochs, and then we retry the modification call. For all of this, basically, let's say that you want to change the page type from a normal data page to a TCS, so you want to make a new hardware thread entry point into the enclave: in the EACCEPT call you specify the page type that you accept the host to provide for you, and if it doesn't match,
where you started, you just call ERESUME, because when the thread exits asynchronously the CPU already fills the registers so that ERESUME will work: RBX will contain the TCS entry point and RCX will contain the asynchronous exit point. The problem with this approach is of course that, let's say you do a syscall: the only way to catch such things with the raw instruction is to have a signal handler, and that might already be in other use, and you cannot have a signal handler per thread; it's not designed for anything like that. There are a million different reasons, and there's a bunch of instructions, mostly instructions that can cause a jump out of the enclave address space, that will, with the default kernel behavior, basically cause a SIGSEGV. So for that reason we created a vDSO call that wraps all these EENTER instructions and such, and when, let's say, you get an undefined instruction exception, it fills a data structure called sgx_enclave_run that contains all the exception information. It's not exactly a function call, but for the most part it is compatible with the System V calling convention. I will show the data structure: it contains the TCS entry point; the function field will, in the case of an AEX, contain EENTER or ERESUME based on how you entered the enclave last time, or if you exited from the enclave it will contain that; and then there is basically just standard exception information. Specifically for the Intel SDK (I haven't used this part a lot) we also provided a way to supply a callback handler, but usually you just don't use this last part and instead take your next action based on what the call returns, so the callback is completely optional. The way you use it is that you call the vDSO with the TCS entry point, and then, if you get an undefined instruction exception, you execute that syscall for the enclave, and so forth. So with this infrastructure you can on purpose use, let's say, a syscall instruction inside the enclave and let the exception happen, and with this information you know exactly what happened inside the enclave, and then the OS, or the runtime, can work as a delegate for the enclave. The problem with the default ERESUME behavior is that if you have an AEX handler that is based on ERESUME, you can fairly accurately time how long an instruction takes inside the enclave by adjusting the APIC timer rate to a suitable value. So as a countermeasure a new attribute for enclaves has been added, called AEX Notify, and it changes the ERESUME instruction behavior so that instead of popping the previous register state from the State Save Area and returning to where the execution was, it goes to the entry point specified inside the TCS, and then the enclave can use the EDECCSSA instruction to return back to the point where the execution was. That counters the timing attacks, because then you are just looping in the entry point and you don't get any information about what's going on inside the enclave by that method. Okay, these are the runtimes that I'm aware of; there are probably a bunch of others. The first is the Intel SGX SDK, which is very archaic; it's there because of AESMD and PCCS. AESMD is the daemon that runs the Quoting Enclave and the Provisioning Certification Enclave, and PCCS is the daemon that downloads the certificates for the provisioning certification keys from the Intel cloud. It's very programmer-unfriendly; you don't want to use it for anything. But I think the real options are Gramine (formerly Graphene), which is basically a syscall shim, so you can take any normal Linux executable and it will provide the necessary syscall shim so that when the program calls a syscall it will
execute the syscall for the program. Enarx, which we are developing at Profian, is basically a container or microservice runtime based on WebAssembly, so you can compile basically any program in any language, even Java or Python, to a WebAssembly program and host it with Enarx. In addition to SGX it also supports AMD SNP, and in the future we are planning to add support for Arm v9 Realms and possibly other hardware backends as they come, but at the moment our main targets are SGX and SNP. For SNP we basically create a minimal VM to host the WebAssembly runtime without any specific operating system. Oh, there should be one more slide, but it's missing, so now I'll switch to Roman; he will show the Enarx demo.

Thank you, Jarkko. Hi, my name is Roman. I'm an employee at Profian and one of the core developers of the Enarx open source project. I've previously worked at Docker, and I've also spent some time building large-scale distributed L1 networks. Today I would like to show you how we utilize SGX attestation to secure workload communications and establish trust relationships, and just make TEEs simpler to use for application developers. I'm going to use a demo server, which is actually just a web app, and it lets us deploy WebAssembly into real TEEs, real hardware, via Enarx. I'll deploy a workload from Drawbridge, which is essentially a registry of WebAssembly workloads. Let me just find my cursor. This particular workload is going to be just a web app; inside, the WebAssembly is going to do plain-text TCP communication, but the Enarx platform is going to provide it with TLS. So what happened right now is that the Enarx platform, well, let me open the link here; for now it's just a proof of concept, so that's why we have to accept the insecure cert. What happened here is that when the Enarx platform started, it connected to the SGX processor and did the measurement of the firmware, the SGX report, and also measured the Enarx platform itself as well as the workload which is being loaded into it, and then it sent all this data to the attestation service, which then verified the platform, the Enarx hash, all the measurements, and on successful verification it issued a certificate to this workload. So it means that the workload itself just does plain text, but the Enarx platform provides the TLS and can transparently upgrade the connection from plain text to TLS. Apart from that, the certificate contains data about the identity of the workload, so if you had, say, two or three workloads, they could communicate with each other and establish trust relationships given the data in the certificate, and that is all transparent for application developers. I can also show you the cert; again, for now it's a proof of concept, but we can look at it, and we are working on adding actual HSMs to improve the situation. You can see here that this certificate was issued by our CA currently, and this is the unique certificate issued for this particular workload, this particular CPU, and this particular Enarx instance, so every workload on each different CPU and different machine will get a different certificate. Apart from that, the last thing I'm going to show you here is how we can do exactly the same thing on SNP. I'm going to run exactly the same workload, a completely unmodified binary, same version, on SNP; sorry, that's a different workload. It takes some time, of course, because it has to send the attestation reports to the attestation service and get the certificate back. Again, I'm going to follow the link, accept, and you will see exactly the same workload. By the way, this is free to use for everyone, so if you want to try it out, feel free. And yeah, that's the demo; if you have any questions, please let me know.
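The per-workload, per-CPU certificates shown in the demo can be illustrated with a small sketch (this is a conceptual model with a made-up `workload_identity` helper, not Enarx's actual implementation): the attestation service derives a unique identity from the CPU, the platform measurement, and the workload hash, which is why every combination gets a different certificate.

```python
# Conceptual sketch (not Enarx code): derive a unique identity per
# (CPU, platform measurement, workload) triple, as reflected in the
# demo's per-workload, per-machine certificates. All inputs are made up.
import hashlib

def workload_identity(cpu_id: bytes, platform_measurement: bytes,
                      workload_hash: bytes) -> str:
    return hashlib.sha256(
        cpu_id + b"|" + platform_measurement + b"|" + workload_hash
    ).hexdigest()

# Same workload and platform on two different CPUs: two identities,
# hence two different certificates.
a = workload_identity(b"cpu-1", b"enarx-platform-hash", b"app.wasm")
b = workload_identity(b"cpu-2", b"enarx-platform-hash", b"app.wasm")
assert a != b
```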