Hello, everyone. Welcome to abstracting TEE silicon implementations with shims. I am Nathaniel McCallum. I am the CTO of a new startup called Profian. And joining me today is Harald Hoyer, who is our distinguished engineer. Harald, please introduce yourself. Hello, my name is Harald Hoyer. I'm a Distinguished Engineer at Profian. And I work mostly on AMD SEV as well as SNP technologies, and on shims. Great. Thanks, Harald. So the project we're going to be talking about today is the Enarx project. And although we're going to give you a little window into what the project does and how it works, if you would like to get more information, the best place to find that is at the URL that you see here, which is enarx.dev. That's E-N-A-R-X dot dev. So let's move ahead to the next slide. Let me give you a brief overview of the Enarx project. We are a confidential computing platform that uses WebAssembly in order to deploy workloads in the cloud on top of different TEE implementations. This means that you can take a normal workload written in C, C++, Rust, Go, Java, etc., compile it to WebAssembly, and then deploy that WebAssembly binary into whatever TEE Enarx has support for. The Enarx project itself is entirely open source: all of our code is licensed under the Apache 2.0 license. And we want to be very clear that our strategy is that we do not trust the host. What we do trust, however, needs to be defined. The first thing that we trust is the CPU and its firmware. And we do this based upon a root of trust in the hardware. It's not practically possible to validate a CPU, so this is a leap of faith. However, once we have the root of trust established, everything from that point on is cryptographically verified. So what else do you trust in the system? Besides the CPU and the CPU's firmware, you would also trust the open source Enarx runtime.
However, one thing that makes us a bit different than other platforms is that you do not actually have to trust a binary that we provide. You can provide your own Enarx binary. And you'll see how we accomplish that in a moment. This means that 100% of the runtime that's within a keep, which is what we call our execution environment, is provided by you, the tenant. And you get cryptographic verification from the hardware, using that root of trust, that everything you have deployed is exactly what you want. The entire project itself is written in Rust, plus some assembly language so that we can issue the right instructions for the hardware. And the Enarx project is now supported by a new startup called Profian, of which I'm the CTO. So first, we're going to start off by identifying what some of our problem sets are. And there are two basic categories here. The first one is the contrast between how these TEEs are implemented. There are roughly two different types: one set is encrypted virtual machines, and the other is process-based TEEs. We'll go over those in a moment. And then second, we're going to talk about our execution and security principles. So the first thing that we want to talk about is the difference between how the different technologies work at the silicon level. And these really fall into two different categories. On the one hand, we have things that are process-based. The way these actually work is that an API, usually from the kernel, a set of ioctls in the case of SGX, lets you carve out a region of memory within a process. That region of memory is encrypted and has integrity protections, depending on the technology that's involved. It will also sometimes deny any outside access to those pages. So these are essentially the process-based TEEs. On the other hand, we have VM-based TEEs. And VM-based TEEs are roughly what you would expect.
You have a virtual machine, and inside of that virtual machine, you have all of the pages encrypted unless you request otherwise. The way that virtual machines work, for those who aren't aware, is basically that the hardware creates a secondary set of page table mappings. So within a process, you can create this virtual machine that has its own page tables. And then, like I said, the encryption is usually controlled by the guest. But the problem that we have is that we want to present a single execution environment on all of these different platforms, regardless of how different they are. Because different ones work in different contexts. And we want there to be a single way to execute code inside of TEEs on all of these different platforms. These really are quite different hardware platforms with very different semantics. So we want to abstract these away into a single runtime. Dividing these up, in the process-based space we have SGX, which is from Intel. And although SGX has been out for a while now, support landed in the Linux kernel only within the last six months. So it is a relatively fresh technology in that regard. We also have a technology from RISC-V, called Sanctum, which follows this pattern. On the VM-based TEE side, AMD provides a technology called SEV. We are not targeting the early versions of this platform, which did not provide integrity protection, and the earliest version of which didn't encrypt registers for the guest. So we are starting support at the SEV-SNP version, which provides secure nested paging and all of those additional protections. And then there's also a product called TDX, which is forthcoming from Intel. Although they've announced TDX, we don't know exactly where it's going to land on the Intel chip roadmap. So look for that in the future.
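The goal of abstracting these two silicon families into a single runtime can be pictured as a small interface that every backend implements. The sketch below is purely illustrative: the trait name, methods, and structs are hypothetical and do not reflect the actual Enarx internals.

```rust
// Illustrative sketch only: a minimal trait that each TEE backend
// (process-based or VM-based) could implement so the layers above
// see one uniform interface. All names here are hypothetical.

/// The two silicon families discussed above.
#[derive(Debug, PartialEq)]
enum TeeKind {
    Process, // e.g. Intel SGX: an encrypted region inside a process
    Vm,      // e.g. AMD SEV-SNP: a fully encrypted virtual machine
}

/// What a backend must provide in order to host a keep.
trait Backend {
    fn name(&self) -> &'static str;
    fn kind(&self) -> TeeKind;
    /// Whether the required hardware and kernel support are present.
    fn is_available(&self) -> bool;
}

struct Sgx;
impl Backend for Sgx {
    fn name(&self) -> &'static str { "sgx" }
    fn kind(&self) -> TeeKind { TeeKind::Process }
    fn is_available(&self) -> bool { false } // a real probe would check the SGX device node
}

struct SevSnp;
impl Backend for SevSnp {
    fn name(&self) -> &'static str { "sev-snp" }
    fn kind(&self) -> TeeKind { TeeKind::Vm }
    fn is_available(&self) -> bool { false } // a real probe would check the SEV device node
}

fn main() {
    // The runtime can then pick whichever backend the machine supports,
    // while everything above this interface stays identical.
    let backends: Vec<Box<dyn Backend>> = vec![Box::new(Sgx), Box::new(SevSnp)];
    for b in &backends {
        println!("{}: {:?}, available: {}", b.name(), b.kind(), b.is_available());
    }
}
```

The point of the sketch is only that the process-based and VM-based technologies, however different their hardware semantics, can sit behind one small interface.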
IBM has additionally offered PEF on Power, which is a technology that's similar to the way that SEV operates. That was available, I believe, on Power10. And ARM has also announced Realms. And actually, Realms is able to do both process-based and VM-based TEEs, is my understanding. This is also forthcoming in new silicon that will be coming your way shortly. So the way that we abstract across all of these is that we use WebAssembly as a runtime. What we want is for all the hardware-specific stuff lower on the stack to export a single interface that is usable by WebAssembly. And WebAssembly, of course, is a W3C standard. So this is not something that we are building ourselves. You don't write your application to Enarx. You write it in your normal language with your normal APIs, and you compile it to WebAssembly and then deploy it. WebAssembly also has a set of system interfaces called WASI, the WebAssembly System Interface. These are sort of like POSIX syscalls. They provide APIs outside of the WebAssembly execution environment in order to be able to do syscall-like things. And then on top of that, there are language bindings. So for example, the Bytecode Alliance provides a libc implementation on top of WASI. This means that you can take your C application and compile and link it against the Bytecode Alliance libc, which under the covers uses WASI in the same way that, under the covers, glibc uses Linux syscalls and so forth. So applications are basically just written in their own language, with language bindings on top of WebAssembly. And this is all work that is going on in the industry. It's not work that's part of the Enarx project, but we get to benefit from the fruit of it. Our focus is primarily at the WebAssembly layer and below. And when you combine all of these different components into a single runtime environment, we call this a keep.
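To make the compile-once, deploy-anywhere workflow above concrete: a plain program with no Enarx-specific code can simply be built for a WASI target. The program below is ordinary Rust; the only assumption is the standard `wasm32-wasi` toolchain target.

```rust
// An ordinary Rust program: nothing here is Enarx-specific.
// Compiled natively, it ends up using Linux syscalls via libc;
// compiled with `cargo build --target wasm32-wasi`, the same source
// uses WASI system interfaces instead, and the resulting .wasm
// binary can be handed to any runtime (or keep) that speaks WASI.
use std::env;

// Build the greeting separately from main so the logic is testable.
fn greeting(who: &str) -> String {
    format!("hello from inside a {}", who)
}

fn main() {
    let who = env::args().nth(1).unwrap_or_else(|| "keep".to_string());
    println!("{}", greeting(&who));
}
```

The source does not change between targets; only the compilation target does, which is exactly what makes the binary architecture-neutral.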
A keep, of course, refers to the most secure part of a castle. So that's what we call our trusted execution environments. Let's go through the various principles that we have when we talk about our runtime. The first one is runtime portability, and by this we mean no recompilation of the workload. There are a few reasons why it's important to be able to take a WebAssembly binary, whose instructions are architecture-neutral, and deploy it on a variety of different technologies. The first one is function equivalence. Let's say we have performed a task and we get an attestation back that some output was produced by a function. If the function is just native code and has been measured by the specific technology, then what you have is a single measurement that encompasses not only the function, but also all of the enablement technology that goes on under the SDK or whatever you're using to build it. So it can be difficult to tell exactly what function you're actually executing inside of this keep. And this is also true, and gets even worse, when you go across technologies. If you're building, say, a function and you want to deploy it across multiple different technologies, now even the exact same function, even on the exact same instruction set architecture, is going to have different measurements, because it's produced for a different hardware platform. But what we want to achieve is function equivalence. We want some easy way to look at an attestation that has been produced over a set of data and be able to say that two runs are equivalent. So whether a function was run on platform A or whether it was run on platform B, it really doesn't matter.
We want to know that the same function was run in both places. And unless we have runtime portability, we can't achieve this function equivalence, because of the different technologies involved and the different native instruction set architectures running on the hardware platforms. All of these things affect the measurement of the actual function that's run. So we need some way to provide function equivalence, and WebAssembly helps us accomplish that. Another important principle is that we want to have security as config. What we mean by this is that right now, if you create and deploy an application, it's actually quite difficult to determine the status of your security. For example: what certificates is an application using, what crypto parameters is it using, what hardware resources is it using? What we kind of do now is throw applications into a container and deploy it on a host. And at best, we can do security scanning of packages to determine whether there are known vulnerabilities in the software stack that we are deploying. But beyond that, there's no real standard way to express things like security config, crypto parameters, and so forth. And so auditing all of this becomes a nightmare in practice. What we want is to be able to identify our security as configuration in a known way. So for example, if a vulnerability occurs on, let's say, AMD SEV, and it really doesn't matter what the technology is, vulnerabilities are going to occur over time, what we want to be able to do is make a deployment configuration policy change and shift everything immediately to the new policy that we consider to be correct. All of this, of course, is now auditable, because everything is in the configuration. We don't have to infer it from a variety of properties. Everything is explicit.
And so we always know exactly what the security properties are of any given deployment at any given time. Being able to do this really requires runtime portability: if we want to be able to move from one platform to another when a vulnerability is discovered, that really requires some ability to have security as configuration. Another problem that arises when we are attempting to do confidential computing is the problem of deployment density. Unlike with traditional VMs, where you can share pages, in a hardware-protected environment the memory pages are encrypted, and they are typically encrypted in some way that's unique per TEE instance, per VM, and so forth. What this means is that where before, when we could map a kernel, for example, into a VM, we could accomplish a lot of sharing and get high density by sharing pages between instances, with confidential computing that's no longer something we can accomplish. And since we don't have the ability to share pages between instances, the ability to have minimal overhead and a minimal runtime environment becomes much more important. We've already seen efforts to do this even outside of the confidential space. Projects like Firecracker, for example, are an attempt to distill down to a very minimal VM in order to increase density. Another example is gVisor, which again attempts to do things as minimally as possible and increase density. So we need to figure out some way to increase density. This is one of the problems we intend to solve. We also have the problem of the TCB, otherwise known as the trusted computing base. This is the amount of code that you have to trust in order to deploy an application. And of course, we want this to be as minimal as possible.
If you have to depend upon hundreds of megabytes or even gigabytes worth of software just to deploy your application, then that's a large pool of places where security vulnerabilities can arise. So what we want to do is try to reduce our attack surface by having as minimal a trusted computing base as possible. This also aids in auditability. Everything in Enarx is, of course, open source, so you can come read the code and critique the code, and we also plan to do audits as a company. The intent here is that once we have code that is auditable, it's much easier to have strong confidence in the security of the software that we're deploying. The third factor about the TCB is that just having it small also improves the startup time, which is great. We want startup time to be bound by the hardware. There is some performance penalty to be paid by things like attestation, but this is largely bound by the hardware. So what we want is to be waiting on the hardware and not waiting on Enarx, and we believe we can accomplish that in a way that's very minimal. A fourth principle that's really important is that we want no host-supplied bits in the keep. This is something that arises significantly in other approaches, where they want the host basically to provide some kind of a runtime. This is particularly true, for example, with VMs. With VMs, you basically want to have a BIOS that's going to find your bootloader and then load your kernel and so forth. But by having that BIOS, we're having something injected into the protected area by the host. This is a fantastic place to put a backdoor or other things like that. And so we believe it's really important to keep those bits out of the keep altogether.
The only other way to accomplish this is with an SDK model, which is what projects like the Intel SGX SDK and the Open Enclave SDK are trying to accomplish, and best of luck to all of them. But we want to make sure that we have no host-supplied bits in the keep without depending upon a complex SDK. This is, of course, all because our trust model excludes the host. We want to be sure that everything running in the keep is precisely what the tenant supplied and that there are no backdoors or any opportunity for them to arise. So how do we accomplish that? Well, the answer is we borrow strategies from elsewhere. Although we're going to be introducing how the Enarx project has architected the solution, there's a lot of stuff here that is really just rehashed from other places. And that's because we think it's best, of course, to stand on the shoulders of giants. To go over the details of this, I'm going to turn it over to Harald, our Distinguished Engineer, to continue. Harald? So our goal is, of course, to have a kernel that is very minimal in its size and capabilities, which makes auditing the whole stack much easier. It should offload most of the work of syscalls to the host. And of course, we don't need any hardware emulation in the guests, like virtual network devices or disk devices. So the footprint is much smaller and easier to secure. This is all good for auditability. Again, smaller is better. And I think we can skip that because that protocol is probably... Well, yeah, I think the important thing to note is that we want to have a standardized protocol from the host to the guest, so that no bits need to be supplied by the host. The entire protocol is standardized and versioned between the two of them. Harald? Yeah, and our protocol is called sallyport. A sallyport is basically the back door to the keep where all the communication happens.
It's very small and secure, and a guard can basically secure that thing very easily. The host side is basically untrusted from the TEE's point of view. So we have to check everything that is coming in and everything that is going out from our payload binary. On the next slide, we can see how our payload, in our case our WebAssembly JIT compiler, is a static PIE binary which is running in the TEE with the shim as the kernel. So you can basically run any static PIE binary in our shim which you can run on bare metal. It uses standard syscalls, standard Linux syscalls, and the shim will intercept those, inspect all the parameters, and copy over the buffers supplied by the user space, basically the static PIE binary, just as the Linux kernel does. The shim interprets those syscalls and serializes them over the sallyport interface into a buffer. This buffer is shared between the host and the shim and is not encrypted, and via some mechanism the host is then triggered to interpret that data. It will deserialize the data and verify all the parameters. It does not trust the shim, just as the shim does not trust the host. And once it has finally validated the syscall, it will execute the syscall on behalf of the shim on the host. The result is then transported back over the sallyport to the shim, also validated, and then returned to the static PIE binary. Some of the syscalls stay entirely in the keep, like uname or memory mapping syscalls. They don't have to go outside and cause a VM exit or an exit to the hypervisor on the host. This saves a lot of time in usual non-I/O situations. Yeah, pretty much the only thing that goes out to the host is I/O-related stuff. So anything that we can emulate internally, we do. Exactly. Maybe threading has to synchronize over the host too, but those are the main things. You can sort of think of the shim as the meat of a sandwich, and on the bottom side of the sandwich is the sallyport interface.
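The request-and-reply flow Harald describes can be sketched as a tiny model: the shim writes a serialized syscall into the shared, unencrypted buffer; the untrusted host validates it, performs the work, and writes the result back; and the shim validates the reply before returning it to the static PIE binary. All of the types and names below are hypothetical simplifications for illustration, not the real sallyport wire format.

```rust
// Hypothetical model of the syscall proxying flow; not the real sallyport ABI.

/// A syscall request the shim places into the shared (plaintext) buffer.
#[derive(Debug, Clone, PartialEq)]
struct Request {
    num: u64,       // syscall number
    args: [u64; 6], // raw arguments (buffers would be copied alongside)
}

/// The host's reply, written back into the same shared buffer.
#[derive(Debug, Clone, PartialEq)]
struct Reply {
    ret: i64, // return value, or a negative errno
}

/// Untrusted-host side: validate first, then (pretend to) execute.
fn host_handle(req: &Request) -> Reply {
    // The host trusts nothing from the shim: it checks that the syscall
    // is one it is willing to proxy before touching any argument.
    const SYS_WRITE: u64 = 1;
    match req.num {
        SYS_WRITE => Reply { ret: req.args[2] as i64 }, // claim all bytes written
        _ => Reply { ret: -38 },                        // -ENOSYS: refuse anything else
    }
}

/// Shim side: serialize, trigger the host, then validate the reply.
fn shim_proxy(num: u64, args: [u64; 6]) -> i64 {
    let req = Request { num, args };
    // In the real system, handing this off causes a VM exit (or an
    // enclave exit) so that the untrusted host can pick it up.
    let reply = host_handle(&req);
    // The shim distrusts the host equally: sanity-check the result
    // before handing it back to the payload binary.
    if reply.ret > args[2] as i64 {
        return -5; // -EIO: host claimed more bytes than were requested
    }
    reply.ret
}

fn main() {
    // A write(1, buf, 11) proxied through the shared buffer to the host:
    let ret = shim_proxy(1, [1, 0, 11, 0, 0, 0]);
    println!("proxied write returned {}", ret);
}
```

The validation in both directions is the key point: each side treats the shared buffer as adversarial input, which is exactly the mutual-distrust property described above.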
Back up, I'll show that here. So on the bottom side of the shim is the sallyport interface, and on the top are just your standard Linux syscalls. So the shim emulates the syscalls for the binary and then translates those over the sallyport. And I think what Harald said is really important: the shim and the host side mutually distrust each other. The phrase I like to use is mutually adversarial compute. We are basically trying to compute in a scenario in which there is a well-defined interface, which is the sallyport, but neither side trusts the other, and so each always has to verify all of the parameters that go across the wire. So, Harald. Yeah, the host side can of course lie to the shim and provide fake data. And so all of the communication happening with the outside world is basically encrypted and secured by message authentication. Really, the shim should not trust any outside data. So the host cannot interpret any of the data which is going through I/O with the outside world, or even with emulated disks. On the next slide, we see that our static PIE binary is basically the Wasm JIT compiler interpreting our WebAssembly application. Together with the shim and the loader, this forms the Enarx project. The application is CPU and language agnostic and can be deployed on any architecture the Enarx project supports now, and will support in the future, without being recompiled, which is very cloud friendly in my opinion. Here we see the root of trust is the Intel CPU, with the Enarx loader and shim running in one process space. On the next slide, we see the same thing with an AMD CPU, where the loader starts a virtual machine with the shim running inside it. So for the application, basically nothing has changed in what the application sees. And that's the real benefit of having this abstraction technology. So thank you. Yeah, we would really love to see people get involved with the project.
The current status of the project is that we are nearing our first release, and we hope to have that within the upcoming month. It will have all of the pieces basically put together and will allow you to pretty simply run some WebAssembly inside of one of the keeps as a deployment option. There are lots of ways to get involved. We need people with all sorts of skills. So whether you have low-level technical skills, whether you have marketing skills, whether you have really anything, even if you just like to be sociable: come see us. First of all, you can follow us on social media. That's a good way to get updates on the project. There are links coming up in the last slide for that. Of course, if you want to try it out, we would love that too. So download, compile, run, test, report, all that kind of stuff. We definitely would like people interacting with it in any way they can, to help us find bugs as soon as possible. And there are really two different kinds of people we would like to have interested in this regard. The first is people who just want to deploy a workload. Our workload runtime is still a little bit immature, so I would temper expectations about being able to do a lot of stuff with that yet. It's going to be coming in the upcoming months as our effort shifts from the bottom part of the stack towards the upper part. But we also need people who are interested in contributing to the low-level part of the stack. There's still a lot of work to be done, and we could definitely use your help. We'd love people to also just audit our designs and implementations. We do everything in the open and we value your feedback. We definitely need help with documentation. Nick, who is the community manager for the Enarx project, is currently working on documentation, but he could use lots of help. So feel free to get a hold of him.
And then of course we need evangelists, people who are willing to spread the word about what we're doing. We would love for you to get involved in any way you can. For those who do want to get involved on a technical basis, there are really a lot of different places you can work, and you can learn as much as you want and go as deep as you want. So if you have any expertise with, say, SEV or SGX, or even really just embedded experience, those are all very relevant. If you have some experience with WebAssembly, we would love that as well, either on the compiler side or the WASI side; we need help not only implementing it, but also updating the standards that are being worked on. Microkernel and syscall work: if you're familiar with doing that kind of stuff, great. Same for Linux systems programming. If you're familiar with networking and storage, we would love to have you; one of the highest priority items for us in the coming months is going to be getting a networking stack up and running. We already have it partially running, but there are some specific things we want to accomplish there. So we would love your design input and experience. A little bit later, we're going to be working on storage as well, so please come talk to us if you're interested in that. And then, of course, we need to be able to integrate with Kubernetes and OpenShift. So if you have distributed systems experience and experience deploying in the cloud, these are all really great, relevant skills. The last two are really just security auditing and research, and Rust. Everything we have is written in Rust, with the exception of a little bit of inline assembly. So if you have any of those skills, please come join us. It's really easy to find us. We're available online at enarx.dev. The chat is just chat.enarx.dev, so it's pretty easy to find us there. All of our code is available on GitHub, at github.com/enarx.
We are also available on social media, so you can find the Enarx project on Twitter. There's a URL there for LinkedIn as well. Or you can just search for Enarx on YouTube, and we have videos of lots of the talks that we've done. So you can see those videos there, as well as some other videos. Just a reminder: everything we do is licensed under Apache 2.0, so it's all open source and can be used very widely. And we would really love to have your input. Obviously, we're pre-recording this talk, so we can't take questions right now. However, I believe we are going to be online when this talk airs, so we will be able to answer questions there. And we hope we have left enough time to cover all of those questions. So please don't be shy. We would love to hear all your questions. All right. Thank you very much. It's been great chatting with you, and we appreciated the opportunity to let you know a little bit more about the project. I'd especially like to thank Harald for doing this talk and taking time out of his busy development schedule. So thank you, everyone who is involved, and we'll talk to you next time. Thank you for listening. Bye-bye.