Hi everyone, I'm Samuel Ortiz from Intel, and together with Eric Ernst from Apple, we're going to talk about confidential computing and containers. Confidential computing may sound like yet another cloud buzzword, but we actually believe the technology behind it provides a way to create a very interesting new threat model. If you look at the existing trusted computing base for a typical guest running in the cloud, it includes all of these layers, from hardware to software: the hypervisor, the host operating system, the host kernel, the firmware. Everything provided by the host must be trusted by the guest. What confidential computing and the related technologies are trying to build is a threat model where the software layers provided by the host are no longer trusted. They try to remove each and every piece of host-provided software from the guest TCB. This is very appealing: it essentially removes the entire host software stack from the guest TCB, including the host firmware, the kernel, the hypervisor; everything is out. The interesting part is that the tenant is the only one that can see and modify its data. No one else can see it, no one else can modify it. And most importantly, the infrastructure owner, the CSP, the infrastructure provider, no longer needs to be trusted. So this is a very interesting threat model. How do we get there? What do confidential computing technologies need to provide to build that threat model? The first thing is that they need to protect the tenant's data. This is obvious, I think: if you want to remove the host software stack from the guest TCB, you want to make sure the host software cannot see or tamper with the tenant's data. But that's not enough.
We also need to let the tenant verify what it's running: which software components are running, how they're running, and on top of which hardware. We're going to go through those requirements in a little more detail. First, we want confidential computing technologies to protect our data, and we can already do part of that. Our data can be in three different states. It can be in transit, which is typically when your data is going through some networking pipes, and there we have TLS, we have VPNs; when our data is in that state, it's basically protected. We know how to do this. Same when the data is at rest, meaning it's stored somewhere: we have disk encryption, so we know how to protect our data when it's resting on some physical medium. What we're lacking for completely protecting our data is protecting it while it's in use, while it's being computed on. To do that, we need to be able to encrypt the memory where our data is loaded and computed on. Our data also goes through CPU state, registers, the stack, and all of that needs to be encrypted as well. We currently don't really know how to do this except with confidential computing technologies. But protecting data is not enough for removing the host software stack from the guest TCB. If we protect the data but don't know which software components are using, loading, or modifying it, we don't know what that data is. It could be completely bogus data, rogue data, malicious data. So as a tenant, in order not to trust the software components coming from the host, I want to be able not only to protect my data and make sure the infrastructure owner cannot see or tamper with it, but also to verify that the software stack running inside my guest is the one I expect to be running, and that this software stack is running on top of a hardware platform that I know.
And that it is the expected one. If I can do both, attest and verify that what I'm running is what I expect to run, and protect my data from the host, then I can really start building this new threat model where I can safely remove the host software layers from my trust boundaries. Doing this comes with hardware dependencies. Protecting data in use, with memory encryption and CPU state encryption, together with attestation, both need hardware support. You don't want to do that in software, essentially, because if you do it in software you again have to trust the host. So it needs hardware support, and there are a few technologies coming from AMD, IBM, and Intel (Trust Domain Extensions) that provide it. In one way or another, those technologies provide memory and CPU state encryption and integrity, and they provide a way to attest what kind of software stack you're running in the guest. The interesting part is that they are all designed as hardware virtualization extensions, which means they have a strong requirement: if you want to do confidential computing, if you, as a tenant, as a workload, want to take advantage of those confidential computing technologies, you're going to have to run inside a virtual machine. That brings software dependencies. As a confidential computing workload, you're going to have to indirectly talk to a hypervisor: to KVM with QEMU or Cloud Hypervisor, for example, or any hypervisor that actually supports those confidential computing technologies. Okay, so now we've seen what we expect from the emerging confidential computing technologies to build that new threat model. Let's see how we could apply confidential computing to containers.
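The attestation story above rests on a chain of measurements: each boot component is hashed into a running register before it takes control, so the final value commits to the whole software stack. Here is a minimal sketch of that extend operation; real technologies use hardware-backed registers (TPM PCRs, TDX measurement registers), and the SHA-256 chaining here is purely illustrative.

```python
import hashlib

def extend(register: bytes, component: bytes) -> bytes:
    """Fold a component's hash into the running measurement,
    mimicking a PCR-extend: new = H(old || H(component))."""
    return hashlib.sha256(register + hashlib.sha256(component).digest()).digest()

def measure_stack(components: list) -> bytes:
    """Measure firmware, kernel, agent, ... in boot order."""
    register = b"\x00" * 32  # measurement registers start zeroed at reset
    for c in components:
        register = extend(register, c)
    return register

# Both the content and the order of every layer affect the final value,
# so a verifier can detect any swapped or tampered component:
good = measure_stack([b"firmware", b"guest-kernel", b"kata-agent"])
bad = measure_stack([b"firmware", b"evil-kernel", b"kata-agent"])
assert good != bad
```

Because the chain is one-way, the host cannot fabricate a measurement for a stack it did not actually boot, which is what lets the tenant stop trusting host-provided software.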
And the goal here is really to abstract away all the dependencies that we just enumerated, hardware and software, and provide confidential computing to cloud native in a seamless form. We want this to be seamless for containers to consume. There are a few blockers. The very first one is that runc does not talk to KVM. runc is the most ubiquitous container runtime, and it's the one that starts your containers and eventually manages your pods in Kubernetes. If runc does not talk to KVM, then running your container workloads through runc means you cannot access the confidential computing hardware extensions, because, as you remember, the whole design is based on hardware virtualization extensions. That means you cannot protect your tenant's data while it's in use, and you're not going to be able to build that confidential computing threat model. The other blocker is that CRI runtimes expect to mount container images on the host. Before launching a container workload, they mount the container image on the host and then let the workload access it through namespaces, or in some cases through virtualization technologies, but the workload accesses what's mounted on the host. If we let the CRI runtime mount container images on the host, we're basically letting the host access and tamper with our data, and we cannot protect the tenant's data while it's at rest. Those are the two main blockers. The solution for enabling confidential computing for containers is threefold. The first part is to use Kata Containers instead of runc and leverage the hardware virtualization extensions provided by these confidential computing technologies. The second part is being able to encrypt the container image layers, or at least sign and verify them.
And last but not least, we want to be able to offload the container image service inside the virtual machine. Eric is now going to go through those solutions in more detail and explain why we think they will bring confidential computing to the cloud native ecosystem. The stage is all yours, Eric. Thanks. Thanks, Samuel. So as Samuel pointed out, in the solution space, one of the things we think makes sense is Kata Containers, but let's get into that in a little more detail. The requirement is clear: you need a virtual machine in order to leverage these extensions, so the workload needs to run inside a VM. That makes sense. Further, based on that, you need to use hardware virtualization as the isolation layer. Normally your CRI runtime will call into the sandboxed runtime to actually create the virtual machine and run your pod inside of it. There are a few different sandboxed runtimes: gVisor, Firecracker, and Kata Containers are the three listed here, but there are others as well. gVisor we didn't really see as a perfect match, since it utilizes a user-space kernel and doesn't end up booting a full virtual machine. Firecracker, similarly: the Firecracker runtime utilizes the Firecracker VMM, which has some device-wise limitations that make it more challenging to use in a Kubernetes environment, with multi-container pods and everything else. So really we saw Kata Containers as the more natural fit: you get full compatibility with Kubernetes, you can still use VMMs like Cloud Hypervisor or QEMU, and it provides the hooks necessary to leverage confidential computing. With that, there's the whole image management question, as well as any real storage management. There are two parts of it that we want to look at, one of them being service offload.
So when we talk about service offload, we're really saying we don't want to mount the image on the host. As Samuel pointed out, we can do all this work to protect our data while it's executing, but if we forget to protect it at rest, there's no point in anything we're doing. So we want to avoid the CRI runtime needing to mount or do anything at all with the images on the host. There are different ways we can do it. One of them is kind of naive, I would say, and that is to fully offload to the guest. What I mean here is: don't do anything in the CRI runtime, and instead do the pull and everything else inside the virtual machine. That makes it easy, or at least easier, in that it's a single service operating inside the guest. When I said it was naive: you can imagine a situation where you have 30 pods running on the node, each running maybe the same image, or sharing most of the layers of their image. Since the pull happens inside each guest, which means it happens for every single pod, there's no deduplication; every single layer gets pulled inside each one. So it's a bit abusive to the filesystem, but probably even worse for the network, as you pull all those images for every single pod running on that node. It's a good first step, and it really does move things away from the host and protect the data, but longer term you would probably look at a mixed approach: do the pulling of the actual layers on the host itself, so you can have some sharing there, but then present all of those layers to the guest. The guest would then pick the appropriate ones and do the verification, decryption, etc. inside the guest itself. I see that as the hybrid, mixed option.
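The network cost of the naive full-offload model is easy to quantify: with per-guest pulls, shared layers are fetched once per pod instead of once per node. A back-of-the-envelope sketch, with made-up layer names and sizes:

```python
def bytes_pulled(pods, layer_size, per_guest):
    """Total megabytes fetched from the registry for one node's pods.

    pods: list of sets of layer names, one set per pod.
    per_guest=True  -> naive full offload: every guest pulls its own copy.
    per_guest=False -> hybrid: layers pulled once on the host, then shared.
    """
    if per_guest:
        return sum(layer_size[l] for pod in pods for l in pod)
    shared = set().union(*pods)  # host pulls each distinct layer only once
    return sum(layer_size[l] for l in shared)

# 30 pods all based on the same 3-layer image (sizes in MB, illustrative):
sizes = {"base": 5, "runtime": 50, "app": 10}
pods = [{"base", "runtime", "app"} for _ in range(30)]
naive = bytes_pulled(pods, sizes, per_guest=True)    # 30 * 65 = 1950 MB
hybrid = bytes_pulled(pods, sizes, per_guest=False)  # 65 MB
assert naive == 30 * hybrid
```

With fully identical images the naive scheme costs exactly one full pull per pod, which is the filesystem and network pressure described above.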
And then the next step after that is the encryption and verification itself. So great, we're not doing it on the host. That's wonderful. But we still need to be able to do the decryption and everything else inside the guest, so we need a way to decrypt as well as verify. I say both of those things because you can imagine a situation where you have some very confidential algorithm that you're running on top, but your base image is Alpine. You don't really need to encrypt Alpine; that's not a secret. You just need to verify that this is the Alpine you expect, and you can do that through a signature instead. Either way, encrypted or verified: everything brought into the guest needs to be one or the other. And any time you move data out of the guest, it needs to be encrypted as well; anything that's going to rest or being transported out, we need to make sure that happens. So that's another aspect of what needs to happen. First, let's walk through a high level of how things operate today if you're using the Kata CLH runtime class, that is, Kata Containers with the Cloud Hypervisor VMM. Starting at the top, kubelet will come in and see: hey, there's a pod assigned to my node, it's not there, we need to start it up. The first thing it does is ask the CRI runtime to pull the image. If the image isn't already present, the CRI runtime goes ahead and pulls it down and makes it available. After that, kubelet asks the CRI runtime to create the container. In doing that, the first step will be mounting that image so it can be consumed by the container. Then, in our case, it sends a create-container request to Kata Containers. On the host, the Kata Containers piece we're talking to is the shimv2. What that's going to do is start a virtual machine.
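The Alpine point generalizes into a per-layer policy: public base layers only need integrity (a signature check against a trusted key), while proprietary layers need confidentiality (decryption inside the guest). A toy sketch of that policy, using HMAC as a stand-in for real detached signatures; actual implementations use asymmetric signatures and the OCI image encryption work mentioned later:

```python
import hashlib
import hmac

SIGNING_KEY = b"publisher-signing-key"  # stand-in for a real keypair

def sign_layer(layer: bytes) -> bytes:
    """Publisher signs the layer digest (HMAC as a toy signature)."""
    return hmac.new(SIGNING_KEY, hashlib.sha256(layer).digest(),
                    hashlib.sha256).digest()

def admit_layer(layer: bytes, encrypted: bool, signature: bytes) -> bool:
    """Guest-side policy: every layer must be encrypted OR verified."""
    if encrypted:
        # Decryption with an attestation-released key implies trust;
        # an attacker without the key cannot produce a usable layer.
        return True
    return hmac.compare_digest(signature, sign_layer(layer))

alpine = b"public alpine rootfs"
assert admit_layer(alpine, encrypted=False, signature=sign_layer(alpine))
# A tampered public layer fails verification and must be rejected:
assert not admit_layer(b"backdoored rootfs", encrypted=False,
                       signature=sign_layer(alpine))
```

The policy is deliberately strict: a layer that is neither encrypted nor carrying a valid signature never reaches the confidential workload.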
In this particular example it'll be using Intel VT, with KVM on top of that, and the VMM will be Cloud Hypervisor, which starts the actual virtual machine. Once the virtual machine is up, you're running your guest kernel, and the first process that comes up is the Kata agent. The Kata agent then sees: oh, I have this rootfs available, here's a config.json that containerd or CRI-O gave me, I'm going to go ahead and create this container. And now you have a pod running. How would this change in the case of something like confidential computing? For the runtime class, we'd have a unique one; let's say it's Cloud Hypervisor with TDX. Kubelet is going to do the same stuff: it's going to say, hey, I'm going to run this pod, make sure you pull the image. In the naive case we describe here, the CRI runtime fully offloads that: it doesn't actually pull the image, it perhaps just caches the URL, moves on, and says everything's fine. Kubelet then continues: okay, I'm ready to create the sandbox, the pod, the container, and it issues a create-container request. The CRI runtime doesn't need to do any mounting at this point. Instead, it just forwards the information to Kata Containers, that same shimv2, to create the container. Kata starts up the VM at this point and does the same thing: it boots, the Kata agent comes up, but now the agent needs to pull the image. So what we're going to do is have an attestation service. Essentially, the Kata agent will come up and say: this is the image that I need. Based on that, it needs to be able to decrypt it, and it's going to need to talk to a remote attestation service, or a gateway, to be able to find the keys. So it'll talk to that remote attestation service, as an example, and it'll say: this is who I am.
It did a measured boot of the host platform, the VMM, the guest kernel, and the Kata agent; up to that point, everything is measured. So it can say: this is who I am, you should trust me, please give me the keys so I can decrypt all the layers of the image and run the workload. At that point, assuming everything looks good, we get the keys, we're able to decrypt, and we're able to mount the image. And now we're in the same situation as the traditional case: we have a rootfs on the filesystem in the guest, we want to launch a container, and things just continue normally from this point. So stepping back: great, on a slide we've shown that it all works. Let's pretend we're a little ways out and all of this is working end to end. What does this really mean for the end user? Confidential computing is very powerful; it's a great new threat model for providing that security and protection for the end user. What does it mean for the developer? Unlike some existing technologies today, it means absolutely nothing: you can get confidential computing without needing to change your workload. You're just writing a container. The thing about that, though, is that you really should follow the same best practices as you always do. Namely, if you have a backdoor in your base image or your top layer, this confidential workload is no longer confidential; it can be exposed elsewhere. So make sure you know what you're running. I'd say that's the one thing you should maybe focus on a little more, because you're trying to be confidential. On top of that, there's what I call the tenant user. This is the person who's going to do kubectl apply, who actually owns the YAML, who runs the workload itself. Maybe it's the same person as the developer, maybe it's somebody else. What does this really mean for them? There are two parts.
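Under the hood, that exchange is: the Kata agent presents evidence derived from the measured boot chain, and the relying party releases the image decryption key only if the evidence matches a known-good reference. A minimal sketch of the relying-party side; the class and method names are illustrative, not a real API, and a real quote is signed by hardware so it cannot be forged:

```python
import hashlib

def quote(components) -> str:
    """Toy 'quote': a hash over the measured boot chain (host platform,
    VMM, guest kernel, kata-agent, ...)."""
    h = hashlib.sha256()
    for c in components:
        h.update(hashlib.sha256(c).digest())
    return h.hexdigest()

class KeyBroker:
    """Relying party: releases keys only for expected measurements."""
    def __init__(self, reference_quote: str, image_key: bytes):
        self.reference = reference_quote
        self.key = image_key

    def release_key(self, presented_quote: str) -> bytes:
        if presented_quote != self.reference:
            raise PermissionError("measurement mismatch: key withheld")
        return self.key

expected = [b"vmm", b"guest-kernel", b"kata-agent"]
broker = KeyBroker(quote(expected), image_key=b"layer-decryption-key")
# An agent presenting the expected measurements gets the key:
assert broker.release_key(quote(expected)) == b"layer-decryption-key"
```

The important property is that the tenant, not the provider, configures the reference values and holds the keys, so a modified guest stack simply never obtains the material needed to decrypt the image.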
One, you're going to need to update your pod spec. You're going to need to use the specific runtime class, just like you would with any sandboxed runtime. On top of that, you're going to want to specify how to attest. If you're getting keys from a remote attestation server, you need to be able to specify: hey, use this URL to communicate, to get the keys, and pull everything else down. That would have to be custom per deployment. Beyond the person writing the YAML, maybe there's a person at the company who's actually using the service and has set up the infrastructure for running the workloads. This person would have to, one, set up an attestation service. That way, you can take the measurement from the cloud provider and verify: yes, this is the hardware, this is what I expect to be running in; and they would then manage providing keys so we can decrypt, if everything looks good in the running environment. Similarly, this person would also have to manage encrypting the container images with those keys and providing them. So there's a little extra work there in order to leverage this. As far as a provider is concerned: one, you have to install a runtime that supports confidential computing. I didn't put it on the slide, but that's kind of the baseline. If that's in place, there are a couple of things. Introspection doesn't really exist anymore. If you're running something like Falco and you assume you have processes on the host that you can look at, you can't anymore; that's not there. Again, that's the whole point of what we're doing, after all. So it's a bit of a change in how you manage things.
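Concretely, the two tenant-visible changes to the pod spec are small: a dedicated runtime class, plus some way to point at the attestation endpoint. Sketched here as the dict that kubectl would serialize from YAML; the runtime class name and the annotation key are hypothetical placeholders, not a settled interface:

```python
# Hypothetical confidential-computing pod spec fragment. Only
# runtimeClassName and one annotation differ from a normal pod.
pod_spec = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "confidential-workload",
        "annotations": {
            # hypothetical key: where the agent fetches image keys from
            "io.katacontainers.attestation-url":
                "https://kbs.example.com/keys",
        },
    },
    "spec": {
        # hypothetical runtime class: Cloud Hypervisor with TDX enabled
        "runtimeClassName": "kata-clh-tdx",
        "containers": [
            {"name": "app", "image": "registry.example.com/app:encrypted"},
        ],
    },
}

assert pod_spec["spec"]["runtimeClassName"] == "kata-clh-tdx"
```

Everything else in the spec, the containers, volumes, and so on, stays exactly as it would for a non-confidential pod.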
But two, across the different architectures, the number of keys available for encrypting each trusted domain, each VM, is a finite resource. You're going to need to treat it like a finite resource that you can schedule against; that would be something like a device plugin. Ultimately, not too much is changing, nothing too drastic. Let's talk about gaps, then: where we actually are today versus what we're working on and what's coming in the future. On the Kata Containers side, there are a few different parts. One is making sure we're using a VMM that has the APIs to leverage the confidential computing hardware virtualization, and, in doing that, creating configuration knobs for the user to specify things like how we get the keys, how we do attestation, and how we enable confidential computing in general. On top of that, there's the actual key provisioning. Depending on the architecture, we're going to get keys in different ways: it could be from the firmware on the host, it could be from a remote server. In general, we need a way for the Kata agent to get those keys so that we can decrypt the images. As part of that, we're going to have to do attestation: we need a set way to have a measured view up to and including the Kata agent inside the guest, in order to facilitate responding to a remote attestation challenge that says, this is who I am, please give me the keys. On top of that, we're going to have to lock the agent API down a little more and remove some of the features. For example, you can kubectl exec into a container today and poke around. If you can do that in a confidential environment, things are a bit broken; it's not confidential.
You can just access and get at all the different debug hooks and ways to interact, and we really need to lock some of those down. We'll remove some aspects of the agent API itself, so that we can indicate clearly that this is not supported in this configuration. The Kata side we can figure out, but the image side is where it's a lot more fun and interesting: interacting with some of the pod lifecycle that we take for granted, being able to facilitate this image service offloading, being able to have a remote target for who does the snapshotting, doing it inside the guest instead. There are a lot of open questions there, and we're really looking forward to talking to folks about it. Similarly, how do we do image layer encryption? That's beyond the scope of what we're looking at here; it's already well discussed and in progress from an OCI specification standpoint. So overall, stepping back, some takeaways and a summary. One, confidential computing requires VMs. Keep that in mind; that's again why we're focused on Kata Containers and other VM-isolated workloads as a good target for this. There are a lot of different hardware implementations of how confidential computing is carried out, but our goal, and what we're working on, is to abstract them away so users don't have to be exposed to them, while still making it clear what level of isolation there is for the user. And the end user who, down the road, just wants to use this really doesn't have to think about much. As long as the attestation frameworks and everything else are in place, you write your same YAML and ultimately just add a different runtime class. So your workload shouldn't have to change, and you certainly shouldn't have to change your actual application itself.
And then the last point is that this is very much a work in progress. The goal of presenting here is really to put it out there, get people thinking about it, and look for input and discussion from folks. So at this point, I think we're open for Q&A, and we do have a link here to the outstanding issue that Samuel has put together looking at bringing support for confidential computing to Kata Containers end to end. Thank you very much.