Thank you, everyone, for coming to my talk. My name is Jeremi Piotrowski. I work for Microsoft. And the title of this talk is the next episode in workload isolation: Confidential Containers. At the start of this talk, I'd like to talk just briefly about myself, just so that you know where I'm coming from and what my background is. So I started out working on embedded Linux systems and hardware security modules. I'm one of the maintainers of the Flatcar Container Linux project. I've been working for Microsoft in Azure for the last two years, and for the last year I've been working on Confidential Containers and within the Confidential Containers project. If you want to reach me after the talk with questions or follow-ups, that's my email. My Sched profile contains other socials, but that's a reliable one. Or Slack: I'm on the CNCF Slack and on the KubeCon Slack. And the focus of this talk will be a project called Confidential Containers. We often refer to it as CoCo just because it's nice, short, and cute. CoCo is a CNCF sandbox project. It is built on top of Kata Containers, a project with which many of you may be familiar. Kata Containers is used to isolate workloads using VMs, and Confidential Containers isolates workloads using confidential VMs. It's a fairly young project: it's been around for some two years, and in the CNCF for a year. We're currently at the 0.5.0 release, which happened last week. It was very exciting. The goal of the project is to enable cloud-native confidential computing by leveraging trusted execution environments to protect containers and data, and this talk will really cover the different aspects of that mission. If you want to find out more about the project and the community, that's the GitHub link to the community repo. It links to the various documentation repos and code. And the thing about Confidential Containers is that every single thing about it is interesting. There are challenges everywhere.
Everything could be a 30-minute talk, and so I can't cover everything, maybe for the better. But there was a talk earlier today, right before lunch, by Fabiano Fidêncio and Jens Freimann, also Confidential Containers community members. If you weren't there, review it offline later. It was great, and it gives you an intro into the architecture of the Confidential Containers project. If you're more interested in Kata Containers itself, there's a talk later in the same track, in the same room, about tackling hard multi-tenancy with Kata Containers. Now, there are various hardware implementations of confidential computing, and the one that I am most familiar with, which is why it's going to serve as the example in this talk, is called AMD SEV-SNP. SEV-SNP stands for Secure Encrypted Virtualization, Secure Nested Paging, which is quite a mouthful, but it makes sense. So SEV-SNP, or SNP for short, is a confidential computing technology that allows us to run virtual machine-based trusted execution environments. And across all the confidential computing technologies, you'll find similar properties, of which I will focus on three. One is that we want to guarantee confidentiality and integrity of both code and data. Confidentiality is implemented by encrypting memory, but confidentiality alone is not enough: you can still compromise a trusted execution environment if you can modify encrypted pages from the hypervisor. That's why we need integrity as well. The second aspect that is important to confidential computing is attestation, and more specifically, remote attestation. And I realize it's a complex word. It's difficult to grasp. It took me a long time to come up with a simple way to remember what it really is about: it's about obtaining evidence that we can forward to someone else. They can verify this evidence and decide whether to trust us with something. By the way, don't ask me about the meaning of provenance.
The third aspect of confidential computing that is important is that the hypervisor is outside of the trusted computing base, often known as the TCB. The essence of the trusted execution environment is that we only trust what is inside this environment, and nothing outside, especially not the hypervisor. There are various ways to create virtual machine-based TEEs. Intel has their own technology called Intel Trust Domain Extensions, and if you were to study that one, you would find that it solves the exact same issues in similar ways, or in different ways, but it has equivalent security properties. And on this slide, I also wanted to show you a brief system architecture for an AMD SNP node. There's an AMD secure processor on the chip that is the hardware root of trust. It manages the encryption keys for memory, and the encryption keys are unique per virtual machine. And there is an additional table called the RMP table that manages the integrity aspects of the confidential computing platform. So let's look a bit deeper into what attestation is about. The goal of attestation in the context of confidential computing is to unlock secrets after performing attestation. And the way this is implemented follows an RFC standard defined by the Internet Engineering Task Force, called the RATS protocol: Remote ATtestation procedureS. The capitalization is a bit messed up, but that's the way it's supposed to be. This defines the interactions between various parties so that attestation can be performed and so that we can trust the results of it. And so here you see two models that are defined in the RFC that are fairly popular. Let's start with the left one, the so-called passport model. The attester is the TEE, the trusted execution environment. The TEE obtains evidence that it forwards to a verifier. The verifier verifies this evidence and returns some attestation results that can be relayed to a relying party. Why relying party?
Because it relies on the evidence, or on the attestation results. There's another model, the so-called background check model, where the attester communicates with the relying party and the evidence is checked in the background. Both of these approaches have merit; it depends on the use case which one will be more applicable. But I wanted to also show you some examples of what this looks like in the wild, in the real world. So you'll see the same models, but with concrete component implementations. On the left is what it looks like when using the Microsoft Azure Attestation service. The confidential VM talks to the attestation service and transfers hardware-specific evidence, which is then verified. A token is returned to the confidential VM. This token can then be passed to the Azure Key Vault managed HSM, in addition to any other authentication, which then uses that token to make policy decisions and return keys to the confidential VM. And the funny thing about this left part is that every single one of these components actually runs in a trusted execution environment. So the attestation service and the Key Vault are both also using confidential computing technology. On the right, you'll see the architecture of how the Confidential Containers project is doing things. It's following the background check model: the confidential container, or VM, talks to the so-called generic Key Broker Service, which performs the attestation service function internally. Now, we want to look a bit into what the attestation evidence looks like for this specific case. For SNP, the evidence is called the attestation report. It contains a limited amount of data, but I'll focus on four fields that are present in this attestation report. The first one is called the launch measurement, and it contains a single value that represents the initial state of the memory of the trusted execution environment. In this case, you see it's 48 bytes.
It's essentially a hash, and we'll talk about how this is constructed in a sec. The second important, or useful, field is called the host data, and it is used to tie data to attestation before the TEE is launched. So anything that is additionally included in the attestation report before the launch of the TEE, the TEE cannot affect. It's fixed. And you see in this example, it's zero. But if you want to know about use cases for this, we have a demo coming up at the Microsoft booth on Friday at 11. We also presented some use cases in the community calls for the Confidential Containers project, so you can review them later. The third field that I wanted to talk about in the attestation report is called report data, and this is used to tie data to the attestation report at runtime. So this is under the control of the trusted execution environment: when it performs attestation, it can pass additional data to be included, although limited. This is basically 64 bytes' worth, so this will be a hash of some sort. And there are two major use cases. One is some proof of freshness: the relying party or the verifier may request a challenge-response kind of scheme, where it gives a challenge to the TEE and expects it to be included in the attestation report, so that it knows that the TEE is actually in control of the attestation procedure. The other use case is to include some kind of key. The meaning behind this is that a key pair is held inside the TEE, and the public key from that pair is tied to the report. So someone who has the attestation report can take this public key that is given alongside the attestation report and use it to encrypt data that will only be decryptable inside the TEE. The fourth property is what makes remote attestation possible: the whole attestation report is signed by a unique key inside the AMD secure processor.
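To make the report data key-binding use case concrete, here's a minimal sketch in Python. This is purely illustrative: the function names are mine, the "public key" is stand-in bytes rather than a real SNP key, and a real guest would place the digest into the report via the hardware's guest interface. SHA-512 is a natural fit because its digest is exactly 64 bytes, the size of the report data field.

```python
import hashlib

def bind_public_key(public_key_bytes: bytes) -> bytes:
    """Derive the 64-byte report_data value from a public key.

    The TEE keeps the private half of the pair inside; only a digest
    of the public half goes into the attestation report.
    """
    digest = hashlib.sha512(public_key_bytes).digest()
    assert len(digest) == 64  # must fit the report_data field exactly
    return digest

def verify_binding(report_data: bytes, public_key_bytes: bytes) -> bool:
    """Relying-party side: recompute the digest from the public key it
    received and check it matches the report_data in the signed report."""
    return report_data == hashlib.sha512(public_key_bytes).digest()
```

If the check passes, the relying party knows the public key was presented by whatever code the launch measurement describes, so anything encrypted to that key is only decryptable inside the TEE.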
And this key is also part of a certificate chain that chains all the way up to the hardware manufacturer's certificate authority. So we get the attestation report, we can obtain the key that was used to sign it, and we can check the certificates. Then we know this was generated inside of a trusted execution environment. The launch measurement, as I mentioned, represents the code that was used to launch the trusted execution environment. And the way this is done is that the hypervisor, while launching the trusted execution environment, talks to the secure processor, which holds the encryption keys, and asks it to encrypt every page into the address space of the TEE. And while doing so, every single page is measured. As I said, measurement in this context just means that it's reduced into a single value, and the way this is done you can see in the second bullet point. We start with some initial value of this launch measurement; it's all zeros in this case. Then we concatenate that with some metadata about what is being loaded, the address that we're loading things into, and the hash of that data. We hash the result, and that becomes the next launch measurement. We perform this measurement iteratively as we load more data and more code. This formula has several interesting properties. If we change the address at which we're loading something, it changes the launch measurement. If we change the data, the launch measurement changes. If we change the order, we also change the launch measurement, even though we might not want that, but it would be impractical to do it otherwise. There are other special kinds of data used to start the VM that are also part of the launch measurement, for example the initial register state of the virtual CPUs. This is important because if we were to start the TEE at a different address, we could invoke completely different behavior. And so this is really all about being able to pre-compute a launch measurement.
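The iterative construction just described can be sketched like this. It's a simplified illustration, not the exact SNP page-measurement format: the real formula hashes a specific page-info structure defined by the hardware vendor, and the field layout and names here are my own. SHA-384 is used because it yields the 48-byte value seen in the report.

```python
import hashlib

DIGEST = hashlib.sha384  # 48-byte digest, matching the report field size

def extend_measurement(measurement: bytes, page_type: int,
                       gpa: int, data: bytes) -> bytes:
    """measurement' = H(measurement || metadata || address || H(data))"""
    h = DIGEST()
    h.update(measurement)                      # previous launch measurement
    h.update(page_type.to_bytes(1, "little"))  # metadata: what is loaded
    h.update(gpa.to_bytes(8, "little"))        # where it is loaded
    h.update(DIGEST(data).digest())            # hash of the page contents
    return h.digest()

def launch_measurement(pages):
    """Fold (page_type, guest_physical_address, data) tuples, in load
    order, into a single 48-byte value, starting from all zeros."""
    m = bytes(48)
    for page_type, gpa, data in pages:
        m = extend_measurement(m, page_type, gpa, data)
    return m
```

Even in this toy form you can check the properties from the talk: changing the data, the load address, or the load order all produce a different final measurement, which is exactly what lets a verifier pre-compute the expected value for a known firmware, kernel, and initrd.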
And then if the same measurement occurs elsewhere, we know it was the same code. It hasn't been modified. I know this was a lot to digest, but let's look at some deployment options for what a real-world CoCo setup looks like. So here we have a bare-metal system. The system is a physical server with an AMD CPU, and it's a Kubernetes node, so it has a kubelet and containerd. The kubelet talks to containerd, containerd talks to the Kata runtime, and the Kata runtime talks to the hypervisor and spins up a confidential VM. It spins up the VM with some guest firmware, a kernel, and an initrd, all of which are measured. As part of the measurement, the initrd of the confidential guest contains the Kata agent and some other Kata bits or CoCo bits, like image management code and attestation code. And this Linux guest running the Kata agent can then communicate with the AMD secure processor in the bottom corner to perform attestation and communicate with the Key Broker Service component. It gets a key, and it can use that key to decrypt container images that it has fetched and start pods. So this is like a reference deployment, but it has to be performed on bare metal. So you need your own hardware, probably your own data center. The hardware is hard to access in some cases still, and you have to manage a lot of things underneath the operating system: firmware versions, BIOS updates, these kinds of things are still essential. So what we came up with is the second deployment option. And the problem here is really naming, as always. I don't know if I should call it a nested confidential VM or a confidential nested VM. Probably the second is the applicable one, because there's only one nested VM in this picture, even though there may be multiple, but in the stack, right? The TEE is a nested VM that is confidential, hosted inside a Kubernetes node, running on top of, in this case, the Hyper-V hypervisor.
Inside the Kubernetes node, it's the exact same picture as before, but it can be much smaller, and it can be managed using the existing workflows. And this is also the deployment option that we will be using in the confidential pods offering on top of Azure Kubernetes Service that was just announced yesterday. So if you're interested, open up the QR code; there's a sign-up sheet and you can find out a bit more. We're also going to be using this in the Confidential Containers CI because it's so practical. And this way, we can have multiple of these Kubernetes nodes on the same hardware. Each of them can spin up a certain number of confidential VMs, each hosting a confidential pod. And if you think about it for a second: I mentioned that one of the properties of confidential computing is that the hypervisor is outside of the TCB, right? So who cares that there's another hypervisor underneath? The hypervisor is already not part of the TCB. And for me, this was like a eureka moment when I got this working. I was personally involved in getting this working with KVM inside the Kubernetes node. I got it working, I compared the launch measurement between bare metal and this setup, and they match exactly. So the hardware in that case tells me that none of the hypervisors interfered with the security of the system or tampered with any of the memory, and the CPU attests to that fact. Now, not every cloud or every hypervisor has this capability, so we also support something called peer pods in the CoCo project. You start with a Kubernetes node running somewhere. It can be in the cloud; it doesn't have to be. There's a kubelet, containerd, and the Kata runtime, and it talks to a new daemon called the cloud API adaptor. The cloud API adaptor talks to cloud APIs, as the name suggests, and spins up confidential VMs, which are available in several of the clouds. I know Azure has an offering for that.
And inside this confidential VM, we start the usual Kata bits and then start the workload after performing the same attestation as in the previous two cases. If you're interested in this deployment, we published a blog post together with Red Hat last week with a real-life use case, something like Spark running on OpenShift with confidential pods. So check out the QR code, or we're also going to be demoing this at the Red Hat booth tomorrow at 14:00. So be there. Now, there is a difference here compared to the previous two cases, which is that we are not in control of the hypervisor bringing up the confidential guests any longer. The hypervisor is specific to the cloud platform. There may be differences in the guest firmware, so the measurement may differ a bit, but it will still be confidential. And so where is this whole story headed? Where are we headed with this confidential computing idea? Eventually it'll be easier to consume, thanks to projects like the CoCo project. It will integrate nicely with the secure supply chain story. You'll be able to generate a software bill of materials for your code. You'll be able to deploy your code in trusted execution environments. You'll have a signed software bill of materials. You'll be able to pass that to some kind of key vault, and it will base the decision of whether to trust your workload on the contents of the software bill of materials, which will be attested by the hardware to match what was launched. So at this point I want to come to the conclusion, which is that I think this will be a part of a lot of multi-tenant architectures or zero-trust architectures. It matches the trust model that we have from a user's point of view when deploying things, which is that we don't want to trust anything outside our workload.
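The key-vault decision described above boils down to comparing a reported measurement against reference values pre-computed from known-good artifacts. Here's a hedged sketch of that relying-party check; the names and structures are hypothetical, not part of any CoCo API, and a real verifier would first check the report's signature against the vendor certificate chain.

```python
# Hypothetical table mapping approved launch measurements (hex) to a
# description of the software stack they were pre-computed from.
REFERENCE_MEASUREMENTS = {
    "ab" * 48: "guest firmware + kernel + initrd from a known SBOM",
}

def release_key(evidence: dict) -> bytes:
    """Release a workload secret only if the reported measurement
    matches a pre-computed reference value."""
    measurement = evidence["launch_measurement"]
    if measurement not in REFERENCE_MEASUREMENTS:
        raise PermissionError("measurement does not match any approved build")
    # Signature verification over the report would happen before this
    # point in a real deployment; here we return a placeholder secret.
    return b"container-image-decryption-key"
```

The point of the supply-chain integration is that the reference side of this comparison can be derived from a signed software bill of materials, so "the measurement matches" means "the running code matches what the SBOM says was built."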
And I'm sure lots of you are in industries with regulations, working with private or personal data, working with financial data, working with compliance and audits. And confidential computing is really an answer that shuts down the discussion of what the scope is that I'm evaluating: the scope ends at the boundary of the trusted execution environment. CoCo is a fantastic project with a healthy community and multi-vendor engagement, and it integrates confidential computing technologies with containers and with Kubernetes. CoCo supports multiple hardware trusted execution environments. This is the list for now; we hope there will be more. And there's a spectrum of deployment options available to you: in your own data center on bare metal, nested confidential VMs or confidential nested VMs, as well as just standalone confidential VMs. It's a spectrum, and depending on your circumstances, different options will be available. Nested virtualization has overheads, yes, but it's also flexible, and you don't have to pay for the whole node. Confidential VMs are likely where we're going to see things like confidential GPUs show up first. If you want to find out more about the CoCo project, this is a link to a blog post, also on the Red Hat blog, from some Confidential Containers community members, talking a bit more about the attestation flow. And yeah, that was it. Hope you enjoyed it. We'll be taking questions on the mics on the sides. That's a link to give some feedback. I also have other community members here to answer your questions, if you have any. Thank you. Anyone? Anyone? Questions? Mics? No? Sorry, I got a question. Yeah. Actually two, but let's start with the quickest one. Are there any plans of bringing this to other cloud providers rather than Red Hat in the future? Which part? The whole stack. Well, the whole stack is not bound to Red Hat in any way.
Yeah, sure, but currently there's plans of implementing it, actually with them weighing in, as you mentioned, tomorrow. So. The demo is actually running in Azure. So it takes a while to implement this in a cloud, let's put it this way. Confidential VMs, I know we have them in Azure. I'm sure Google has some, IBM has some, but the stack is cloud-neutral. So this was just what is available at the time. I think things like TDX from Intel are available in Alibaba Cloud, but it may not work in the cloud API adaptor yet, if you're talking about the thing that we will demo at the Red Hat booth. Yeah. Any more questions? Thank you. Please. I had a question. So suppose you launch a different micro-VM with Kata, for example Firecracker, which is also supported, right? How will that work? It would be nested virtualization, and Firecracker does not really work there, right? Or is it the default micro-VM which you launch with Kata all the time? So Firecracker can launch micro-VMs, but it won't be used for the Kubernetes node that hosts that. So I know... So, you know, like when you launch the micro-VM with the Kata agent, right? Yeah, yeah, yeah. And replace the micro-VM in the Kata configuration to, for example, run Firecracker, right? I'm just wondering, because with Hyper-V, nested virtualization is not supported by default. So is it only the default micro-VM supported by Kata, the use case you showed? So I know nothing about support in Firecracker. I know this will be supported with Hyper-V, with KVM, Cloud Hypervisor, and QEMU, like a combination of the four. Microsoft will support the Hyper-V and Cloud Hypervisor variant. Okay, all right, thank you. Yeah. There may be a follow-up answer from Fabiano. Can I get this mic turned on? So the work that has been done on the VMM side is up to the VMM developers, right?
So we have QEMU quite invested in that. We have Cloud Hypervisor quite invested in that. We would love to have contributions coming from Firecracker, but right now we have none. So that's the reason Firecracker, at least right now, doesn't work with the TEE environment. If contributions come, I'm pretty sure that I, at least, would be happy to see that happening, but this is a conversation to have with the maintainers of Firecracker. Yep. Yeah. Can I have a question on the left? Sure. Can you hear me? Yeah. Thanks for the talk. Thanks. I was wondering if you might say a word or two on the compatibility with other native encrypted containers, like Singularity, for example, that maybe aren't running containerd? Unfortunately, I know nothing about Singularity, so I can't say anything about that. Okay, thanks. Sorry. Does anyone know anything about Singularity? No? Any other questions? Yeah. Thank you for your presentation. Can you just explain a little bit more how your solution can contribute to creating a robust software bill of materials? So the idea that I have in mind is that based on your software bill of materials, you can pre-compute a measurement. Then later, when you launch your workload, the hardware will calculate the actual measurement again. And if the measurements match, you know that the running state matches the software bill of materials. So it'll be like an integration, eventually, when the ecosystem gets there. Thank you for the talk. Did you do any measurements on the impact on performance, for example with memory latency when encrypting and decrypting everything? Whether that has any meaningful or non-meaningful impact? Thank you. There is an impact on certain workloads, but I don't have benchmarks. I mean, ideally, this would be available everywhere, on all the time, right?
But it does hurt some workloads at this time. Over time, with different hardware generations, it'll get better, hopefully. But I have no benchmarks, sorry. And the same question came up earlier in the Confidential Containers session. There are also other kinds of overheads: resources have to be allocated upfront, and it's much harder to do things like paging out pages, so you reserve a block of memory and can't over-provision as much. So there are overheads. Can you speak louder? Sorry. Sure. Is that better? Yeah. Just so I understand correctly, the root of trust of all of this is with the processor manufacturers, right? Yes. So they're putting the key in there, and a compromise of their key chain would mean all of this stack becomes untrusted. Correct. Okay. Right, we place the trust in the hardware, in the implementation, in the vendor, yes. Well, it's not quite just that, right? Because we're already having to trust the hardware. If you're running on some sort of CPU, you're trusting that that works. Yes. But you're changing the trust model from trusting that they've implemented the hardware correctly, which we know they don't always do, and we've had serious hardware security bugs, to trusting AMD and Intel and potentially others to manage keys correctly. Yeah. Yes. I sure hope they're using a secure key injection process somewhere in there. Yes. Okay. Yes. Thanks. More questions, right side, left side? No? If not, I'll still be here, and I'll be at the booth. So if you want to talk, we have several people from the project here. If you have a use case, I'd love to hear about it. Thank you. Thank you.