This session is about Kubernetes and confidential computing, so I hope everybody's in the right room. My name is Moritz Eckert. I have a background in the software and systems security space, a lot of binary analysis and reverse engineering, and I spent a fair amount of time in the capture-the-flag competition scene. In my day job I work as an architect at Edgeless Systems, a startup from Bochum, Germany, where we build open-source software for confidential computing. On the right-hand side here you can see at least some of the team in our office. I'm also a member of the organizing team for the Open Confidential Computing Conference, a free online conference focused entirely on confidential computing that has happened every spring for three years now. And I attend a lot of conferences giving talks, just to spread the message a bit and do some educational work for this fairly new topic of confidential computing, which brings me to this talk. What I'd like to do today is first give an introduction. I know there have been two other talks on confidential computing here, so if you saw their introductions, or attended a talk at another conference, you might be familiar with the basics already. Sorry to repeat it, but it's still a fairly new topic and I don't want to lose anybody, so I'll give a short introduction to the fundamentals. Then we'll focus on the why: this is a security technology, so the threat model and the use cases are kind of important. And then we'll see how we can actually use it, take a more practical approach, and see how this fits into our cloud-native, Kubernetes-based world in the cloud.
All right. If you've heard anything about confidential computing, you've probably seen this graphic about the states of data and how we protect each of them. You're all familiar with the fact that you can encrypt your files, your disk, whatever your data at rest is; that's pretty much straightforward. And if we send data over the network or some other channel, we have transport encryption for that. Nothing new there. What confidential computing brings into play is protecting the data also while it's in use, pretty much filling the remaining gap, so that we have encryption or protection in all states of the data. That makes it possible, for the first time, to have real end-to-end encryption or protection of that data. Confidential computing is not the only technology aiming for that goal; there are other solutions like homomorphic encryption and other privacy-enhancing technologies. The way confidential computing approaches this is via a hardware feature, new hardware technology, that creates what are usually called trusted execution environments: a space, a context in your processor, that isolates your code and data from the rest of the system and the rest of the hardware. That adds the nice property that the data is actually encrypted while it's in main memory, and that you get a form of remote attestation. Attestation is a term that pops up in other areas as well. With confidential computing, the goal is that you can cryptographically prove the integrity, identity, and confidentiality of your trusted execution environment from a remote location, and have the root of trust only in your hardware, in your processor, in your CPU. What that means depends on the implementation of the trusted execution environment, which we'll take a look at now.
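To make the remote attestation idea concrete, here is a minimal sketch of the flow: the TEE produces a measurement (a hash over the loaded code and data), the hardware signs it, and a remote verifier checks the signature and compares the measurement against what it expects. This is purely illustrative; the HMAC key is a stand-in for the CPU vendor's fused signing key, and the quote format of real SGX or SEV hardware is far more involved.

```python
import hashlib
import hmac

# Hypothetical stand-in for the CPU's attestation key. In real hardware this
# is a vendor-rooted private key fused into the chip; we model it as an HMAC key.
HARDWARE_KEY = b"cpu-vendor-rooted-key"

def measure(code: bytes, data: bytes) -> bytes:
    """Measurement of the TEE contents: a hash over the loaded code and data."""
    return hashlib.sha256(code + data).digest()

def produce_quote(code: bytes, data: bytes) -> tuple:
    """TEE side: the hardware signs the measurement, producing a 'quote'."""
    m = measure(code, data)
    signature = hmac.new(HARDWARE_KEY, m, hashlib.sha256).digest()
    return m, signature

def verify_quote(measurement: bytes, signature: bytes, expected: bytes) -> bool:
    """Remote verifier: check the hardware signature, then compare the
    measurement against the value expected for the software we trust."""
    recomputed = hmac.new(HARDWARE_KEY, measurement, hashlib.sha256).digest()
    sig_ok = hmac.compare_digest(recomputed, signature)
    return sig_ok and hmac.compare_digest(measurement, expected)

code, data = b"my-enclave-binary", b"init-config"
expected = measure(code, data)           # verifier knows what should be running
m, sig = produce_quote(code, data)       # TEE produces the quote
assert verify_quote(m, sig, expected)    # verifier accepts
assert not verify_quote(measure(b"tampered", data), sig, expected)
```

The key point of the sketch is the last line: if the loaded code changes, the measurement changes, and the verifier rejects it.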
Probably the first iteration, and most people have heard of it by now in a good or a bad way, is Intel SGX, which isolates an individual process. Your trusted execution environment is a process, isolated from the rest of the system, where your code and data reside. The advantage is that a process can be very small: very few pages inside that confidential context. The disadvantage of Intel SGX is that since your guest OS, your system, is not part of that context, any time you cross the boundary you have a context switch from the trusted execution environment to the untrusted world, which can have a significant impact on your performance. It also means you don't have a regular system interface, so you can't just lift and shift any kind of application into an SGX TEE, or enclave, whatever you want to call it. Instead you either need to adapt your application, or you need a compatibility layer like a library OS. Think of Wine emulation on Linux, if you're familiar with that concept; in a similar way, you need something like that here. So: small context, but no real lift and shift. The next generation, if you want to call it that, is that the latest hardware generation, instead of isolating a process, cuts off above the hypervisor and isolates an entire VM. That means the guest OS of that VM is also part of the context. We enlarge the confidential context, but the advantage is that we get a much better interface for running applications inside: a full system interface, a kernel, a guest OS. So here we can have a real lift-and-shift approach. AMD SEV was probably the first implementation of that, and it's now available in its third generation, called SEV-SNP. Intel has an addition to SGX called Intel TDX that follows the same pattern, and Arm has a specification called the Arm Confidential Compute Architecture.
There's a reference specification for RISC-V, and IBM Secure Execution is essentially the same thing. So this is where we're heading: we already have confidential VMs isolating an entire VM. So far this has focused exclusively on CPUs, but we see a trend, especially with the AI hype, that we're not only dealing with CPUs anymore. What about GPUs? A GPU is just a processor, right? So we can have the same properties: isolation of workloads running on the GPU, encrypted memory, and technically we can attest GPUs too. NVIDIA has now released, with the H100, the first implementation of essentially the same trusted-execution-environment principle on a GPU. The way it's attached to the context is that you have a confidential VM with a driver that communicates with the GPU, performs the attestation, and connects the GPU to your context. In the future, other PCI Express devices will follow a similar pattern, so network accelerators and so forth can work more or less the same way. Any questions regarding those hardware principles, the building blocks? All right. So the question is: why would we do all these gymnastics to create these trusted execution environments and deal with these hardware extensions? What are the use cases, and what is the threat model for those use cases? The first one is definitely infrastructure security. There's this notion of the cloud just being somebody else's computer. The idea remains the same if we move off our own hardware (do we even trust our own hardware?) into a space with shared resources, where a third party, a service provider, has direct access, or some edge location that's potentially hostile. How can we protect against the infrastructure? We'll go into detail on that use case in a second. The other use case is more about new types of things you can implement with this technology: multi-party scenarios.
Confidential computing is not the only solution making this possible, but it's probably one of the most performant ones right now. There was the Confidential Computing Mini Summit on Monday, where Sven Trieflinger gave a very interesting presentation on how we can combine different privacy-enhancing technologies to implement such multi-party scenarios, and confidential computing can play an integral part in that. And with everything I said about GPUs and all of this hype about AI and large language models: they've sparked a discussion, because now there's somebody owning a model and providing a service, and we're feeding it data, either via training or via inference. What if this data contains personally identifiable information? What if it potentially contains intellectual property from a company? Do we need to shut down, should we block, ChatGPT, for example? There are a lot of interesting use cases that emerge from that: can we provide any type of AI solution in a privacy-preserving manner, or can we bring models to where the data is and protect the model, so the model owner doesn't need to fear losing their IP? And finally, supply chain security. Confidential computing can of course be the last link in the chain, where you also protect the environment your solution ultimately runs in, so that all the links before it don't end up in an untrusted environment. But it can also be applied to the supply chain itself. Mike Bursell gave a really cool talk on that on Tuesday at SupplyChainSecurityCon, on how you can apply confidential computing to those stages and really establish trust in the environments where you package, build, or sign your software. So: infrastructure-based threats. As I said, confidential computing is not a solution that will solve all your problems.
It's really important to understand that we're not talking about attacks through the front door: a vulnerability in your application, a CVE in your container that you expose to the internet and somebody exploits. Confidential computing won't help you there. We're talking about threats with more indirect access: another tenant in the cloud gaining access to the cloud infrastructure and moving vertically or horizontally through that infrastructure into your application or your data; or somebody with legitimate access, like a foreign government that demands access; or an employee who gets compromised, so somebody attacks you via that vector. Right now this is mostly the concern of very regulated, or paranoid, industries. They either fear for their IP or they're dealing with very sensitive information, and they simply can't modernize their IT. They can't adopt the cloud because they're not allowed to, either because legal says no or because of compliance fears. There, confidential computing can be a very promising solution: because of the properties of runtime encryption and isolation, I can isolate myself from the infrastructure, and with attestation I can cryptographically verify that what's running there is indeed isolated. As I said, currently the focus is mostly on highly regulated or paranoid industries. But Mark Russinovich, the Azure CTO, gave a keynote at OC3 where he said we're heading towards a fully confidential cloud. Whatever "fully confidential cloud" means, the general idea is often compared to the Let's Encrypt movement: before Let's Encrypt, getting a certificate for your website was very tedious, and there was not a lot of HTTPS traffic, simply because it was hard to get there. Let's Encrypt and the ease of using TLS it brought really made encryption the norm.
And today it's hard to browse the web without TLS encryption; most browsers won't even easily let you visit a website without it. I'm not saying it's the same, that transport encryption is equivalent to runtime memory encryption or confidential computing, but there are definitely lessons to be learned there. The first one, and I think definitely my focus, is that we need to make this very easy to use: approachable, usable, almost invisible, so that you don't have to deal with all the details but can just adopt the pattern. Then we can get to the point where, like with TLS (initially only my online banking used HTTPS, and you'd say, why would I use that for my personal blog, there's nothing secret about it), there might be a future where you ask: why would you not use a trusted execution environment and confidential computing? It's just there, and it's encrypted. And of course we need to make it abstract, make it vendor-neutral; confidential computing is a very vendor-specific technology so far. And finally, commoditize confidential computing in a sense. We're here at Open Source Summit, so the question is: why is open source so essential for confidential computing? For the obvious reasons, sure; I don't need to tell you those. But with confidential computing and this notion of remote attestation, whatever you are attesting needs to be somehow semantically verifiable, and that means you need access to the source code of the software being verified. Otherwise you're verifying a black box. You can say, well, this is the black box I might have seen before, but you don't know what it's doing, so there's not much gained. So yes, in my opinion, attestation requires open source.
Any questions regarding the threat model or the use cases? Okay, so let's get to the how. That's the interesting part of this presentation. I probably don't need to explain this to you, but just one slide here. Consider a Kubernetes cluster in a typical cloud environment: a Kubernetes node is usually a VM, at least if we talk about the cloud, and your workload is packaged inside a container that runs in an entity called a pod. Pods are scheduled onto and reside in those nodes. Each node has an agent called the kubelet that talks to the Kubernetes control plane, which itself is just a specific type of node running the orchestration services of Kubernetes. And we can have multiple of these VMs. So our question is: I have my containerized application. How do I make use of confidential computing? How can I deploy my application inside a trusted execution environment, potentially in the cloud? The first approach is how we do that with SGX. SGX, as I said, is a process-based solution, so you package your container such that the process running inside it runs in an SGX enclave, and that's more or less it. The question is how. As I said, it's not straightforward; you can't just lift and shift the application, it's a bit more tedious. There are different projects that aim to make this easier, but I would still say this mainly applies where you write new types of applications, not really to lifting and shifting existing ones. There are basically two approaches. One is a language-specific runtime, like EGo for Go or Enarx for WebAssembly. The other is the library-OS pattern I mentioned, a compatibility layer like Occlum or Gramine, which tries to make it a lift-and-shift experience to move your containerized workload into an SGX enclave.
So what you do is repackage your application with one of those tools, create a container, and then you need to make SGX available to those containers. To Linux, SGX appears more or less as just a device; it is a device. So there's an Intel device plugin that you install in your cluster, which first of all exposes the SGX device to your containers, and then adds a way of making this schedulable. Essentially, you add annotations to your deployment files saying, for example: here I need this amount of memory for my enclave. The scheduler then knows, first, that this pod needs to go to a node where SGX is available, and second, how many resources it consumes from an SGX point of view. This example is for Gramine. Then you can have multiple containers running SGX, if you have nodes with SGX capabilities available. And this is pretty much where we were a couple of years back, facing the problem that we now have all of these SGX enclaves running in our cluster: how do we orchestrate them? How do we provide a ConfigMap to an SGX container and still trust that what we specified in the ConfigMap actually ends up in the container? How do we provide it with a file? How do we attest all of these enclaves, all of these SGX containers? We need to verify that what's running there is really an SGX enclave, and really the enclave we expected. Because we can't trust the Kubernetes control plane; it's in the untrusted world. We can't trust the kubelet. How do we do that? How do we allow the enclaves to communicate with each other, so that the container on the left knows the container on the right is also running inside an SGX enclave, and that it's the enclave it expected, and then establish secure communication? This orchestration is a challenge. And we built an open-source project called MarbleRun to tackle these tasks.
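As a sketch of the scheduling part: with the Intel SGX device plugin installed, a pod requests SGX resources through its resource limits, and the scheduler places it on SGX-capable nodes accordingly. The resource names below follow the Intel device plugin's convention at the time of writing; treat the image name and exact values as placeholders and check them against the plugin version you deploy.

```python
import json

# Sketch of a pod spec requesting SGX resources. Assumes the Intel SGX
# device plugin is installed in the cluster; "sgx.intel.com/..." resource
# names follow its convention but may differ between plugin versions.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "sgx-workload"},
    "spec": {
        "containers": [{
            "name": "enclave-app",
            # Hypothetical image repackaged with Gramine, Occlum, or EGo:
            "image": "registry.example.com/enclave-app:latest",
            "resources": {
                "limits": {
                    "sgx.intel.com/epc": "64Mi",   # enclave page cache memory needed
                    "sgx.intel.com/enclave": 1,    # access to the enclave device
                },
            },
        }],
    },
}

manifest = json.dumps(pod, indent=2)
print(manifest)
```

Applying a manifest like this is what lets the scheduler treat enclave memory as a countable resource, exactly like CPU or RAM requests.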
Essentially, the idea is that you create a trusted controller. When I say trusted controller, I mean a controller that itself runs inside an SGX enclave. It is there, first, for bootstrapping the deployment, and then for orchestrating it during its lifetime. So essentially you verify the controller, you provide it with a policy called the manifest, and then it takes care of the tasks I mentioned: attesting the individual containers, providing them with their identity and configuration, and allowing them to build up a microservice architecture. MarbleRun is not the only solution here; there are a lot of proprietary things that do similar jobs, and cloud providers like Azure have something in place. But MarbleRun is probably, I wouldn't say the only, but the most prominent open-source solution for this. All right, what would it look like from a deployment perspective? You create a Kubernetes cluster where all your nodes have SGX capabilities, you install the device plugin, then you install MarbleRun and go through its setup procedure, and then you apply your application with the added annotations. Your scheduler puts the pods on the right nodes and things roll out from there. That's what the SGX world would look like. Any questions regarding that? Okay. There's a prominent example of this: in Germany we have the electronic health record (ePA), which is currently being rolled out, currently opt-in, potentially opt-out at the end of next year. And its specification explicitly requires that the operator of the system is excluded from the data itself. Confidential computing provides a promising way to implement such a system, and in fact there is a production environment for the ePA that is based on SGX.
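To give a feel for what such a policy looks like: a MarbleRun-style manifest tells the trusted coordinator which enclave measurements to accept and what identity and configuration to hand each attested service. The field names and template syntax below are reconstructed from memory of MarbleRun's documentation and should be treated as illustrative, not authoritative; the hash values are obvious placeholders.

```python
import json

# Rough, illustrative sketch of a coordinator policy ("manifest"):
# the trusted controller attests each enclave against these values
# before releasing identity, secrets, and configuration to it.
policy = {
    "Packages": {
        "backend": {
            "SignerID": "c0ffee00" * 8,   # placeholder hash of the enclave signer key
            "ProductID": 1,
            "SecurityVersion": 3,         # minimum security version accepted
        },
    },
    "Marbles": {
        "backend_marble": {
            "Package": "backend",
            "Parameters": {
                # Configuration injected only after successful attestation;
                # the template placeholder stands for a coordinator-managed secret.
                "Env": {"DB_PASSWORD": "{{ .Secrets.dbPassword }}"},
            },
        },
    },
}

print(json.dumps(policy, indent=2))
```

The point of the structure is that configuration like `DB_PASSWORD` never touches the untrusted control plane: it flows from the verified coordinator directly into an attested enclave.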
Now, it kind of fits here, because it's a new type of application, a new implementation anyway, with a small trusted context and an easy way to exclude yourself, the DevOps team and the administrators, from the application. The second approach brings together confidential VMs, so AMD SEV and Intel TDX, with containers. And as I said, confidential VMs are more where the future is, because they're the better abstraction for not having to care about the confidential-computing-specific parts when building and packaging your application. The general idea, probably the most promising, is to use a VM for every container, or every pod, in your Kubernetes cluster. That means instead of just creating a container, you create a confidential VM when you create your pod. Ideally that would work in a nested way, so that your node, which is a VM, contains a nested VM for the container. But nesting is tricky, and nesting a confidential VM inside a confidential VM is very tricky, so this is currently not possible. There's a preview on Azure that allows it, but otherwise we're not quite there yet. Instead, a pattern emerged using something like a remote hypervisor: instead of creating the VM locally, you create it somewhere else in the cloud and tunnel the traffic through. That way you don't need nesting; you have a workaround. The downside is, of course, some overhead: you need a separate VM in the cloud for every pod, and you need to handle the traffic and the interaction. The concept is based on Kata Containers, which some of you might know, which creates VMs for pods. For Kata, the idea is essentially to protect your infrastructure from the container; turning that concept around is essentially what we have here. And there's a CNCF project called Confidential Containers that implements this. Super promising project.
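From the user's side, opting a pod into this VM-per-pod pattern is mostly a matter of selecting a different runtime via a Kubernetes RuntimeClass, so the pod runs under Kata instead of runc. The runtime class name below ("kata-cc") is a placeholder; actual names depend on the Confidential Containers installation.

```python
import json

# Sketch of a pod that opts into a VM-based confidential runtime.
# A RuntimeClass routes the pod to a Kata-style runtime rather than runc;
# "kata-cc" is a placeholder name that varies by installation.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "confidential-workload"},
    "spec": {
        "runtimeClassName": "kata-cc",  # placeholder; depends on the CoCo install
        "containers": [{
            "name": "app",
            "image": "registry.example.com/app:latest",  # hypothetical image
        }],
    },
}

manifest = json.dumps(pod, indent=2)
print(manifest)
```

The appeal of the approach is visible here: the pod spec itself barely changes, which is the lift-and-shift promise of confidential VMs.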
Unfortunately, there are still some roadblocks; some things need to be implemented to make this work. Prototypes already exist, and the main problems that come up are the same as with SGX: we don't trust the control plane, we don't trust the kubelet. How do we do orchestration, the problems I described, like providing a ConfigMap to such a confidential container? How does attestation work; how do you verify it? What MarbleRun does for SGX needs to be applied to confidential containers. There was a talk specifically on that here on Tuesday from Magnus, with a lot more detail about the project, where it stands, and how it works, so I'm not going to go deeper into Confidential Containers. If you're interested, please check the recording. Super cool talk. The last approach is probably the most straightforward one with confidential VMs: take what is already a VM and make it confidential. That means our Kubernetes nodes. Instead of applying it to a container, we apply it to the entire VM, the entire node. This can also include the control plane. Then you have to deal with that problem again, but if you do include the control plane, you essentially create a confidential context for the entire cluster. And there the model shifts a bit. Before, you could isolate a container, or a process inside the container, which is probably the same thing; now you isolate the entire cluster. So it's a different threat model: you no longer exclude the operator of the cluster, but you isolate the cluster against the infrastructure. This makes sense if you do want to lift and shift and you can't trust the infrastructure, you can't trust the cloud service provider and want to isolate against them, but you don't need to isolate against your own administrators.
So not the electronic-health-record type of scenario, but the lift-and-shift scenario: regulated industries that want to adopt the cloud, or deploy a cluster for themselves, isolated against the infrastructure. And the cool thing about this is we can use it today. We have the confidential VMs, and we don't have the nested-virtualization problem. We have a project called Constellation that implements this pattern and shields the cluster as a whole. The technical trick, or the challenge, is how to make an entire VM verifiable and attestable: how you build the node OS image, how you do the attestation procedure, and how you keep doing it during the lifetime of the cluster. If you solve that, you can have an isolated cluster. The workflow for Constellation, for example: it's more or less like a Kubernetes distribution. I don't really want to make comparisons, but essentially, it's what it feels like if you want to create a Kubernetes cluster in the cloud without using a managed offering. You use the compute resources, the VMs, and you have client-side tooling like the CLI here; you can of course also use infrastructure as code like Terraform. Essentially, what needs to be done is creating the infrastructure, creating the confidential VMs as a cluster, and as I said, it's then isolated as a whole. If you go inside, if you have access to the API server from the inside, it's just Kubernetes. So you don't need to deal with those orchestration tasks; you can use Kubernetes as is. As for examples for Constellation, there are a bunch of them and it's really hard to pinpoint one: it's any kind of application that runs in Kubernetes that you want to shift to the cloud, so a lot of it is regulated industries.
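The "make an entire VM attestable" challenge mentioned above boils down to committing to the whole boot chain in one measurement, TPM-PCR style: each stage extends a running hash with the next component, so a single final value covers firmware, kernel, initramfs, and the node OS image. Here is a generic sketch of that chaining (an illustration of the principle, not Constellation's actual measurement scheme):

```python
import hashlib

def extend(current: bytes, component: bytes) -> bytes:
    """PCR-style extend: fold a component's hash into the running measurement."""
    return hashlib.sha256(current + hashlib.sha256(component).digest()).digest()

def chain(stages) -> bytes:
    """Measure an ordered boot chain into a single committing value."""
    measurement = b"\x00" * 32
    for stage in stages:
        measurement = extend(measurement, stage)
    return measurement

# The booted node reports this value; the remote verifier recomputes it
# from the published, reproducibly built image and compares.
boot_chain = [b"firmware", b"kernel", b"initramfs", b"node-os-image"]
measurement = chain(boot_chain)
expected = chain(boot_chain)
assert measurement == expected

# Any change in any stage changes the final value, so tampering is detected.
tampered = chain([b"firmware", b"evil-kernel", b"initramfs", b"node-os-image"])
assert tampered != measurement
```

Because the extend operation is order-sensitive and one-way, a verifier who knows the expected image contents can detect any swapped or modified boot component from the single final value.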
Yeah, it's a lot of healthcare and public-sector kinds of applications: a hospital information system, or a collaboration tool like Nextcloud, and so forth. But one very cool use case was OCCRP, an investigative journalism organization that targets organized crime and corruption. They are very, very paranoid, for legitimate reasons, and need to be very sure that the data they collect and their sources are completely protected and isolated from the infrastructure. They make use of Constellation for running their software in the cloud, and there's a case study you can find online. Super cool use case. Questions on SGX, Confidential Containers, or Constellation? All right, takeaways. I think it's important to understand that confidential computing is not a solve-all-your-problems solution. It adds these features, runtime encryption, attestation, and isolation, and this shifts the trust model in the cloud a bit. Depending on how you implement it, it allows you to exclude a fair chunk of the stack we have in the cloud today, so that you can reduce what you trust to the hardware. You trust the CPU, and of course you trust the vendor in a sense; you do that already if you run on a CPU, you trust that it does the computation correctly, but now we can reduce the trust to that. And of course you then trust the software that runs inside the context, and depending on how big the context is, you have to trust more or fewer components. That's the idea. And we have the three approaches. SGX is probably a bit of the legacy of confidential computing, but as I said, there are still use cases, so I thought it would be interesting to show it. Confidential Containers is very promising; I'm very much looking forward to what's coming there. I would say it's not ready for production yet, but there are super cool concepts in the pipeline. And then there's the more straightforward approach, with the downside of a different model where you can't
exclude yourself, but you do exclude the infrastructure, and you get the closest you can get to real lift and shift. Check out the Confidential Containers talk from Magnus to learn more about that. So now the question is how to get going. How can I put this to use, and where is it available? The important thing is you need the hardware: the AMD SEV chip, the Intel TDX chip, or SGX in your processors. Where can I get it? I would say most cloud providers have some form of confidential computing offering. First of all, they have infrastructure as a service: confidential VMs or SGX-capable machines. I think it's fair to say that only IBM and, I think, Microsoft have Intel SGX, and all of them have confidential VMs; that's the focus now. AWS has something called Nitro Enclaves, which I didn't cover today so as not to overwhelm you, but AWS now also has SEV-capable machines. So you can go there, create these as infrastructure as a service, and of course use any of the projects: use Confidential Containers to build prototypes, or create a Constellation cluster today on any of these, and more; I just tried to put the big ones up here. All right, that's it. If you have any questions, I guess we have a bit of time left. Thank you. "Hello, thanks for the talk. I'll ask the same question I asked earlier: what about monitoring of confidential computing?" And I probably have to give you the same answer: monitoring is a tricky, tricky challenge. We need it, but it somehow contradicts the confidential computing principle that your thing is isolated. There's no law of physics preventing smart solutions for monitoring this, but we need concepts for it. What you can do today: from the cloud provider's perspective, this is infrastructure, there's a VM or a confidential VM, and you get some observability from there. You can't get full observability, because you can't peek inside. So
then you need monitoring from the inside. For Constellation, for example, you can deploy your usual monitoring stack inside Constellation; it's a Kubernetes cluster, so why wouldn't you. Then you need to make smart decisions about who has access to that monitoring, what data is visible there, and at which endpoints it is consumed. For Confidential Containers it's probably similar, with the additional challenge that you only trust the container itself and can't trust the rest of the Kubernetes stack, so there are a few more challenges involved. "Thank you for the talk. What's the largest challenge you see for the third approach, the cluster one?" The biggest challenge is that because you include the control plane in the confidential context, your service provider, your cloud provider, must not have access to it, because if they have access, you break that isolation. That means you can't straightforwardly implement this as a managed Kubernetes offering. If you go to Azure or Google today, they will offer you, say, GKE with confidential nodes. That doesn't include the control plane: it gives you runtime encryption for those nodes, but no real isolation against the cloud provider or the infrastructure, because as soon as you get access to the control plane, you get access to the cluster. And the reason they don't include the control plane is that then they would need to exclude themselves, and how do you make it managed while being excluded? With Constellation we try a lot of approaches to make this easy, to make it feasible, but in the end that's the challenge. "Do you do any sort of performance testing? For example with SGX, the process-based confidential computing, performance is usually one of the biggest concerns." Yes, that's a focus for us, but it's
hard to pinpoint and say you get, like, 5% overhead. Really hard to say. If you ask the hardware vendors, and Magnus already said this, they will tell you something like 2%, and then the question is: 2% of what, is that realistic, and is that the main overhead? What we observe is that the pure runtime overhead is actually not that high; between 2 and 10% is also what we see. But the main source of overhead is more on the I/O side, with SGX probably the context switches. So if you do a lot of I/O, that's the biggest concern. With VMs we see better performance, because those context switches don't happen as often. On our GitHub repository you'll find benchmarks, for example for Constellation specifically, running applications inside confidential VMs. It varies a lot, but usually you're roughly around that 10% mark for an application. But then, 10% of what? Not just raw runtime overhead: let's say you have a GitLab or a chat like Rocket.Chat, it's about how many users you can serve in parallel, and you see something like a 10% reduction if you run it confidentially. "Is that 10% for AMD SEV-SNP?" Yes, that's currently on AMD SEV. With SGX, you can't generally say one is worse than the other, but I would say SGX, because of those context switches, is more prone to overhead. Thank you. Any more questions? All right, thank you very much, enjoy.