Okay, hi and welcome everyone. My name is Jens, I work for Red Hat, and I prepared this presentation together with my colleague Pradipta, who can't be here today. This talk is about how we can leverage confidential computing technology to enhance the security and resilience of data in use in the Kubernetes control plane. Before we jump in, I'd just like to get a feeling for the audience. Who here has heard of confidential computing? Can you show your hand? Okay. Who has heard of Kata Containers? Also quite a few. And who has heard of Confidential Containers? Still a few hands. Great.

So let's jump in, but let's start with the control plane: what it does and why it needs to be protected. The control plane, I think it's already in the name, controls your cluster. If you own it, you're basically root in your cluster. So what does it do? It orchestrates all major cluster activities, like deploying applications, controlling pods and nodes, and scaling resources up and down. Really complex tasks. It also houses sensitive data: secrets such as passwords or API keys, state data that represents the current state of the cluster, and many more things that need to be protected. It handles access control. And the API server is a big part of it: it's the primary interface through which users and external systems interact with the cluster. It receives requests for operations, translates them into internal communication, and ensures correct execution. So it has a really pivotal role, and its security is of great importance.

There's a subset of control plane components that we want to focus on here: those that are end-user facing. We call them request-serving control plane components. These include, of course, the API server, but also others like Ingress or OAS, and there are also Kubernetes distro-specific services. These components are all critical because they handle requests from users directly.

Now, let's take a closer look. The bottom line is that we need to protect these components from unauthorized access and manipulation so that trust can be established. They are very exposed; that's in the nature of these components. They need to be accessible at all times and listen for incoming requests. This direct exposure to external entities, users and software, makes them basically the first line of defense and also a prime target for potential attackers. There's a lot of complexity: they handle a complex array of operations, they have to parse data, interpret protocols, and perform authentication and authorization checks. The complexity of these tasks can lead to vulnerabilities, especially if there are oversights in their management. They perform a lot of privileged operations, and, as I said, they also do authorization. And because of this complexity, there's also a chance of misconfiguration. These are just some examples; I think we've all seen some, and we all know it's not unlikely that there will be more vulnerabilities.

So how do we protect the control plane and clusters in general at the moment? Of course, there's a large number of existing projects, practices, best practices, and methods that we apply. They are all important and will not change. What we propose here comes on top of that, in addition to the existing measures. We just want to add something: defense in depth, basically.
So now let's look at how clusters are used and how that often evolves. Often the journey starts with a single shared cluster, which means a shared control plane. There are many good reasons to do this: resource efficiency, cost savings, simple administration, fast and easy provisioning. For isolation between tenants we're looking at soft multi-tenancy here, which is often enough for many use cases. But at some point more tenants join and we run out of resources, or maintenance becomes harder. Trust changes with a growing number of tenants. So you create more clusters. Some reasons why companies do this are compliance requirements from customers or projects, the need for more independence and flexibility, more performance and more resources, maybe predictable costs, and maybe also stronger isolation to move towards hard multi-tenancy.

And then we often end up implementing something like managed control planes. There are many implementations of this: Gardener is one, but the managed services of all major public cloud providers also implement some form of this architecture. And then there's HyperShift, which is what I want to focus on today. Now you'll say, wait, isn't that OpenShift-specific? Yes, it is at the moment; you can get it to run elsewhere, but it's work in progress to make that easier. So I will use HyperShift as the example today, but what I will show is not only applicable to HyperShift. These are generic techniques that we can apply to all of these different implementations, and in general to Kubernetes workloads, Kubernetes pods. It's not specific to HyperShift; I just use it as an example that was easy for me to use.

So just a brief introduction. This is what a classic cluster looks like: we have control plane nodes and a set of worker machines. That's all one entity, controlled by a single team. HyperShift changes that: it decouples the control plane from the data plane, from the workers, separates the network domains, and provides a shared interface through which admins and SREs can easily operate a fleet of clusters. Now the control plane acts and behaves like any other workload. The same stack that is used to monitor, secure, and operate your applications can be reused for managing the control plane. There are many more advantages, but I'll stop here; this is not the main topic.

We've seen that transitioning to managed control plane solutions like this can bring efficiency and scalability, but it also introduces additional complexity in terms of security and trust. And even though the fundamental trust relationships and roles remain consistent across these different setups, the ways in which we need to manage and secure these setups evolve. So now we're going to take a closer look at how trust is established and maintained, at trust relationships, and at why technical assurance becomes increasingly important in these complex environments.

So let's talk about trust. I'm the workload owner. The cloud provider is the infrastructure owner. And I want to run my software in the cloud, but maybe I'm hesitant. Who do I have to trust? There are different groups. It starts in-house: there's a whole bunch of teams or groups that I need to trust, such as admins, software production teams, developers, and more.
Then for cloud providers there are, of course, the SREs and all the other personnel, the software they run to maintain their infrastructure, and their processes as well. Then there are third parties: other tenants, maybe other container providers, third-party software.

In a typical Kubernetes context, the infrastructure provider, like a public cloud provider, is not considered a threat agent; it is a trusted actor in a Kubernetes deployment. But in a confidential computing context, that assumption no longer applies, and the infrastructure provider is a potential threat agent. So confidential computing in general, and confidential containers in particular, try to protect the workload owner's Kubernetes workloads from the infrastructure provider. Any software component that belongs to the infrastructure, and that can also be the control plane, is untrusted.

So there are all these different groups, and the question is: do I have to trust them, and to what degree? Do they need all this trust in order to perform their job? Often the answer is no. The same is true in-house, for all the groups that I trust: do they need this to perform their job? Often the answer is no. And the same is true for third parties: do I have to trust them to this degree? Often the answer is no.

So if we treat the infrastructure provider as a threat agent, what are the threat vectors? There are container images: the infrastructure provider could potentially tamper with or access my pod's container images during storage or pull operations. There's memory: the infrastructure provider could manipulate, access, or simply view the in-use memory of my applications while they're running. And then there's the data that is stored: the infrastructure provider could also tamper with it, change it, or view it. What we want is to reduce the amount of software, people, and processes that we have to trust. And this is where confidential computing technology comes into play. It comes down to operational assurance versus technical assurance, or as I call it here, trust-based and procedural assurance versus systemic and cryptographic assurance. One says, "we promise not to mess with your data." The other says, "we can't mess with your data." One promises not to; the other can't.

And there's a project with the goal of bringing confidential computing to cloud-native workloads, to Kubernetes. It's called Confidential Containers, a sandbox project in the CNCF. There's a healthy and growing community of contributors from many companies: cloud providers, ISVs, SaaS companies, many hardware vendors, and of course companies like Red Hat.

So what does Confidential Containers do exactly? What does it add? It reduces the trusted computing base. It's based on virtualization technology and Kata Containers. Remember Kata Containers? You run your container wrapped inside a lightweight VM, and from the user's point of view it looks and feels like a normal pod. Confidential Containers combines Kata Containers and TEEs, trusted execution environments. That means the VM created by Kata is a confidential VM: it makes use of the TEE and its capabilities, guest memory is encrypted, and it allows for remote attestation. Even without that, with plain Kata, workloads are isolated from each other on the same host, and the host is better isolated from malicious workloads: when you break out of a container, you're still contained inside a VM and not on the host.
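To make the Kata part concrete, here is a minimal sketch of how a pod normally opts into the Kata runtime via a RuntimeClass. In practice the RuntimeClass is created by the Kata or Confidential Containers installation, and the names and image below are illustrative assumptions, not values from the talk.

```yaml
# Minimal, illustrative sketch; RuntimeClass names and the image are placeholders.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata            # the name pods reference
handler: kata           # CRI handler configured in containerd / CRI-O for Kata
---
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
spec:
  runtimeClassName: kata          # wrap this pod's containers in a lightweight VM
  containers:
  - name: app
    image: registry.example.com/app:latest   # hypothetical image
```

From the user's perspective this is still just a pod; the runtime class is the only visible difference.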
CoCo adds confidentiality. It adds protection for your workload from the host and from the infrastructure and organization that operates it. Going back to the threat vectors we talked about: these are the threat vectors that the CoCo project tries to address. So what are the mitigations?

For container images, we shift control of the container images away from the infrastructure owner. These images must either be signed or encrypted, to ensure the infrastructure owner cannot manipulate them during storage or pull operations. Only the workload itself should be capable of pulling, decrypting, verifying, and possibly storing them. Storing encrypted and signed images in confidential memory ensures they remain inaccessible to the infrastructure provider, mitigating threat vector number one.

Then there's memory: exclusive use of confidential, encrypted memory. The CoCo model mandates that workloads run only in encrypted memory, which offers hardware-level assurance that the infrastructure provider cannot tamper with the memory while it's in use, addressing threat vector number two.

And then there's storage. Confidential containers ensure that Kubernetes pod volumes are encrypted and their integrity is maintained. Volumes that are not protected by attestation-gated secrets will not be mounted or created by confidential containers. This safeguard ensures that we mitigate threat vector number three.

Okay, let's bring it together. In the first part of the talk we covered managed control planes and the control plane itself, and why it needs to be protected. Then we covered confidential containers and confidential computing. Now let's bring the two together. The scenario is that organizations are increasingly adopting models where multiple Kubernetes control planes are hosted on a single, centralized management cluster. Each tenant's control plane is isolated within its own namespace, while the actual workloads run on separate node pools. In addition, cloud tenants are worried about privilege misuse, malicious insiders, exploits, things like this.

What are the risks? We've seen that managed control planes bring efficiency and scalability, but they also bring risks. There are still cross-tenant vulnerabilities with namespaces: a potential threat that one tenant might access or impact another's resources, and the problem of excessive resource usage affecting other tenants. So what do I want as a tenant? I want full control of my environment. No matter how shared it is, I want to feel like I'm in a private setting. And of course I want to ensure that my control plane is the control plane that I intend to run, that it is security-checked, and that its integrity is verified.

So what can we do? Let's drill down into the scenario a little. I have a few hosted control planes, each consisting of the well-known services running as pods in a namespace, and we have a set of node pools where our workloads run. But the node pools are not the interesting part in this example; we concentrate just on the control plane. We pick a component and run it as a confidential container. For that we need, of course, a HyperShift setup: we install the operator and create the hosted clusters, which create the control planes. And for confidential containers we need the Confidential Containers operator.
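To make those two setup steps a bit more tangible, here is a heavily trimmed, illustrative sketch of the two custom resources involved: a HyperShift HostedCluster that produces a hosted control plane, and the Confidential Containers operator's CcRuntime that rolls out the CoCo runtime on selected nodes of the management cluster. The API groups are real, but the individual field names are from memory and all values are placeholders, so check the respective project documentation for your version.

```yaml
# Illustrative sketch only; field names may differ between releases and
# all values are placeholders.

# 1) A hosted cluster, whose control plane will run as pods on the
#    management cluster (HyperShift).
apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster
metadata:
  name: my-hosted-cluster
  namespace: clusters
spec:
  release:
    image: quay.io/openshift-release-dev/ocp-release:4.x.y-x86_64  # placeholder release
  pullSecret:
    name: my-pull-secret                                           # placeholder secret name
  platform:
    type: None                                                     # platform details omitted
---
# 2) The Confidential Containers operator's runtime CR, which installs the
#    Kata runtime plus the configuration that makes it confidential.
apiVersion: confidentialcontainers.org/v1beta1
kind: CcRuntime
metadata:
  name: ccruntime-sample
spec:
  runtimeName: kata
  ccNodeSelector:                 # which nodes get the CoCo runtime (label is an assumption)
    matchLabels:
      node.kubernetes.io/worker: ""
```

Once these are reconciled, the hosted control plane components appear as ordinary pods in a namespace on the management cluster, and new runtime classes (such as kata and a remote/CC variant) become available, which is exactly what we use in the next step.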
Installing the operator deploys the runtime that we need to run Kata Containers and the configuration that turns it into confidential containers. So once we have the setup, we modify the API server, as an example. Literally all we need to do here is add a few lines to the deployment specification. We modify the pod template, because we need to add a runtime class, and this runtime class was created when we installed the Confidential Containers operator. This guarantees that my API server will not run as a normal runc-based container: it will use the Kata Containers runtime and the configuration that makes it use a confidential VM, backed by the hardware TEE, instead of a normal VM. (A minimal sketch of this change follows after the demo walkthrough below.)

Memory encryption guarantees confidentiality: nobody can dump my guest memory and look at it. But it does not guarantee the integrity of my API server. I want to know that it runs in a secure enclave, that's one thing, and on top of that I want to know that it's exactly the API server, unchanged and not tampered with, that I intend to run. How would that work? This is where the attestation part comes in.

There are different ways to achieve this. One is that we store a signature in the KBS, the key broker service, a component controlled by the user, by me, not by the cloud provider or an admin. I configure confidential containers so that they connect to my KBS during the attestation process. So we use the KBS and the attestation service to run the attestation process for my API server. Only after successful attestation, that is, after we have verified the things I just mentioned, will it start up. If the attestation process fails, my container will not even start, and this is how I know that something is not right. If attestation was successful and my components have started, they will run in the secure enclave, in my confidential container backed by the hardware TEE, where memory is encrypted.

This is the full confidential container attestation workflow. I will not go into full detail here because we still want to show a short demo, but if you have questions around this, I have lots of additional information in the slides. We have a whole series of blog posts about how this works, especially for confidential containers and how we got here; you can find all of that in the slides.

So now it's time for a demo. I have my setup installed. I have created a hosted cluster with my HyperShift setup, and in it I see the control plane components running as pods. Here we see a list of all of them, and we find the API server; here it is. Now I'm going to show the runtime classes; it may be hard to see, but there are two: kata and kata-remote. Now I open the deployment, go to the pod template, and add the runtime class. Here you can see we're adding a runtime class name, kata-remote-cc for confidential containers, instead of a normal Kata container. I close it, and my API server restarts as a confidential container. Here I show all the pods that use that runtime class name, and you can see the API server listed with it.

Now, I mentioned the KBS that I need to run, which does part of the attestation. This I have running somewhere else, under my control. This is the pod; I deployed it with another operator, and it's running here. And now I need to stop here, I think.
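To illustrate the change from the demo, here is a trimmed sketch of the API server deployment's pod template after the edit. Only the runtimeClassName line reflects the actual change shown in the demo (which used kata-remote-cc); the namespace, labels, and image are hypothetical placeholders.

```yaml
# Hedged sketch: only runtimeClassName is the change from the demo;
# everything else is a trimmed placeholder.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-apiserver
  namespace: clusters-my-hosted-cluster   # hypothetical hosted control plane namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-apiserver
  template:
    metadata:
      labels:
        app: kube-apiserver
    spec:
      runtimeClassName: kata-remote-cc    # run the API server as a confidential container
      containers:
      - name: kube-apiserver
        image: registry.example.com/kube-apiserver:4.x   # placeholder image
```

With this in place, the pod only reaches the Running state if the attestation flow described next succeeds.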
My API server has already restarted and has already gone through the attestation process, so now let's just look at the logs and see what happened. This is basically what happens: attestation means a request is sent first, then a challenge comes back as a response, and then the attestation happens on the node where my container runs. The attestation agent requests the evidence from the hardware by executing this command, and the evidence basically states that the workload is running in a secure enclave, in a confidential VM. I can see here that it returns a response including the signed evidence, and here I see that a key is returned. So in this case my container image was not just signed, it was encrypted, and I got the decryption key to decrypt the container image before it starts running. And I think this is the end of my demo.

So, future work. Of course, this was just a proof of concept, a demo. It needs to be integrated into HyperShift so that when I deploy my setup I can configure it, maybe tell it which components I want to run as confidential containers. Basically it needs integration with the tooling. What I showed in the demo is not what you would do eventually; we would integrate it into the HyperShift tooling so that you can configure it when you create your hosted control planes: which components do you want to run as confidential containers, where should they connect to, and things like this. So that's still work to be done. And for the CoCo project in general, there are also lots of interesting open problems to work on; I've listed some of the more interesting ones in the slides. In general, if you're interested in this space, this is a very open and welcoming community. I've added links to our community meeting, or come talk to me after the talk; we're looking for more people and contributors.

So yeah, these are some of the things that I wanted to share; I'll just leave them in the slides. I just wanted to mention this is our GitHub, we have a Slack channel in the CNCF workspace, we have a new release coming out soon, and we have a weekly meeting where we discuss technical and release topics. Usually every week we have one interesting technical presentation from someone who is a specialist in one of the fields. Then I wanted to make you aware of the blogs that we published: a whole series, not just about attestation, but also introductory blogs documenting use cases and demos that we did.

We've also shown some demos at the Red Hat booth here, and I can demonstrate them for you if you approach me. We've shown, for example, workloads that protect an AI model in a public cloud, using Azure in that example, one of them on TDX hardware. The other demo was done together with a startup that offers protection for AI models using homomorphic encryption, and we showed how they add confidential containers to the mix to protect a part of their setup. There are also some other relevant talks if you're interested in this topic; they go into more technical detail on some parts of this. There was one from the last KubeCon, from Jeremi Piotrowski, which goes more into the attestation details, how that works from the kernel side of things and the hardware side of things.
There's also a more introductory talk on how to deploy confidential containers, by Fabiano Fidêncio from the Kata project and myself. And there are previous talks that are interesting as well; the last one was from the last KVM Forum, where Christoph talks about the five biggest problems that we're still trying to address.

That's the end of my talk, thank you. If there are questions, please come up to the mic so that it will be on the recording and I can hear you.

Hey, so in a cloud-hosted control plane, you're trying to verify that the API server and so on are as you expect. How can you trust that the cloud provider has installed the right version of confidential containers and the runtime and all of that, so that it even validates when you actually go to relaunch the API server? Where do you start this whole process?

That's part of the CoCo model: you don't trust the components that are outside of the virtual machine. It's part of the concept. Whatever they install, if they installed something that was tampered with, your container will most likely not start up, because if they fake the evidence or tamper with something in between, the KBS part, where you check the evidence and verify it, is in your control. You could decide to also let an attestation service do that, but the point is that this part is under your control.

So the attestation is coming from the runtime back to the KBS?

The attestation evidence basically comes from the hardware where your container is running. Once a request is sent to the KBS and it has authenticated itself, the KBS sends a challenge with a nonce, and then a component that runs on the node, as part of the VM where your container runs, collects the evidence by requesting it from the hardware.

And to run that broker service somewhere, in order to get this whole thing started, would you want to run it somewhere else, like locally, in an environment where you can have all the validation, and then move the broker service to that environment and propagate out from there?

Yes, that's a very valid deployment model. You can take this wherever you want, depending on how much caution you need or what your use case is. Depending on how paranoid you are, you can run more or less of it yourself. You can also use cloud provider attestation services if that's enough for you, and they can also run their services in confidential VMs, for example.

Thank you. Any other questions? Thank you very much.