…and sandboxed containers: distrusting your VM host, by Christophe. So I'll share my screen; it's recorded. If you have any questions, drop them in the Q&A section. Is this not audible at all? It's very, very faint. Okay, let me try again.

We're going to build on Kata Containers and the things we can already do with it. We're going to go beyond that and ask: is there a potential for data exposure, how do we attest the workload, and how do we separate the tenant and host security realms completely?

First, let me start with a problem statement: can we trust the host? Containers run on a host which is often managed by a third party, like your cloud provider. The sandboxing provided by the Linux operating system, or by virtualization technology, goes only one way. It's designed to protect the host from the containers, not the other way around. That means the resources that the container uses, like memory, CPU, disk, networking and so on, really belong to the host, which owns them and has free and unrestricted access to them. Containers are carved out of the host's resources, and that begs the question: what do you need to do if, as a container owner, you start considering the host as potentially hostile?

Now, why would you think that? Well, the truth is that the data in the container can be read by the host. So exposure of information held in the container is entirely possible. This is why multiple tenants may not want to share the same host: because of that risk of data leaks between containers. There may even be legal concerns and considerations that preclude the use of containers if you cannot enforce or guarantee confidentiality, that is, make sure that the data doesn't leak to the host and possibly outside of it.

We now have an emerging enabling technology called confidential computing that lets us address these problems. Confidential computing is more than just encryption.
It starts with memory encryption, which prevents the host from getting any secret out of the container's memory: if it reads it, it's going to get garbage. But there's more. For instance, there is integrity protection, to make sure that the host cannot corrupt the guest state or inject bad data in order to crash the guest or extract data out of it. There is also an attestation mechanism that lets the guest owner, the tenant, validate what runs in the guest, and possibly block the execution of images that are known to be malicious or compromised.

There are many vendor-specific technologies that enable confidential computing. AMD started the fire with the Secure Encrypted Virtualization (SEV) technology, with two more recent variants: Encrypted State (SEV-ES), which deals with the CPU register file among other things, and Secure Nested Paging (SEV-SNP), which adds integrity protection for memory, interrupts, and more. Intel offers Trust Domain Extensions (TDX). IBM mainframes have Secure Execution. The Power platform has the Protected Execution Facility, and Arm recently announced the Confidential Compute Architecture. All these technologies are based on virtualization, so it could seem like it's easy; but in reality, each of these technologies works in a slightly, or more accurately markedly, different way. And so for an integration platform like Kata Containers, it will be quite a challenge to integrate all these technologies.

Let's talk about the basic architecture of Kata Containers and how we can move from Kata Containers to confidential containers. Here's a quick overview of Kata Containers. As you may know already, Kata Containers is designed to run containers that are described the usual way: in other words, with the same APIs, the same YAML manifest files, the same container image formats, the same volume storage, the same networking.
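To make those two guarantees concrete, here is a deliberately simplified Python toy, not how real hardware works: SEV or TDX do this in the memory controller with per-VM keys, and use a real cipher rather than the toy keystream below. The sketch models one guest page encrypted with a key the host never sees (confidentiality), plus an integrity tag so that host-side corruption is detected. All names here are illustrative.

```python
import hashlib, hmac, secrets

def keystream(key: bytes, length: int) -> bytes:
    # Toy keystream from chained SHA-256 blocks; illustration only, not a real cipher.
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

class EncryptedPage:
    """Models one guest page: the host only ever stores ciphertext plus a tag."""
    def __init__(self, key: bytes, plaintext: bytes):
        self._cipher = bytes(a ^ b for a, b in
                             zip(plaintext, keystream(key, len(plaintext))))
        self._tag = hmac.new(key, self._cipher, hashlib.sha256).digest()

    def host_read(self) -> bytes:
        # The host sees only ciphertext: "garbage" from its point of view.
        return self._cipher

    def host_corrupt(self, data: bytes):
        # A malicious host can overwrite the ciphertext, but cannot forge the tag.
        self._cipher = data

    def guest_read(self, key: bytes) -> bytes:
        # Integrity check first: detect host-injected data before using it.
        expected = hmac.new(key, self._cipher, hashlib.sha256).digest()
        if not hmac.compare_digest(self._tag, expected):
            raise RuntimeError("integrity violation: page modified by the host")
        return bytes(a ^ b for a, b in
                     zip(self._cipher, keystream(key, len(self._cipher))))

vm_key = secrets.token_bytes(32)   # held by the "hardware", never by the host
page = EncryptedPage(vm_key, b"tenant secret")
assert page.host_read() != b"tenant secret"       # host read yields garbage
assert page.guest_read(vm_key) == b"tenant secret"  # guest sees plaintext
```

The point of the integrity tag mirrors what SEV-SNP adds over plain SEV: reading garbage is not enough, the guest must also refuse to consume pages the host has tampered with.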
But now we want to run them in virtual machines, with their own independent kernel and very little in terms of user space, to reduce the attack surface: basically just a Kata agent that starts the container and monitors it. So we are talking about benefiting from the ecosystem of containers, with the additional sandboxing provided by virtualization.

Kata Containers is made possible by the flexibility and extensibility of the Kubernetes architecture, which has a number of interfaces such as the container runtime interface (CRI), the container networking interface (CNI) and the container storage interface (CSI). We can basically add a plugin that replaces an existing runtime such as runc or crun and is invoked through the container runtime interface to start a virtual machine. The virtual machine represents a Kubernetes pod, and inside this virtual machine we run the containers.

This allows us to conceptually separate three different trust realms: the platform, the tenant and the host. Let me illustrate on this diagram, where the trusted platform is drawn in red. It's a trusted execution environment that offers confidentiality guarantees using hardware-level cryptographic enforcement. The host, which is drawn in blue on the diagram, offers and manages the physical resources that are used to run the container; that includes CPU, disk, memory, networking, and so on. Finally, the tenant security realm, drawn in green on this diagram, includes a confidential area that is carved out of the host, which we call the trusted enclave, and which is protected by cryptography so that the host cannot see it or access the data in it. But that same security realm also includes things that may run outside of the host, for instance key brokering services, attestation services, container image download, and so on.
In order to enable confidential computing for Kata Containers, we need to modify a number of components, which are highlighted in red on this diagram. The first one, obviously, is the Kata runtime, which needs to pass the right options to the virtual machine monitor to enable confidential computing. The virtual machine monitor itself, like QEMU for instance, needs to be modified to be able to support and activate encryption, set up confidential VMs, and so on. The kernels, both the host kernel and the guest kernel, also need additional support: on the host kernel, for example, to support page tables for encrypted memory, and on the guest kernel, to offer services that let user space access any secrets that the trusted platform delivers to the guest. The firmware may also need to be modified with additional services, such as page validation; that's the process of transferring ownership of physical pages from the host to the guest and back. And the hardware obviously needs support, for instance in the memory controller, to encrypt memory on the fly. As far as Kata Containers is concerned, this phase is already largely underway. It's mostly complete, except that we don't have much hardware to test it with for many of the platforms that I mentioned.

Now, the next step is to secure the image download. That's basically making sure that the pull image operation happens from within the guest, instead of happening on the host as it does today. What happens today is that the kubelet sends a pull image request that is delegated to the image service in the container runtime. We need to forward that request to the Kata agent, so that it can pull the image itself instead of the host doing it. This highlights a fairly typical API situation for the kind of problems we run into in this project, where the APIs of today send the data to the wrong party, or into the wrong security realm.
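The image-download redirection can be sketched in a few lines of Python. This is a toy model, not the real Kata or CRI code, and the class names (`GuestAgent`, `HostImageService`) are hypothetical: the point is only that the host-side image service forwards the pull request over the agent channel, so that image data only ever lands inside the guest.

```python
class GuestAgent:
    """Runs inside the confidential VM; it alone fetches and holds image data."""
    def __init__(self):
        self.images = {}

    def pull_image(self, ref: str) -> str:
        # In the real flow this would invoke skopeo inside the guest.
        self.images[ref] = f"<layers of {ref}>"
        return ref

class HostImageService:
    """Host side: forwards the pull instead of pulling into host storage."""
    def __init__(self, agent: GuestAgent):
        self.agent = agent
        self.host_storage = {}   # stays empty: nothing lands on the host

    def pull_image(self, ref: str) -> str:
        # Delegate over the agent channel (vsock in the real architecture).
        return self.agent.pull_image(ref)

guest_agent = GuestAgent()
img_svc = HostImageService(guest_agent)
img_svc.pull_image("registry.example/app:v1")
assert "registry.example/app:v1" in guest_agent.images  # image lives in the guest
assert img_svc.host_storage == {}                       # and never on the host
```

This is exactly the "wrong party" problem inverted: the API entry point stays on the host, but the data path moves into the tenant realm.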
Also notice that for the initial prototyping, we need keys to decrypt the images or to access encrypted disks. We're going to bake them inside the image for prototyping, but of course later we need to find a way to fetch these keys in a way that is scalable in the cloud: we can't have a different image for each container service. So we need to find another mechanism, and I'm going to talk about this next.

For that, let's talk about attestation, which is the process that lets the owner, the tenant, know that what is running inside its guest is exactly what is intended to run. The first step for that is to be able to measure what is running. There are various services provided by the trusted platform at all levels, but typically you have hardware registers in the CPU that can be accessed to measure specific ranges of memory, et cetera. What is being measured is selected depending on the platform and on what you want to do, but you would typically measure the initial boot image, which includes the firmware, the guest kernel, the agent, that kind of thing. The container is a varying part, so we are going to attest it separately; we'll see later how.

There are two really big ways to attest the workload. One is called pre-attestation, which measures the VM before it even starts: you check the boot image before you even allow the VM to boot. Most platforms now also provide remote attestation, in which the code in the VM can attest itself: it can access the measurements and send them to a remote relying party for attestation. All attestation mechanisms can deliver some kind of data, typically secrets like encryption keys, but remote attestation also makes sure that you can invalidate vulnerable images, and gives you fine-grained control over the kind of information that is being exchanged.

So the attestation process that we plan to implement is the following. The kubelet is going to start by asking for a pod to be created.
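The measurement step above can be sketched as a hash over the boot components, compared against a reference value the tenant computed offline. This is a simplification: real platforms keep the measurement in hardware-maintained registers (SEV-SNP uses SHA-384, for instance) and fold in more state than a flat concatenation, but the detection property is the same.

```python
import hashlib

def launch_measurement(components: list[bytes]) -> str:
    # Hash the boot components in order: firmware, guest kernel, agent, etc.
    h = hashlib.sha384()
    for blob in components:
        h.update(blob)
    return h.hexdigest()

# Hypothetical boot image contents, named for illustration only.
boot_image = [b"firmware-v2", b"guest-kernel-5.x", b"kata-agent"]
reference = launch_measurement(boot_image)   # computed by the tenant, offline

# The platform measures what actually got loaded; any substitution is visible.
assert launch_measurement(boot_image) == reference
tampered = [b"firmware-v2", b"guest-kernel-5.x", b"evil-agent"]
assert launch_measurement(tampered) != reference
```

In pre-attestation the tenant checks this value before allowing boot; in remote attestation the guest sends it, signed by the platform, to a relying party.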
That involves the runtime picking up a boot image for the virtual machine, which will then be run by the virtual machine monitor, like QEMU. One additional step that may happen here, when you're doing pre-attestation, is that you attest that boot image and make sure that you do not start the VM unless you know that the boot image is good. Then, once the VM is created, there are a number of APIs that are sent to the Kata agent over vsock, and we'll have to restrict some of these APIs, because the workflow is no longer correct for confidential containers. For example, you cannot let the host decide when to start a container, because you want to make sure that the attestation happened before the container really starts. So we need to find a way to receive these start container requests, acknowledge them, but not really start a container until we know that we have attested it.

So there are many changes inside the Kata agent itself, which needs new components, for instance to be able to manage keys or manage cryptography, as well as additional processes that will be added in the image: an attestation agent, in order to perform the remote attestation process; skopeo, which is going to deal with container images; and umoci, which is going to deal with the expansion of these images locally.

Let's start with the attestation agent. The attestation agent starts by measuring what you want to measure: as I said, typically the guest firmware, the guest kernel, and the various components in the boot image. From this, it can build a quote that is sent to the attestation service, and the attestation service gives a go/no-go. If it's okay, it's going to let the key broker service deliver keys to the guest. These keys can be used, for instance, to decrypt the container image that you fetch from the container image registry. Once you have your container image locally, you can expand it to disk.
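The "acknowledge but defer" behavior for start container requests can be sketched as a small state machine. This is a toy, not the real Kata agent code, and the names are hypothetical; it only shows the ordering constraint: the API call is accepted, but nothing runs until attestation has succeeded.

```python
class KataAgentStub:
    """Toy agent: acknowledges StartContainer but defers it until attested."""
    def __init__(self):
        self.attested = False
        self.pending = []   # containers accepted but held back
        self.running = []   # containers actually started

    def start_container(self, cid: str) -> str:
        if not self.attested:
            self.pending.append(cid)   # accept the API call, defer the work
            return "queued"
        self.running.append(cid)
        return "started"

    def attestation_succeeded(self):
        self.attested = True
        while self.pending:            # release everything that was held back
            self.running.append(self.pending.pop(0))

stub = KataAgentStub()
assert stub.start_container("web") == "queued"
assert stub.running == []              # nothing runs before attestation
stub.attestation_succeeded()
assert stub.running == ["web"]         # deferred start completes afterwards
```

The design choice here is to keep the host-facing API compatible (the call succeeds from the host's point of view) while the tenant-side policy controls when execution actually begins.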
So initially we are going to use a RAM disk. In all cases, the container image has to be pod-scoped; it cannot be shared across pods. But we are going to expand that container image locally, and since encrypted memory is quite expensive, we are very quickly going to want some way to store the container images, either on some ephemeral block device, or even on a persistent block device so that we can reuse the expanded image at the next boot.

So here's the attestation flow. As you can see, the pull image operation will now be forwarded inside the VM, and skopeo is going to request a decryption that requires keys. These keys are going to be released thanks to the validation that was done by the attestation agent. So now we can decrypt that image and report, okay, we pulled the image. That pull image will let us then proceed with create container; this is where I was talking about blocking APIs. Create container is going to unpack the image locally, and then that unpacked image will let us report that we created the container.

Next, let's talk about integrity protection. That's a big change inside Kata Containers: from hotplugging to what we now call immutable pods. How do we configure a virtual machine as a pod today? Well, we have to use hotplugging to add memory, CPUs, or devices to the pod. The reason is that the historical Kubernetes APIs are designed for hosts, and so they do not give us information in a timely fashion for the typical use case of a virtual machine. In particular, when you create the pod, you don't have any information about things like container sizes and resources. That means that when we create a container, we need to dynamically add the resources, because that's when we receive the request. So we get the information, for instance, that this container is going to request two CPUs and four gigs of memory, and we have to hotplug two CPUs and four gigs of memory into the pod that we created earlier.
Obviously this adds a lot of complexity to the runtime, but it's also inefficient, because hotplug takes some time and we need to reserve more resources, like more page table entries, because we have to assume that the memory might later be extended by large amounts. The conflict with confidential containers is with respect to integrity: it's very hard to guarantee integrity if you can change the configuration of your pod after you have measured it. That's why memory hotplugging or ballooning mechanisms conflict with encryption or page validation. There are also problems with any kind of device that can access memory with direct memory access, because these devices are not part of the trusted platform, so we cannot grant them access to guest physical memory. That means we cannot hotplug such devices after the fact, but also that, even if we have them ahead of time, devices such as GPUs or SmartNICs may not be allowed in confidential computing until the vendors of these devices find a way to exchange keys with the trusted platform.

So now we are talking about something called immutable pods, which are fully defined before boot: before we boot the virtual machine, we are going to know what is going to run inside. That requires rather massive changes in the existing Kubernetes APIs, because these APIs typically put things in the wrong spot, in the wrong realm from a security standpoint. For instance, logs in the Kubernetes APIs are defined in the create pod API and are designed to be sent to the host. That doesn't work in the confidential containers case, because they belong to the tenant security realm. The good news, however, is that this change would vastly simplify and optimize even the non-confidential case, because we can remove all that hotplugging complexity.
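The immutable-pod idea reduces, at its core, to sizing the VM up front from the full set of container requests, instead of hotplugging as each container arrives. A minimal sketch, assuming the whole pod spec is available at creation time (which is precisely the API change being argued for):

```python
def vm_size_for_pod(containers: list[dict]) -> dict:
    """Size the confidential VM once, before boot, by summing all
    container resource requests; nothing is hotplugged afterwards."""
    return {
        "cpus": sum(c["cpus"] for c in containers),
        "memory_gib": sum(c["memory_gib"] for c in containers),
    }

# Hypothetical pod with two containers, resources expressed as plain numbers.
pod_spec = [
    {"name": "app", "cpus": 2, "memory_gib": 4},
    {"name": "sidecar", "cpus": 1, "memory_gib": 1},
]
size = vm_size_for_pod(pod_spec)
assert size == {"cpus": 3, "memory_gib": 5}
# The VM boots with 3 CPUs / 5 GiB, is measured once, and never changes.
```

Because the configuration is frozen before boot, it can be covered by the launch measurement, which is what makes it compatible with integrity protection.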
So it's likely that Kata Containers version 3 is going to be largely influenced in its design by the requirements of confidential containers.

The last piece is the need for a shadow control plane. The tenant now needs to be able to access the tenant trusted enclave without going through the host, because we don't want the host to see things like logs, container metrics, and so on. The good news is that in order to establish the attestation, we have to establish a secure network RPC channel, so we can leverage that channel to run the other APIs through it. All APIs and all commands that really talk to the container, to the workload, would now go through the tenant side, whereas things that deal with host resources, like creating the pod, getting CPUs, memory, and so on, or host metrics, would go through the host side. Obviously, that's a lot of work if we want to make this transparent, because, for instance, user-level commands like kubectl exec would have to go through the tenant side, but creating a pod would have to go through the host side. So kubectl exec or oc exec would have to select which key to use, which credentials to use, dynamically based on the command being run. That's really a lot of work; it touches a number of components, so it's going to take a few years.

So as you can see, getting towards confidential containers is a very interesting project. We have things that we are doing right now, like enabling the hardware; things that we are going to do in the next three months, like image download; things that we are likely to start doing at the end of this year or next year, like attestation; and things that will take way longer, because they touch components largely outside of the containers, and that includes completely separating the host and tenant views. Very interesting project. I really hope that you will want to join us after this talk. If so, please join the Kata Containers discussion forums. Thank you, and talk to you in the question and answer session.
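The channel-selection problem for the shadow control plane boils down to a routing decision per API verb. Here is a toy sketch of that split; the verb names and channel labels are hypothetical, chosen only to mirror the workload-facing versus resource-facing distinction described above.

```python
# Workload-facing verbs go over the attested tenant channel; verbs that
# manage host resources stay on the ordinary host channel.
TENANT_APIS = {"exec", "logs", "attach", "container_metrics"}
HOST_APIS = {"create_pod", "resize_pod", "host_metrics"}

def route(api: str) -> str:
    if api in TENANT_APIS:
        return "tenant-channel"   # secured RPC set up during attestation
    if api in HOST_APIS:
        return "host-channel"     # ordinary path through the host
    raise ValueError(f"unknown API: {api}")

assert route("exec") == "tenant-channel"       # e.g. kubectl exec / oc exec
assert route("create_pod") == "host-channel"   # pod lifecycle stays host-side
```

A transparent kubectl would have to make exactly this decision, and pick tenant credentials or host credentials accordingly, which is why the talk estimates this as multi-year work.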
Thank you, Christophe. Are there any questions? There's one question for you. Yes, so I see one question, which is about the difference between sandboxed containers and confidential containers. Sandboxed containers is an OpenShift product that was introduced with OpenShift 4.8 as tech preview, and that's basically Kata Containers for OpenShift. So it includes an operator that lets us install the Kata Containers runtime easily on an OpenShift deployment, things like that. We provide additional sandboxing in the sense that the pod now runs in a virtual machine. Confidential containers will leverage this technology; and when we talk about confidential containers, that's the terminology we use upstream as well, in the Kata community, not just in the OpenShift scenario. And it deals specifically with protecting the data, the transient data, in your containers from the host. The idea is that any data that you want to protect, that you would store on disk or that you would send over the network, would already be encrypted. We had an earlier talk that you may have seen about how to use secrets in Podman, to make sure that you can store secrets somewhere but not send them over, for instance, to a hub. So the part that deals with the images themselves, and secrets in the images, is also being addressed. There is also a huge effort that I sort of rushed over regarding the encryption of the images themselves, the encrypted image layers. That's an effort that we're going to leverage as well. So all these parts are being dealt with. The thing that is not yet covered is what you have in memory, and that's what confidential containers will address. So then you can run something completely secure: you can run bank account management stuff, with user secrets and passwords for the bank account, et cetera.
If you're running on Azure, on Google Cloud, whatever, any public cloud, that would still fly, because these platforms could not read the data. Does that answer the question? Yeah, it looks like yes. Any other question for Christophe? Looks like there is no other question, so we can close. So that's the last session for today. Yeah, so we're done with the conference for today. See you all tomorrow morning. Thank you. Thanks a lot and talk to you later. Bye. Bye-bye.

Oh, there is another question. I don't know if you can hear me. Yeah, yeah, yeah. How far away is attested computing? So I think you're referring to the attestation process that I was talking about in the slides; if you could confirm in the chat that this is the question. Yes. So I would say this really depends on the platforms. Pre-attestation, the way it's exposed by the existing AMD SEV: there's a relatively large amount of hardware that is available today with the original SEV technology, so this exists in various clouds today. And we have demonstrated, for instance, how to do attestation in that context, in the context of libkrun. If you're interested in that, there's a KVM Forum talk that is going to be given on September 16th that addresses this aspect, among other things. So that part is done for SEV, because the hardware basically exists today and is widely available. Remote attestation is much more of a battle, because different platforms do it slightly differently, and they have all developed attestation servers with some way to talk to them. Some of them have tried to leverage existing infrastructure that could be used for other kinds of attestation, or trusted boot, or whatever. So we are in a reconciliation phase at the moment, where the various teams are trying to decide together: okay, what are we going to use for attestation, so that we have a single service that could serve multiple platforms?
Ideally, we'd like a single attestation server to be able to serve a TDX machine as well as an SEV machine, and we are not there yet. We are still discussing it, we're still prototyping that. I suspect that upstream you will be able to start playing with it probably in the coming six months. If you want a feel for it, I think the best option today is libkrun. Does that answer the question? It looks like yes. Any other question for Christophe? Let's wait for a couple of seconds. Three, two. In any case, that was a very interesting question. Oh, the name? No, libkrun; let me spell it in the chat. And let me also give you the link to the KVM Forum talk in the chat.