Why do we need to protect the model and the data? Consider what happens if the model gets stolen: you lose the intellectual property that comes with it. Now think of the inputs you send to the model. Say it is a health chatbot: the queries you send are confidential health information. If that data leaks, you suffer reputational damage and a loss of user trust (how will users trust you if you cannot safeguard their data?), and, not least, your competitive advantage is lost. So protecting both the model and the data is very important. The question is, how do you protect them? Consider what we do today, treating security as a layered onion. We have APIs and models, and the key asset is the model. You can encrypt the model at rest, and the data that flows to the model is encrypted in transit; so encryption already protects the data. You also have role-based access controls, API security, network policies, firewalls, segmentation, and auditing and logging. If you are familiar with the infrastructure side, these are common; they are table stakes today, and without them we would not run a real application in production. But in this whole picture, do you see anything missing? That was just a leading question.
Now I will show you a small demo. We are running it on a single-node Kubernetes cluster for ease of use. For a moment, wear the hat of an infrastructure admin who has access to the Kubernetes worker nodes; imagine you can log in to the worker node. The demo is very simple: it creates a pod that downloads a secret and keeps it in memory. That's all. So let's get started. This is a single-node Kubernetes cluster, and we run the program as part of a pod. The pod just does a curl to fetch the secret, keeps it in memory, and then sleeps. The pod is running. Since this is a single node, this is where you act as the admin with node access. As the admin, I search for the sleeping process and get its PID. Once I have the PID, I take a memory dump of that process. Now the interesting part: I can grep for the secret in the memory dump, and there it is, in plain text. For a moment, assume this is your model, the one you spent money training on your custom data: a malicious privileged admin now has access to your model weights. Or imagine it is your input data, recovered from a memory dump. Despite all the security measures we listed, there is still an attack vector that privileged admins can exploit. That is what we are talking about. What's the solution? Confidential computing. Let's take a look at what it brings.
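A minimal sketch of the demo's idea, assuming a Linux /proc filesystem (the live demo took a full memory dump of the process; reading /proc/&lt;pid&gt;/environ avoids needing gdb and shows the same point). The secret value and variable name are illustrative:

```shell
# A long-running process holds a secret; anyone with node access and the
# right privileges can read that process's memory-mapped data via /proc.
SECRET_VALUE="s3cr3t-model-weights" sleep 30 &   # stand-in for the pod's process
PID=$!
# "Admin" side: recover the plaintext straight from the running process.
# /proc/$PID/environ is NUL-separated, so convert to lines first.
RECOVERED=$(tr '\0' '\n' < "/proc/$PID/environ" | grep '^SECRET_VALUE=')
echo "$RECOVERED"    # the plaintext secret, straight from memory
kill "$PID"
```

No container boundary or at-rest encryption stops this read, which is exactly the gap the rest of the talk addresses.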
Before we define confidential computing, let's take a step back and look at the different states of data, because everything we are talking about here is data; even your model is data. Data goes through three states: data at rest, data in transit, and data in use. (Maybe I'm too far from the clicker, that's why it's not working.) I think all of you know how to protect data at rest: encryption. How do you protect data in transit? Again encryption, with TLS. The key question is how to protect data in use. Go back to the previous example: why could the admin see the data? Because the data in use, the data in memory, is not protected; it sits there in plain text. What confidential computing brings to the table is memory encryption, that is, encryption for data in use. And that is its beauty: it completes encryption across all stages of the data life cycle. In a bit more detail, confidential computing is a processor technology. It gives you a trusted execution environment (TEE); I will explain shortly what exactly a TEE is. Importantly, you can remotely verify the authenticity and trustworthiness of that TEE. And the TEE provides memory encryption, runtime memory encryption to be precise. Simply put, at the heart of confidential computing is the trusted execution environment.
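A small sketch of the gap, assuming the openssl CLI is available; file names and the passphrase are illustrative. Data encrypted at rest must be decrypted into ordinary RAM before it can be used, which is exactly the state the earlier memory-dump demo exploited:

```shell
# Data at rest: the confidential query is encrypted on disk.
echo "patient: jane; query: cardiology" > query.txt
openssl enc -aes-256-cbc -pbkdf2 -pass pass:demo -in query.txt -out query.enc
rm query.txt
# Data in transit would similarly travel under TLS.
# Data in use: to actually process the query, the application decrypts it,
# and the plaintext now sits in ordinary, unencrypted memory.
PLAINTEXT=$(openssl enc -d -aes-256-cbc -pbkdf2 -pass pass:demo -in query.enc)
echo "$PLAINTEXT"
rm query.enc
```

At-rest and in-transit encryption are solved; the in-memory plaintext in the last step is the state confidential computing encrypts.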
Whatever data you want to protect, if it is inside the TEE's memory, its plain text cannot be accessed by any privileged entity outside the TEE. That is the beauty of confidential computing. To repeat: if the secret we saw earlier had been inside TEE memory, no entity outside could have seen its plain text. So the heart of confidential computing is the trusted execution environment, and there are two types of TEEs: VM-based TEEs, where the TEE is a full VM, and process-based TEEs, where the TEE is a single process. This picture shows which entities have access to your data. The first case is a general environment without confidential computing: the infrastructure admin, the CPU, the BIOS and firmware, the host OS, the cluster admin, all of them have access to the data. And we saw that in the demo; anyone with access to the worker node can take a memory dump and get the data. With confidential computing and VM TEEs, who has access? Only the CPU; you still need to trust the CPU, but none of the other entities shown in white, the OS, the hypervisor, the infrastructure admins, have access to the data in the VM TEE's memory. The other TEE type is process-based, which is much more granular: protection at the process level. We are not going into the pros and cons of the two approaches here; the point is to understand what confidential computing brings to the table. The key takeaway: if you can put the data you want to protect from privileged entities inside the TEE, you gain that protection.
In the subsequent sections our focus will be predominantly on VM-based TEEs. One of the primary reasons is that VM-based TEEs allow lift and shift: you can take your existing application, run it inside a VM-based TEE, and reap the benefits of confidential computing. So the key message is: as long as you can get the data inside the TEE, you are protected. Our problem statement then becomes how to run a workload inside the TEE. If I am running my AI inference application in a Kubernetes cluster, my main priority for gaining data-in-use protection is getting the Kubernetes workload inside the TEE. Here we have two approaches. Either you run the entire Kubernetes worker node inside a TEE, which is most commonly known as confidential clusters, and various vendors provide solutions for that; or you put just the Kubernetes pod inside the TEE, which is what we call confidential containers, protection at the pod level. One key thing to remember about these two approaches is the trust model; to make it simple, who has access to the data? With a confidential cluster, your Kubernetes cluster admin can still access the data; the cluster admin remains trusted. With confidential containers, even the Kubernetes cluster admin is untrusted. The only trusted entity is the user deploying the pod. That means if I provide an AI model and expose it as a chatbot, I am the only trusted user, no one else. And this is enabled by the CNCF sandbox project Confidential Containers.
One more note, a slide we added recently: the recently announced Cloud Native AI white paper also recommends confidential computing for additional security. And the whole point is that if you want the benefit of memory encryption, you need a TEE and you need a way to deploy your workload inside that TEE. With that, I hand over to Suraj, who will take you through how we enable confidential containers and how they integrate with an inference runtime, KServe. Thank you, Pradeep. So let's talk about confidential containers. Before we go into how it is set up, let's quickly review Kubernetes; I hope everybody knows this. There is a control plane, there are worker nodes, and the worker nodes run the kubelet and so on. In this talk we will zoom into a worker node. Imagine regular hardware running a kubelet. The general interaction is that the kubelet gets a request to start a pod and talks to containerd, which talks to runc, and the pod is created; I think everybody understands this part. For confidential containers, we use Kata Containers as the runtime. How many of you have heard of Kata Containers? To give a two-line summary of what Kata does: instead of runc, which uses Linux kernel technologies to start a container, Kata creates a lightweight VM and starts the containers inside that VM. So the worker nodes need virtualization-enabled hardware. The flow is the same: the kubelet gets a request and passes it to containerd, which then talks to the Kata runtime. The Kata runtime is basically a replacement for runc here; it knows how to understand requests coming from containerd.
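As a sketch of how Kata plugs in as a runc replacement, a RuntimeClass object registers it with Kubernetes. The handler name below is an assumption; it must match whatever runtime handler name the Kata installation configured in containerd:

```yaml
# Registers Kata Containers as an alternative pod runtime. Pods that set
# runtimeClassName: kata are started inside a lightweight VM instead of
# a plain runc container.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata   # must match the containerd runtime handler name
```

A pod opts in simply by adding `runtimeClassName: kata` to its spec; nothing else about the pod definition changes.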
On the other side, it knows how to talk to virtualization software, KVM in this case. The flow is the same: a virtual machine is created and boots up, and the first process that starts inside it is the Kata agent. The Kata runtime and the Kata agent then communicate about how to start the actual pod; the image pull happens, and the Kubernetes pod comes up. That is the general interaction. Now, what the Confidential Containers project has done is extend Kata so that it can start confidential VMs. Let's look at Kata CC. This time we need hardware that supports confidential computing, such as AMD SEV-SNP or Intel TDX. Again: kubelet, containerd, Kata runtime, KVM starts, the virtual machine boots. There is the Kata agent, but we also have two more components, the confidential data hub and the attestation agent. As Pradeep mentioned earlier, attestation is a really important part of this whole process, because just starting a VM with encrypted memory is not enough; you could be fooled by the underlying infrastructure provider, and that is not what we want. This is for truly paranoid users who trust nothing outside that VM, that TEE. So attestation happens against a relying party, something you run on your own secure, trusted side. It knows how to read the evidence and verify that it was really signed by the AMD or Intel hardware. This is how you ensure you are really in a safe environment. What you do after that is up to you: you can release a key, release a secret, or simply accept that the environment is trusted and go ahead. In this case, we release a key, the key for an encrypted container image.
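The core of what the relying party does can be illustrated locally. This is only an analogy, not the real attestation protocol: an openssl RSA key stands in for the vendor's hardware key chain, and a text file of measurements stands in for the signed evidence:

```shell
# "Hardware" side: a signing key (in reality burned into / derived from
# the CPU) signs the measured state of the TEE.
openssl genrsa -out hw_key.pem 2048 2>/dev/null
openssl rsa -in hw_key.pem -pubout -out hw_pub.pem 2>/dev/null
echo "kernel=abc123 initrd=def456 cmdline=ok" > evidence.txt
openssl dgst -sha256 -sign hw_key.pem -out evidence.sig evidence.txt
# Relying-party side: verify the evidence really came from the hardware
# before releasing any secret.
if openssl dgst -sha256 -verify hw_pub.pem -signature evidence.sig evidence.txt >/dev/null 2>&1; then
  RESULT="attested"    # only now release the decryption key
else
  RESULT="rejected"
fi
echo "$RESULT"
rm hw_key.pem hw_pub.pem evidence.txt evidence.sig
```

The real flow additionally checks the vendor certificate chain and compares the measurements against expected reference values, but the signature check above is the trust anchor.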
The container image is pulled, the key is used to decrypt it, and the pod starts. This way we ensure that if attestation fails, nothing usable is obtained; even if the image does get downloaded, it is encrypted, and you are fine because the key is not released until attestation passes. Quickly, the CoCo threat model (we call Confidential Containers CoCo for short). It promises confidentiality and integrity from the CIA triad, the lens traditional security uses to assess any system. Confidentiality is guaranteed because memory is encrypted, and integrity is guaranteed because you only pull things after verifying everything is fine. During attestation you can check things like: this is the kernel I wanted, this is the initrd I wanted, the kernel parameters are right; you can verify the whole world inside the TEE before attestation passes. And everything outside the VM, the TEE, is untrusted; the worker node is untrusted. That is the CoCo threat model we follow. Earlier we saw a demo about unencrypted memory; let's see what encrypted memory looks like. We have the same machine as before, and since this is a single-node cluster, we check that the SEV-SNP kernel module is loaded. To get started with confidential containers, we install the Confidential Containers operator. The operator exposes runtime classes (plain Kata also exposes a runtime class; that is how you boot Kata pods). This is the same pod spec from before; the only thing we have changed is the kata-qemu runtime class. The same thing happens: we fetch the secret and keep it in memory. The pod has started. Note that this is regular Kata, not confidential yet; you can see its runtime class.
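The pod-spec change just described can be sketched like this; the image and secret URL are hypothetical stand-ins for the demo workload, and the runtime class name comes from whatever the operator exposes on your cluster:

```yaml
# Same pod as in the first demo; the only change is runtimeClassName,
# which makes the pod run inside a Kata VM instead of a runc container.
apiVersion: v1
kind: Pod
metadata:
  name: secret-holder
spec:
  runtimeClassName: kata-qemu
  containers:
  - name: app
    image: registry.example.com/secret-holder:latest   # hypothetical image
    command: ["sh", "-c",
      "SECRET=$(curl -s https://vault.example.com/secret); sleep infinity"]
```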
We downloaded the secret, and now the pod is sleeping. This time we did not grep for sleep; we grepped for qemu, because the pod is running inside a QEMU VM. We get the PID again, take a memory dump again, look at the ASCII representation in the core file, and yes, you can still see the secret. Even with a plain Kata VM, it is visible. (Sorry about that.) So, moving on. For this talk we have chosen a model-serving, or inferencing, platform on Kubernetes called KServe. Let's do a quick primer on what KServe is. It is an inferencing platform that can host models in various formats, whether trained with TensorFlow, PyTorch, or the others in this list. You use regular CRs and CRDs, it has all the controllers, it runs consistently across various Kubernetes deployments, and it gets all the Kubernetes benefits: HA, bin packing, auto scaling, and so on. The attack vectors against KServe include data or model poisoning; privacy breaches, where somebody observes the inputs and outputs; theft of the model itself; and denial of service. What we focus on today is model theft. So let's see how these two work together. This time we are on a confidential-containers-enabled system: the operator is deployed, and the KServe controller is already deployed. We will now deploy a specific model. The runtime class is created, and this time we use the kata-remote runtime class. Confidential Containers supports multiple ways of deploying things; the specific one we use here is called kata-remote, which creates a peer-pod VM. The details are outside the scope of this talk, but understand that it is a confidential VM.
In the example you see, we specify kata-remote and a storage URI. In KServe you can specify where to pull your model from; in this case we use a container image to ship the model. This is a new feature, I think Roland implemented it for KServe, called modelcar: you use a container image to ship the model. The good thing about this particular image is that it is encrypted, like we saw before. Let's deploy it. While the deployment runs, let's take a closer look at what the image looks like, using skopeo to inspect it. Skopeo shows you information about each layer, and since we encrypted the image with CoCo's encryption stack, it shows the attestation agent's metadata. If we look at one of the layers, you see it is base64-encoded, and the decoded content looks like this: it holds information about where to get the key from. So when the guest talks to the relying party we saw on the right earlier, it knows where the key lives on the key broker service, and once it gets the key it can decrypt the container image. If you go and pull this image directly today, you will see an error like "invalid tar header", because it is encrypted; whatever method you use to fetch it, it is an encrypted blob. The relying party we showed earlier is currently deployed on the same cluster; in production you would deploy it in a trusted setup. Here you can see how the attestation happened and how the keys were queried. Moving ahead, the pod we deployed a couple of minutes earlier is up now.
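The "invalid tar header" failure can be reproduced locally, since an image layer is just a tar archive. The file names and passphrase below are illustrative, and openssl stands in for CoCo's layer-encryption stack:

```shell
# An OCI image layer is a tar archive; once encrypted, it is an opaque
# blob that tar rejects, the same class of failure seen when pulling the
# encrypted image without the key.
mkdir -p layer && echo "model weights" > layer/weights.bin
tar -cf layer.tar layer
tar -tf layer.tar > /dev/null && echo "plaintext layer: readable"
openssl enc -aes-256-cbc -pbkdf2 -pass pass:demo -in layer.tar -out layer.tar.enc
if ! tar -tf layer.tar.enc > /dev/null 2>&1; then
  STATUS="encrypted layer: unreadable without the key"
fi
echo "$STATUS"
rm -rf layer layer.tar layer.tar.enc
```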
If we check its runtime class, it is deployed in that confidential VM. This is the standard KServe way of exporting env vars and so on; it is straight from the KServe documentation, nothing new. This is the input we will provide to the model. Now we query the model with that input, and in the end we get the prediction, 1 1, which is what we expected. So, moving on. With KServe and confidential containers, what we are doing is running only the parts of KServe that host and expose the model inside confidential containers. That is why you see only parts of it highlighted; you don't want to run everything in confidential containers if it is not needed. Looking at the data plane, we run the predictor side of it. So what benefits does CoCo provide to KServe? Memory encryption, of course; we have been harping on that for a while, so that is the obvious one. Protection from the infrastructure provider. Query inputs and outputs are also protected, because they flow into an encrypted environment. And there are other use cases beyond KServe, which was just the example we picked: multiple parties can come together to compute on the same data while ensuring that no party steals another party's data; and regulatory compliance, especially here in Europe, where data-protection laws are stringent and it is increasingly required that IP and PII be encrypted while being processed.
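The deployment described above can be sketched as an InferenceService. This is a hedged sketch, not verbatim from the demo: the registry URL is hypothetical, and the exact placement of runtimeClassName in the predictor spec is an assumption about your KServe version:

```yaml
# An InferenceService whose predictor runs in a confidential peer-pod VM
# and pulls its model from an encrypted modelcar (OCI) image.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    runtimeClassName: kata-remote   # confidential peer-pod VM
    model:
      modelFormat:
        name: sklearn
      # modelcar: the model ships as a container image; this one is
      # encrypted, so the layers decrypt only after attestation.
      storageUri: oci://registry.example.com/models/iris:encrypted  # hypothetical
```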
Further work here would be using confidential GPUs, including for training, because if you are training on a lot of sensitive data you want the GPUs to be confidential as well. Today NVIDIA's H100-class GPUs support confidential memory, so once that becomes generally available we could start using it. And on the KServe side we need support for runtimeClassName so that we can use ModelMesh. So what are the takeaways? Even if you forget everything else from today's talk, remember that confidential containers provide data protection while the data is in use. CoCo can help wherever you think somebody could access your data while it sits in memory. Model-serving platforms like KServe can benefit from CoCo. And not everything needs to run inside CoCo: decide where your data is exposed to the underlying hardware or its owners; only that part needs to run in confidential containers. This is the Confidential Containers project; if you would like to get involved, that is the QR code there. I think we can take questions now. One key thing to note: as part of the Confidential Containers project, we are focusing on usability. As you saw in the demo, once the infrastructure is set up, all that is needed to convert a regular pod into a confidential pod is to use the right runtime class. That is our goal. I have one more session in the evening, around SPIRE; if you are interested, please come see that. And now, questions; you can shout out or come to the mic in front. Hello, thank you for the presentation. My question is about the performance impact. Do you have any metrics?
I can take that. Some time back, AMD released performance benchmarks comparing a set of macro- and micro-benchmarks on confidential instances versus regular instances. For many of the micro-benchmarks the performance hit was less than 4%; the JVM benchmark and a Monte Carlo simulation were also under 4%. The SPEC CPU benchmark, the core CPU benchmark, showed about an 8% hit. That is public documentation; search for AMD SEV-SNP performance and you will find the data. My question was the same: is that documentation on the confidential containers site, the performance metrics you just stated? No, I don't think it is there, but that's a good suggestion; maybe we can add it to the website as well. That would be fantastic, thank you. Next question: my question is quite a lot simpler. In terms of on-prem and cloud, what are the infrastructure requirements, because not all chips are born the same? The core hardware requirement for confidential computing is a CPU that supports it, for example an Intel TDX or AMD SEV-SNP processor. The hardware requirement for on-prem is bare-metal systems with those processor types; that is the minimum: bare-metal servers with the right processor capability. I can add to that: AMD's EPYC series has confidential-compute capability. And since this comes from the hardware, it has to be enabled in the kernel and the virtualization software, so the stack runs right from the hardware up to the application. I just checked your website on the adopters side, because I was curious; it looks like Microsoft is already doing it in preview from what I've seen.
And apparently AWS also supports it. But I'm wondering, do you need bare-metal AWS or Azure instances to use this, or can you use a normal VM? When we use confidential computing in the public cloud today, we can use the confidential VM instances; the demo Suraj showed with KServe actually used confidential VMs in Azure. When you are on-prem, you need bare metal. One last question: can you use TrustZone on ARM? Not yet. All right, thank you. I think we are out of time. If you have more questions, you can find us here; we will be happy to take them. Thank you, folks, and thanks for attending.