All right. Good morning, everybody. My name is Mike Gwitthrow. I'm a product manager on the Azure Kubernetes Service. And my name is Amar Gowra. I'm a principal product manager for Azure Confidential Computing. Nice to see you all. Yeah. And as I said yesterday in the keynote, if you didn't see it, he and I are working together to bring confidential computing capabilities into Azure Kubernetes Service. Yesterday we gave a quick little insight into it, and the purpose of this session is to go a lot deeper into exactly how we did it and how you can use it from a capabilities point of view — to really get into the zero-trust architecture for containers with Kata and confidential computing. First and foremost, we'll go through the principles. A lot of you probably know this already, but we'll level-set the room on zero-trust architecture. Then we'll walk through what pod sandboxing on AKS with Kata actually looks like. The next level after that is Kata and confidential containers on AKS. And we'll go through some scenarios and demos and make the capabilities real for everybody. So what are the guiding principles of zero trust? Pretty simple: verify explicitly, use least-privileged access, and always assume breach. What are the goals we're trying to achieve as we bring this into AKS? Minimize the trust assumptions, always implement strict access controls, leverage continuous monitoring and verification capabilities, leverage micro-segmentation, implement secure remote access, and provide visibility and quick actions in case something happens. So, what zero trust is and what it is not. It is an approach to security that treats every access attempt as if it's originating from an untrusted network: trust no single source, standards equal security, and assume breach — we'll hit that point quite a few times. What it is not:
It is not literal. It's not an adjective. It's not for sale. It's not instant, and it's not a revolution. And now we can bring in those principles again: verify explicitly, use least-privileged access, and assume breach. So what is the architecture we're looking at from a capabilities point of view? We've got identities and endpoints tying into a zero-trust policy, which brings in policy optimization and threat protection capabilities integrated with the network, going across the data construct, the app construct, and of course the base infrastructure that's running those components — and all of that is tied in with monitoring and analytics capabilities built natively into the platform. Why pod sandboxing? As I alluded to yesterday, there are a couple of key data points from our perspective on the AKS side. Whether you've used AKS, any of the CSPs' managed Kubernetes, or raw Kubernetes itself, the security boundary is technically the cluster. So we find in a lot of cases that customers are deploying a single application per cluster in order to meet their security constructs, which leads to a very lightly used cluster, where they might need three, six, nine nodes to run one pod, as an example. So customers are asking: how can I get a lot denser and not compromise my security controls? That's why we started looking at Kata from a capabilities point of view, so we can start changing that conversation. As I alluded to before, Kata doesn't answer all the multi-tenancy questions, but it is definitely the foundation and the groundwork for how we can start to implement true multi-tenancy capabilities, even hostile multi-tenancy, for our customer set. Essentially, from our perspective, it's just native Kata that we're running. All the Kata 2.0, 3.0 stuff that's coming will land upstream for us and get natively brought into the platform.
No code changes are required; it's straightforward. It's just a runtime class you specify in your application YAML and deploy right into the cluster. With the runtime class, you're in a nested VM running Kata; if you don't specify it, you're on the shared kernel with everybody else. It's fully deployed across all of Azure — all of our 64 regions, it's all there. It's based off of our Azure Linux distro, which is our in-house distro. You won't be leveraging Ubuntu or Windows for this — that might come down the road — but we built the nested VM capability inside of our Azure Linux distro. As I alluded to, this is built straight off of the Kata OSS capability. On Azure, we have a fundamental principle that we don't fork or proxy any of the APIs; we leverage them natively. We're starting to talk with the Kata team, and we're going to start dropping development capabilities in there as well — if we need something, we build it in upstream and grow the construct from there. So let's get into what this looks like. Let me minimize this. There we go. The demo gods are having fun. Hold on. All right, there we go. Good. Just to start out, if you haven't seen this before — hopefully everybody here has deployed AKS — this is essentially our UX in the portal. I've just done a simple `az aks create` with `--os-sku AzureLinux`, which makes sure the underlying virtual machine scale set is running Azure Linux nodes under the hood. It's just a simple 1.25 base cluster that exists out there. Now let me walk through the setup. The first thing I want to do is authenticate to my cluster. Let me get out of caps. All right, `get-credentials`, `-n` — so this is demo-kata, if I can type, trying to type sideways — `-g`. OK, so now... oh, wait. Is it not pulling my resource group? Oh, I've got to change subscriptions. Yeah, let that pick up. There we go. I can't type.
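For reference, the commands being typed in this part of the demo amount to something like the following sketch; the subscription, cluster, and resource-group names are illustrative, and sandbox-capable node pools need an additional workload-runtime setting — see the AKS pod-sandboxing documentation for the current spelling.

```
# Point the CLI at the right subscription, then pull cluster credentials.
az account set --subscription "<my-subscription>"
az aks get-credentials -n demo-kata -g demo-kata-rg

# Creating the cluster in the first place, with Azure Linux nodes:
az aks create -n demo-kata -g demo-kata-rg --os-sku AzureLinux
```

After `get-credentials` succeeds, `kubectl` talks to the cluster for the rest of the demo.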
OK, there we go. Account set: `az account set --subscription`. Now that's working. There we go. So now I can run my get-credentials. There we go. So now I'm essentially into the cluster. OK. The next thing I want to show — hopefully this is coming through pretty well — is that I have two pod specs sitting there. Pretty simple: one's trusted and one's not, right? Up above is the trusted one — you'll see it's a pretty simple pod spec, nothing there. But down below, you'll see the runtime class specified. For us it's kata-mshv-vm-isolation, because it's Kata implemented with the Microsoft hypervisor and VM isolation, which is the construct this capability provides. That's all you're specifying, and now you simply deploy your applications. Right, so if I walk through this deployment and drop this directly into Cloud Shell... why is it not pasting? So now if this will let me work — it is not pasting. Why is it not pasting? There we go. All right, so there is essentially the trusted pod. The key thing is that this is just deploying, no change, onto the normal virtual machine scale set kernel. Now for the other one, the key point is that I've got that runtime class specified, so this is going to drop directly into the nested VM on that virtual machine scale set. I knew it was going to dump on me. Almost there. What did it not like? No mouse either. All right, so once we have this in there — there. So now we've got that untrusted pod created. Now how do we actually know that it's in the right place? If I do a `kubectl get nodes -o wide` — oh, I know, I missed the l — one, you can see what's constructed under the hood. I can see I've got three nodes in a node pool. You can see CBL-Mariner Linux.
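The two pod specs described in the demo look roughly like this — the pod names and container image are illustrative, but `kata-mshv-vm-isolation` is the runtime class name AKS pod sandboxing uses:

```yaml
# trusted-pod.yaml — no runtime class: runs on the shared node kernel
apiVersion: v1
kind: Pod
metadata:
  name: trusted
spec:
  containers:
  - name: app
    image: mcr.microsoft.com/azurelinux/base/nginx:1   # illustrative image
---
# untrusted-pod.yaml — the runtime class drops this pod into a nested Kata VM
apiVersion: v1
kind: Pod
metadata:
  name: untrusted
spec:
  runtimeClassName: kata-mshv-vm-isolation
  containers:
  - name: app
    image: mcr.microsoft.com/azurelinux/base/nginx:1   # illustrative image
```

Applying both with `kubectl apply -f` and then comparing `kubectl exec trusted -- uname -r` against `kubectl exec untrusted -- uname -r` is one way to verify what the demo shows: the sandboxed pod reports a different, nested-VM kernel.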
We haven't actually renamed it yet, but that is actually Azure Linux. Mariner is the in-house distro we've been using forever; we just went through a rebrand once we announced GA last month at Build, so we're working on the backend code to rename it. But you'll see the kernel version — that's the big piece here. The shared kernel is 5.15.111, cm2, right? So now I want to look at the nested piece. I can go in and look at the untrusted pod — oh, because it hasn't been assigned yet. I'll wait for that to get assigned. But just to complete this, and to give it some time: what will come up here is that the big thing is it's in a different kernel, because it's within that nested VM construct. Pretty simple, pretty straightforward from our perspective: you either dump it onto the shared kernel or you drop it into the nested VM kernel. What we're working on is providing a bit more guidance, because if you look at it from a base perspective: you deploy an AKS cluster, and by default you can deploy 30 pods on a node, but we essentially allow you to scale that to 250, right? So with a three-node cluster, that's essentially 750 pods you can run on top of that cluster, as an example, just in that three-node set. What does that look like as I start carving it up with all those nested VMs, right? What's the density model, those kinds of things? As we move this to GA, which is around the corner, this is one of the pieces of guidance you'll see in our documentation, with some architectural footprints to look at as you scale this out from an enterprise and operational perspective. All right, I don't even know — let's see if that actually deployed. I bet it's still stuck in deploying. Yeah, okay, didn't find a node.
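As a quick sanity check on the density math above — a toy calculation, assuming the 30-pods-per-node default and the 250-pod maximum quoted in the talk:

```python
def cluster_pod_capacity(nodes: int, max_pods_per_node: int) -> int:
    """Upper bound on schedulable pods for a node pool, ignoring
    system pods and per-pod resource requests."""
    return nodes * max_pods_per_node

# Default AKS setting vs. the raised limit from the talk, on 3 nodes:
default_capacity = cluster_pod_capacity(3, 30)    # 90 pods
raised_capacity = cluster_pod_capacity(3, 250)    # 750 pods, as quoted
```

The real schedulable number is lower once daemonsets, system pods, and the per-nested-VM overhead are accounted for — which is exactly the sizing guidance the documentation is meant to cover.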
But yeah, if something fails, we give you pretty good error messages on why it didn't deploy, through the normal kubectl calls. But while it's scheduling — once this comes up, the big thing is that it won't be on the same kernel. Now, going back to the presentation, I'll turn it over to Amar. Ah, folks, as you were seeing with the demo — live demos always end up like this. I think the error there was that the VM size did not support nested virtualization, right? Not all VMs on Azure support nested virtualization. I think that was the agent pool that was added; we probably missed using one that supports nested. Anyway, let's enter the world of confidential computing and how it helps you harden and achieve your zero-trust goals. I want to double-emphasize the least-trust part, right? You don't trust, quote unquote, a cloud provider, and you don't trust your own administrators, right? And you may be a large enough company that you have hundreds of Kubernetes clusters, with a contractor or an offshore team or a set of other non-employees managing your clusters. So there is low trust, especially around your highly sensitive data. So what do we have to offer here, and what does it allow you to achieve? It's built on Kata CoCo — Confidential Containers — an open-source project under the CNCF. I think we've heard a lot of talks about Kata and CoCo and the community aspect, so I won't delve into that much. But first let me set the baseline on what confidential computing is, right? We've lived in a world where, for data at rest, you can bring your own keys to encrypt the data in the cloud, or you do it on-prem as well. For data in transit, we all access websites over SSL or TLS; everything is encrypted. The last leg of the three-legged stool was: what happens with data in memory? If I get access to the VM — I'm an admin on the VM, or a root admin — and I dump the memory, is it in plain text? Yes, it is in plain text, right?
So this was the one vector that confidential computing is trying to solve, and it can do it with hardware, right? This is not a set of software capabilities; this is a capability built by AMD, Intel, Arm, NVIDIA. These are hardware-based CPU capabilities and features that achieve this scope. Because we do that, there are additional benefits and features that come with it — for example, the integrity of the code that is loaded. You intend to run this code, but how exactly do you know this is the code that ran, and not something where malware intercepted your code and injected its own in production? How do you know that? How do you detect it? How do you stop it? Those are all attributes that come along, because confidential computing enables you to do this, right? So in Azure, confidential computing is hardware root of trust. That means everything goes back to the CPU: you can attest this is running genuinely on Intel, genuinely on AMD hardware — what's my serial number, and so on. I'll get into a lot of detail in the next couple of slides and also show you a technical architecture: how we put all of this together and how we're working towards transparency and openness of the whole stack. At the end of the day, we as Azure cannot put a line of code in there and say, trust us, we've done a good job, right? Confidential computing says: no, it's trust but verify. Can I check the code that is running? Can I attest that this code is genuine? Is this actually genuine Microsoft code signed by a Microsoft key, right? So we go deep down and try to achieve these goals. I think I flashed this slide yesterday in the keynote as well, just showing you we've been on this journey for a long time. We understand the space and the scenarios well, and among the customers we've talked to, there isn't a segment or an industry that we haven't spoken to.
Lots of ISVs in the blockchain space, people who manage cryptographic keys, tokenization — any credit card number you punch in is encrypted before it goes into the database, right? So that kind of operation — there isn't a scenario we haven't heard about. So, plenty of services. If you look at the virtual machine offerings, we have AMD SEV-SNP, which is the next version of SEV — much, much more powerful and more performant, with much higher security capabilities. Then Intel; then NVIDIA — we have confidential GPUs. We were the first vendor to announce support for confidential GPUs. We run A100 today, and we're switching to H100; H100 is the latest GPU that also runs the ChatGPT infrastructure, right? On the confidential computing side, we're actively working with the Kata CoCo community, through a partnership with NVIDIA, to bring those capabilities in — GPU support in Kata CoCo. That work is actively happening. Intel TDX is an offering that is available in preview today; you can preview TDX if you like. Anyway — among the services, the last one: there's an attestation service, free for anybody to use. You have ledgers, a ton of things in this space, and we're just getting started, right? We announced Databricks support, because a lot of people do confidential data analytics and were looking for a platform as robust as Databricks to achieve that and scale with them, so we added the confidential VM capabilities there last month. All right. So at the end of the day, right, what are the product goals? What can you achieve, and what problems will it solve for infosec people and security folks? This is our goal — obviously features go through iteration. We will be transparent and auditable about the full, what we call, trusted computing base.
Anywhere your containers run, the environment — the Linux kernel and the container runtime — is fully auditable and in the open, and we will also sign all of those binaries and countersign them; you can reproduce the whole environment yourself and get to the same exact hash of your environment. And the third one is critical: you have IP in your code, and you want to make sure nobody can steal that IP — ML models, your algorithms, or anything else. We have that capability; Kata CoCo supports the encrypted container flow today. There is an OCI spec and a whole encrypted-OCI-images project (ocicrypt) happening in the open that is doing this, and we're going to embrace that very soon. And integrity equally, right? I want to make sure that, because I asked you to run this container, you are running exactly this container. Yes — we are working with the Notary project and Notary signing to extend the capabilities of container signatures into attestation. This is a new workstream we stood up to achieve those goals. We have Ratify and OPA, Open Policy Agent, today in AKS supporting the capability where you can lock down your whole AKS cluster to only deploy containers that are signed by you or by parties you trust. You can lock down the whole cluster. We're going even further: you can attest that this is exactly the container you intend to see. Immutable: once the pods launch, you cannot change them. If you change something, that's a change in your configuration — for example, environment variables. You may be logging today to a trusted source, and your admin, who is untrusted in this design, can swap it. How do I detect that? How do I stop it? That's why it's immutable: once the pods are up and running, if you want to redeploy or change your configuration, you need to bring up the pods again — because someone authenticated with you based on the environment they expected it to be. Right? So: smallest possible TCB, the UVM stripped down.
The UVM comes down to, I think, close to a 60 MB Linux kernel, plus a highly optimized container runtime. This is much slimmer than the host OS — Mariner Linux, or Azure Linux — because we're running ultra-lightweight. The Kata agent is part of it, and some bare-minimum Kata packages and the container runtime are part of that, what we call the utility VM. And the last two parts are, I think, pretty obvious, but I have another slide that goes into a lot of detail. Everything we're working on here is with the community. This is what Open Infra is about; this is what we are about, and we take great pride in doing this. We are founding members of the Confidential Computing Consortium, and we are actively contributing there on the standards, so that with every other cloud provider you run with, or if you want to run on-prem, you have the same standards — rather than us doing it our way and another provider doing it another way, where you'd have to rewrite your whole DevOps because of that, right? Confidential Containers: we are actively contributing and upstreaming a bunch of this stuff, right? And the last part is Kata Containers. We started our journey there, and you heard from Michael about how we want to collaborate; we'll meet the maintainers today as well and discuss our strategy going forward and our continued investments from Microsoft in this ecosystem. I may have to kill my Cloud Shell. All right, so now I'm taking a scenario — a real scenario. This is what Royal Bank of Canada did with us, right? A local company, pretty popular in Canada. This is exactly what they're trying to do. It's multi-part: I'm taking a scenario, breaking it down, and showing a demo — I think I'm okay on time, hopefully. You have a data set from one partner and another data set from a second partner. Both are PII data, but very relevant when you match them. In the world of ML and AI, this is what you want to do more of, right? Siloed data sets can only give you so much business insight.
Once you start bringing more data together, it can open up a whole lot of things. This is the scenario we did and achieved, and this is how it works, right? You have a data set and code, and you have an execution space — this is your confidential container environment, okay? So these things come together, code and data; everything can be measured and countersigned; the countersignature can help you attest that this is the environment you want to run; and at the end of the day you get the business insights you want. You can turn this whole pipeline into continuous training for your ML models that use PII data, okay? I have a demo that I'm going to show you — I pre-recorded it; it's doing Kafka streams. All right, I think I've hit all of these points, so I'll skip this. Our goals — you saw this yesterday: what we plan to achieve is putting all of these parties outside the trust boundary. All right, so with that, I have five more minutes. Okay, I'll do my best to take you quickly through the tech stack and how we put this all together — the moving parts, for the technical folks who may be interested in how this works. We put it in the documentation; we are open about how we run our stack. We have an AMD SEV-SNP machine that is capable of running nested virtualization. Every pod there is an ultra-light confidential VM — a super-hardened, ultra-light confidential VM that does not use the vTPM or HCL interfaces. We call it a direct Linux boot: your Linux kernel is fully enlightened that this is SEV-SNP, and it exposes the SEV-SNP device. We've got a runtime shim. This is on a full Azure VM size, and we're going to go to preview soon. When you drill down into the whole architecture, this is how it looks — all the moving parts inside that VM elaborated on the right side, from container management to policy, with the policy engine that enforces it. And we are here to collaborate, obviously.
So we're working with Red Hat very closely on this collaboration piece to harden this, and with IBM as well, as we speak, in the community. So let's switch to the demo. This is the architecture of the demo. What's really happening is there's a producer — I'm taking a telecom company which has your IMEI data and GPS coordinates, very highly sensitive — running through a Kafka stream; I'm using Event Hubs here instead. The message at the source, at the producer, is encrypted; it's pushed to the message bus, and on the message bus it's encrypted as well; then it goes to an environment which is a TEE — confidential containers. This is running Dapr. I don't know how many of you know Dapr; Dapr is an open-source runtime with bindings. I want to show you why it's so easy to lift and shift — that's the key point: you don't have to change anything to run with Dapr. And you gain insights from this. All right, I'm going to switch to the demo quickly and walk through it. Let me see how fast I can go — maybe I have three minutes, roughly. I'll just jump into the demo. OK. So I'm showing you how Dapr was installed. Dapr is an add-on you can enable on AKS, so out of the box you have Dapr running. This part is running on the untrusted side, because we want the Dapr sidecar to run in the Kata CoCo environment. All right. So I'm doing a get pods to see if Dapr is up and running. And finally, I'm deploying — this is the Dapr code, and I'm showing you what it's trying to do. It's looking for a stream of data coming from Kafka, picking up that stream, and doing a crypto operation — an RSA decrypt, because the message was encrypted and needs to be decrypted. But who's giving us the key? The decryption key is your private key.
And that comes in from Azure Key Vault Managed HSM through a process called Secure Key Release, an added piece of functionality where you are challenged to prove that you are running in a trusted execution environment before the key is given to you — not just your credentials. There is a sidecar that does this job; we have an open-source sidecar for it. I'm doing just an apply with that sidecar, which will go and orchestrate and bring the key down to me so I can run the decrypt operation. So that's the pod I was running. Now I'm switching to the producer to show how I can start pumping messages to simulate this. You have all four of those containers running in a single pod, and everything I showed you was a confidential VM pod — this is running the Kata CoCo utility VM. So the Dapr producer should be pumping in messages, simulating this whole flow with an IMEI number — a cell phone identifier — and other details. While this happens, on the other side there's a receiver consuming these messages. This is an active stream as we speak, and you can see subscribers receiving it, but some of them say it matched. These ID matches happen only after the decrypt operation. So this is all happening in the same environment, and we have this fully working and going to preview very soon. And just to show you: even on Kafka, or Event Hubs, we want to show that it's encrypted as well. It has to be end-to-end encrypted — trust nobody; this is about zero trust. I'm showing you the messages, where all of the IMEI numbers are RSA-encrypted, so nothing is in plain text for anybody to dump and understand — oh, can I track this person down, or is there some person of interest I want to learn more about. We'll be putting up a GitHub sample of this for you folks to run. That was my last part, and that is it for today. Thank you so much. Any quick questions? We'll hang out here.
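To make the encrypt-at-source / decrypt-in-TEE flow from the demo concrete, here is a self-contained toy sketch in Python — textbook RSA with small, runtime-generated primes. Everything here is illustrative: real deployments use padded RSA (e.g. OAEP) with keys held in Managed HSM and released via Secure Key Release, never raw textbook RSA, and the sample IMEI is made up.

```python
def is_prime(n: int) -> bool:
    """Deterministic Miller-Rabin; valid for all n below ~3.3e24."""
    if n < 2:
        return False
    small = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in small:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for a in small:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def next_prime(n: int) -> int:
    while not is_prime(n):
        n += 1
    return n

# Key generation: two ~31-bit primes so the modulus comfortably
# exceeds a 15-digit IMEI (toy key size, far too small for real use).
p, q = next_prime(2**31), next_prime(2**31 + 1000)
n, phi = p * q, (p - 1) * (q - 1)
e = 65537
d = pow(e, -1, phi)                 # modular inverse (Python 3.8+)

imei = 356938035643809              # made-up 15-digit IMEI, as an integer
ciphertext = pow(imei, e, n)        # producer side: encrypt before publishing
plaintext = pow(ciphertext, d, n)   # TEE side: decrypt after key release
assert plaintext == imei
```

The point mirrored here is the talk's trust model: only the party holding `d` — in the demo, the pod that passed attestation and received the key — can recover the IMEI, while everything on the message bus stays opaque.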
I know we're running out of time, but any questions for us? All right, if not — go ahead. What we're doing is: all of our confidential computing VM sizes run on top of specialized hardware that is Intel SGX-enabled or AMD SEV-SNP-enabled. What that means is that when a VM comes up, it boots exactly on hardware that is configured with memory encryption. Yeah, SGX enclaves are fully supported; we've been supporting them for the last three years. We have Coffee Lake and Ice Lake, both versions fully supported, even in AKS. So for pretty much everything in the confidential computing space you may be looking for, we have an offering out there for sure. We have NVIDIA GPUs as well, booting up on A100. All right, folks. Yeah, go ahead. The H100 that we're working towards supports encryption, and we have a PCIe bus connecting it to a confidential VM. So this is a confidential VM and a confidential GPU over PCIe, with a trusted channel in between them to do the secure communication and fully attest. Before the workload from the CPU gets scheduled to the confidential GPU, there is a transparent attestation that happens between the CPU and the GPU. I had a slide that may be relevant to answering your question — in short, yes, with the next versions we have. In the preview today, we're just trying to get started. All right, folks. I think we're running out of time, but we'll hang out here if you have more questions. Thank you all. Thank you.