Um, Suraj and I met each other at the last KubeCon, and we were both talking about the sort of technologies we were interested in and had been working on. And we decided it would be a really good idea to work out how we could bring these two projects together. So in the space of the last several months, I guess, we've been educating each other on these respective CNCF projects. And here we are today. So thanks to the CNCF, and thanks for coming. Hopefully we can enlighten you as to what these projects are all about and potentially how you can combine them. I'm Matt Bates. Suraj? Hi. Can you hear me? Can people hear me? OK. Hi, I'm Suraj. I work for Microsoft, and I'm working on the Confidential Containers upstream project. Thanks. We're going to remain close to the mics so you can hear us. OK. So just to introduce both the projects, we're going to look at each in turn. Quick show of hands: who here has heard of SPIFFE and SPIRE? OK. And, more to the point, who runs SPIFFE and SPIRE in production? Fewer hands. I would love to speak to you at some point and learn more about it. But for those of you who don't know, that's great, you're hopefully in the right place. For the purposes of the next 10 minutes, we're going to let you know a bit more about what SPIFFE is.
So SPIFFE and SPIRE are a CNCF graduated project. They've been around a long time, and obviously a number of you are aware of the project, probably for various reasons. SPIFFE itself is the Secure Production Identity Framework For Everyone. The purpose of this is that it's a framework: a set of APIs, a set of interfaces, all open, for basically provisioning and validating identity for software systems. So if you're wanting a consistent way of being able to identify software, this is a really good set of standards, and these have been adopted throughout the CNCF ecosystem in various different projects. We've got various different aspects to this, and I'll go through each in turn, just very briefly. So the SPIFFE ID is a way of basically being able to identify the service: how do you actually represent the service as an identity? There's a way in the SPIFFE project that that's proposed, and it's actually used by various different projects. For instance, if you look under the hood in Istio, you will see that Istio uses the SPIFFE ID. The SVID, the SPIFFE Verifiable Identity Document, is an identity document: this is how you would actually represent the identity cryptographically. In SPIFFE, and in SPIRE as we'll see, you can represent this as an X.509 certificate or you can use a JWT token. But this is really about actually putting the identity into a cryptographic form and then providing it to a workload. The way that you obtain the identity is via the Workload API. This is the way that workloads, your applications, your services, can obtain identity. And interestingly, it does so in a way that's sort of authentication-less: the workload does not have to authenticate. The Workload API is a node-local API; we'll see this soon in the architecture. The way this works is through a process of attestation. The workload doesn't have to have some token provided beforehand; it can actually provide some evidence about itself.
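As a rough illustration of the ID format described above, a SPIFFE ID is just a URI with a trust domain and a workload path. This is a sketch, not the official validation logic, and the example trust domain and path are made up:

```python
from urllib.parse import urlparse

def parse_spiffe_id(spiffe_id):
    """Split a SPIFFE ID of the form spiffe://<trust-domain>/<path>
    into its trust domain and workload path. Illustrative only."""
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe":
        raise ValueError("SPIFFE IDs must use the spiffe:// scheme")
    if not parsed.netloc:
        raise ValueError("SPIFFE IDs must name a trust domain")
    return parsed.netloc, parsed.path

# The kind of ID a mesh like Istio assigns to a workload.
domain, path = parse_spiffe_id("spiffe://cluster.local/ns/default/sa/bookinfo")
# domain is "cluster.local"; path is "/ns/default/sa/bookinfo"
```

The same string can then be carried inside an X.509 SVID (as a URI SAN) or a JWT SVID (as the subject claim), which is the cryptographic form handed to the workload.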
So the Workload API runs on the agent on the node; we'll see this in a little while. We've also got the SPIFFE trust bundle. This is a way of representing the public keys, effectively the PKI hierarchy, so when you want to be able to validate trust, that's what the trust bundle represents. And we've also got this thing called SPIFFE Federation. So if you wish to federate different environments, different trust domains, together, there's a way for the roots to be exchanged and trust to be established. So really what we're going to look at now is SPIRE. SPIRE is the reference implementation: SPIFFE is the framework, SPIRE the reference implementation. And SPIRE is used by a number of very large end users that contribute to the community. For instance, Uber and TikTok, just to name a couple, use SPIRE in production. And obviously some of you here in the audience too. So really just distilling this down and keeping things rather simple for the purposes of today's talk, effectively we've got two components, or three components, I guess, if you will. We've got the SPIRE server. This is responsible basically for managing and actually providing identity, and it's backed by a data store. We need to register things up front. So it's your responsibility, or perhaps done through some automation, to register your workload with the SPIRE server. And you do so using a set of registration entries: effectively, attributes that describe your workload, where you expect the workload to be, and how you expect the workload to present itself. That all gets stored in the data store. We've then got these agents that run on each of the nodes, and the nodes can be running in Kubernetes, but they can also be running outside of it. And I think the "everyone" aspect of this is important.
SPIFFE is all about providing identity to all manner of different software systems, whether in cloud or on-premise. So this is your way of doing it very consistently. So yeah, let's imagine this could also run on bare metal as well as, of course, in a cluster. Now, this agent is basically responsible for running this Workload API, implementing that Workload API specification. And workloads themselves talk to the agent, this being a node-local API. And rather than actually having to present some kind of token, this is an unauthenticated API that runs locally on the host itself. It actually discovers information about the process. So if you're running on Linux, as an example, what will happen here is the agent will basically introspect the Linux kernel. It will discover information about the process. It will find attributes and information. It will maybe consult, for instance, the container runtime. It might speak to the kubelet that's also running on that node. And it will gather a set of information that it will use to attest. All of those attributes get provided up through the API to the server, and ultimately that gets compared to what was pre-registered. The idea being that you're validating whether this workload is valid, whether it should be where it is, and whether it should be granted identity. That's pretty much the flow here. The important thing is the agent's doing it for you. It's pretty transparent, and the workload itself does not have to do that whole dance around obtaining identity, rotating it and all of that kind of difficulty. So that's pretty much how these things work. It's probably also worth noting there is a process of node attestation as well. So it's not just the workload attesting itself; it's also the node. And so you may wish to validate, for instance, that it's running on a cloud instance, as an example.
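The register-then-attest flow just described can be sketched in a few lines. The selector names below loosely mimic SPIRE's Kubernetes selectors but are simplified, and the SPIFFE IDs and namespaces are invented for this example:

```python
# 1. Registration, done up front by an operator (or automation): entries in
#    the server's data store describe where each workload is expected to be
#    and how it is expected to present itself.
registry = [
    {
        "spiffe_id": "spiffe://example.org/payments",
        "selectors": {"k8s:ns": "payments", "k8s:sa": "payments-api"},
    },
]

def attest(discovered):
    """2. Attestation: the agent introspects the workload (kernel, container
    runtime, kubelet) and the server compares the discovered attributes
    against what was pre-registered; a match grants the identity."""
    for entry in registry:
        if all(discovered.get(k) == v for k, v in entry["selectors"].items()):
            return entry["spiffe_id"]
    return None  # no match: no identity is granted

# A workload presenting the expected attributes receives its SPIFFE ID...
assert attest({"k8s:ns": "payments", "k8s:sa": "payments-api"}) == \
    "spiffe://example.org/payments"
# ...while one running in the wrong namespace gets nothing.
assert attest({"k8s:ns": "default", "k8s:sa": "payments-api"}) is None
```

The real agent discovers these attributes itself, which is why the workload never has to hold a bootstrap token.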
Maybe you might want to go down to the level of the hardware, maybe consult things like the TPM. And there's a variety of different plugins in the SPIRE project enabling you to do this. So you can just plug into the ecosystem and use all of those different plugins. So that's really how things look at a high level in SPIRE. Why would you do this, you're probably asking. You're probably thinking, OK, I've set up all of this infrastructure. Effectively, what this enables you to provide is a set of cryptographic documents, X.509 certificates, that you can then use in your services to establish something like a mutual TLS connection. And obviously this could be within a cluster, this could be outside a cluster; it could be a pretty heterogeneous environment. And so in this example here, you can actually see the SVID and how the SPIFFE ID is represented. It encodes information about the service: it includes things like the pod name and the namespace. And again, if you've used something like Istio or some of the other service meshes, you'll probably recognise that sort of encoding of detail in the SPIFFE ID. So obviously we can use these certificates, and this example actually shows Envoy. You might not want to do it all yourself. You can, if you wish, consume the Workload API directly; there's a set of language bindings to do that. There's a set of helper projects to make it easier, where you can actually be provided the identity as a set of files that you can consume in your application. There's a CSI driver that you can use. Or you can use the Envoy integration here as well. So you can use the SDS support, and Envoy will do the magic, of course, of providing you the mutual TLS connection using those certificates. So really what we want to focus on is the security of this. So obviously, going back to the architecture here, you can see we've got the SPIRE server.
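That encoding of workload detail into the ID can be sketched as follows. The ns/sa path layout mirrors what Istio uses, but the exact convention is per deployment, and the names here are hypothetical:

```python
def spiffe_id_for_pod(trust_domain, namespace, service_account):
    """Build an Istio-style SPIFFE ID that encodes where the workload
    runs in the cluster. The path layout is a convention, not a rule."""
    return f"spiffe://{trust_domain}/ns/{namespace}/sa/{service_account}"

# Two services in different namespaces get distinct, inspectable identities,
# which is what makes mTLS authorization decisions possible downstream.
frontend = spiffe_id_for_pod("cluster.local", "web", "frontend")
backend = spiffe_id_for_pod("cluster.local", "api", "backend")
# frontend is "spiffe://cluster.local/ns/web/sa/frontend"
```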
It's responsible for managing identities. It's responsible for issuing them. And so, as part of that, of course, it has sensitive signing keys. Ultimately, this is all backed by PKI, and so that SPIRE server is clearly very sensitive, and you have to do everything you can to protect it. So referring here to the Solving the Bottom Turtle book, which I'd highly recommend you read. It's a great read on everything SPIRE: the motivations for the project and the various ways that you can use it. We're quoting from it here, and there's the links that you can follow. But really, it's very important to run something like the SPIRE server in an isolated kind of way. You don't really want this to run in a multi-tenanted cluster. Effectively, in the threat model, workloads are untrusted, and so you do not want workloads running in the same place as the SPIRE server, which hopefully makes a lot of sense; it's pretty obvious. So the first really good practice is to make sure that you run the SPIRE server on an isolated node, and that could well be a separate node pool, or indeed you might even want to run it on a server instance outside the cluster. So that's certainly number one. Obviously there's some quotes here from the book, and I recommend you take a further read. Managing and issuing identities is really sensitive: in the SPIRE server we've got really sensitive keys, and all of these signing operations take place in the SPIRE server. So what we wanted to consider when we were chatting at the last KubeCon is: OK, we've got this sensitive component, how about we employ some further protection? We think about things with defence in depth in mind. And so, effectively, if you think of the SPIRE server and what it's doing, the fact is that by default it has unencrypted memory. All of these signing operations, the keys themselves, the issuing authority, are in plain text.
So if a bad actor had access to the node on which the SPIRE server is running, they could, strictly speaking, get access to that, and really at that point, of course, it's game over. You've compromised the entire trust domain, and they can start minting certificates and impersonating services in that trust domain. So what we wanted to consider is this attack vector. At that point, I think it's probably a good time to hand over to Suraj to talk about confidential compute. All right, thanks Matt for the introduction of SPIFFE and the SPIRE server, and how it's a single point of failure depending on how you configure it. So let's talk about what confidential compute is, but first we'll take a sidebar on the current state of data protection today, right? So when data is stored on disks, it's encrypted; when it's on the network, it's encrypted; you have solutions for both of these problems. But when you are processing it in memory, the data, when it's brought into memory, stays there unencrypted, right? So that's what confidential compute tries to solve: this third aspect, this third leg of data protection. That's what confidential compute brings in. So it's like SPIRE's logo, the stool: you get the third leg there. So what is confidential compute? It's a processor technology backed by AMD and Intel; it's within the processor. It creates a secure enclave, and this enclave could be a VM or a process. The enclave's memory is protected because it's encrypted, and the processor knows how to encrypt and decrypt that memory. And that's the whole thing: memory protection. Within the enclave, the application doesn't have to care; it's all transparent. Within the enclave it's all plain text, and a regular application works as usual. So why confidential compute, right? So there are a lot of rising security concerns around, you know, containerized applications.
Then, you know, you want comprehensive data protection, like we talked about: the third thing, memory protection. And adherence to privacy regulations, especially here in Europe, where the data protection laws are pretty stringent: you want your data in use to be protected, or encrypted. So let's start by talking about the usual trusted compute base. Traditionally, you are trusting the hardware, the BIOS and firmware, the host operating system, the hypervisor and the guest. But with confidential compute, you only trust the underlying hardware, and everything in between you can stop trusting. So that's what confidential compute is giving you: it's reducing your trusted compute base, so that way you have a smaller attack surface. So who is the target audience, right? First, anybody dealing with PII data: healthcare businesses, financial businesses, governments, et cetera. And, basically, you can think of anybody who is running on untrusted infrastructure and needs higher security. You don't know who has access to the underlying host. Even if somebody promises they won't do anything illegal, you don't know; these organizations are big, right? So yeah, anybody with sensitive data, for that matter. So let's see what Confidential Containers is, right? It's an open source CNCF sandbox project. The whole idea of Confidential Containers is to bring this technology, enabled by the chip makers, to Kubernetes: every pod that you want to process sensitive data in should be backed by encrypted memory. So let's take a look at how it works in general. I think most of you are familiar with this Kubernetes block diagram: there is the control plane on the left, and then there are a bunch of nodes. Each node has a kubelet as an agent, which is responsible for bringing up your workloads. So we'll zoom into the node here.
So the typical interaction that happens is the kubelet gets a request, it passes it to containerd, and then containerd uses runC to start a pod. This is the usual. But with Confidential Containers, we use a runtime called Kata. Who has heard of Kata Containers here? A lot of hands. So the one-liner for Kata is that instead of using runC, Kata uses lightweight VMs to start up pods. So this is the same interaction with Kata: there's the kubelet and containerd, but instead of runC it's the Kata runtime, which knows how to talk to the underlying virtualization technology. It brings up the VM, and there is the Kata agent, which is PID one; that's what interacts with the external Kata runtime. And the pod comes up, and that's how you have it there. But with Kata confidential containers, what you need is hardware which is capable of running confidential TEEs. So... OK. Oops, hold on. I think I messed up. OK, anyways, never mind. So with a TEE, what you get here is: there is the kubelet, the same containerd, and then... oh, OK, it switched. What I'll do is I'll move this here. OK, that's what happened. OK, we are learning here. So it's Kata CC. You have this hardware that can... OK, yeah, it's working now. It's the kubelet, it's the Kata runtime, same thing, a KVM virtual machine comes up, the Kata agent exists. But there is another set of components that you need, which does attestation. So remote attestation is a big part of using confidential compute and confidential containers. Because you want to ensure that you are really running on confidential hardware, because the underlying hardware provider can lie about it. They can say, yeah, this is a confidential VM. But how can you be sure? So that's where this attestation process comes in. These components that are part of the Kata VM talk to the hardware, create evidence, and send it to an external party. Here we call it the relying party; it's your infrastructure, where you can do verification. Now this verification involves...
You look at this evidence that you got: is it really signed by AMD's or Intel's hardware? Once that attestation passes, that's the basic check. But you can also check that the kernel is correct, the initrd is fine, the kernel parameters are right. So you are ensuring that inside that VM, everything is what you expected. And if anything is different, you just don't pass the attestation, right? Now, what happens after attestation is successful is up to you. It can just be an acknowledgement or a negative acknowledgement, or you can release a key or a policy or whatever it is. In this case, what we are doing is releasing a key. Now, this key is actually used to encrypt a container image. So we download this encrypted container image, decrypt it, and a pod is started. So you can use an encrypted container image if you want to protect your IP or the application. But if the application is not so sensitive, you can instead release something like a secret to download something from another cloud or other sources, whatever it could be, right? So, moving on. The CoCo threat model here is that it promises two things: confidentiality and integrity. Confidentiality, because anything outside the TEE cannot see what's inside. And integrity, because you have done this remote attestation, ensuring everything that you expected is as is. And the other thing that we assume, from the Confidential Containers point of view, is that anything outside that pod, even the worker node, is untrusted. The Kubernetes control plane is untrusted. Everything outside the TEE, even inside the cluster, is untrusted. That's what we go with. So there is this basic demo here, where I show how you can start a pod, read its memory, and basically see any secret that it has received. I won't go through this demo here because we don't have enough time, but I have another demo that I'll show at the end. Go check out this demo when you have time.
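In very rough terms, the relying party's checks look like the sketch below. Real evidence is signed by the CPU vendor's key (for AMD SEV-SNP, a certificate chain rooted at AMD); here an HMAC with a shared key stands in for that hardware signature, and a single hash stands in for the measurements of the kernel, initrd and kernel parameters. Everything in this sketch is a stand-in, not the CoCo key broker protocol:

```python
import hashlib
import hmac

HARDWARE_KEY = b"stand-in-for-the-cpu-vendor-signing-key"
EXPECTED_MEASUREMENT = hashlib.sha256(b"kernel+initrd+cmdline").hexdigest()

def make_evidence(measurement):
    """What the guest-side attestation components would produce:
    a measurement plus a hardware-rooted signature over it."""
    sig = hmac.new(HARDWARE_KEY, measurement.encode(), hashlib.sha256).hexdigest()
    return {"measurement": measurement, "signature": sig}

def verify_and_release_key(evidence):
    """Relying party: check the signature chains back to hardware we trust,
    check the measurement matches what we expected, then release a secret
    (here, a key used to decrypt the container image)."""
    expected_sig = hmac.new(HARDWARE_KEY, evidence["measurement"].encode(),
                            hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected_sig, evidence["signature"]):
        return None  # evidence not signed by hardware we trust
    if evidence["measurement"] != EXPECTED_MEASUREMENT:
        return None  # guest contents differ from what was expected
    return "image-decryption-key"

# A genuine guest gets the key; a tampered guest gets nothing.
assert verify_and_release_key(make_evidence(EXPECTED_MEASUREMENT)) == "image-decryption-key"
assert verify_and_release_key(make_evidence("tampered")) is None
```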
So let's talk about how Confidential Containers and SPIRE come together. I'll hand over to Matt here. Thanks. That was great to hear about the CoCo project. So what we thought about is how potentially you could take these two projects and take the best of both, as it were. And we're going to focus primarily on the SPIRE server at this stage. There's some scope for potentially putting the agent into the enclave too, but we're going to focus on the server. There is real interest in wanting to protect core signing keys. There's a recent quote here from somebody at Microsoft, in the wake of a breach, thinking about how to secure things going forward. They've noted that they want to be able to put identity signing keys, which are obviously very sensitive, into both HSMs and confidential computing, now that it's more widely available. So with that in mind, how could SPIRE benefit from this? What we thought is, taking that defence in depth approach, and with CoCo now providing the means to encrypt memory, could we take the two and protect the data in use? So as that signing happens, can we protect against anyone hostile that may want to see that operation and exfiltrate the keys? And so this is really what we set out to do. We do now have a demo, which we were going to run; it's not live, it's pre-recorded, so we're going to take a look at the SPIRE server running in CoCo. OK, so this is the demo. It's a single-node Kubernetes cluster. We can see in the confidential-containers namespace that the Confidential Containers operator has installed a bunch of things. And once the operator is up and running, it creates a bunch of runtime classes. That's the interface for other applications to use Confidential Containers. So there's kata, kata-qemu-snp, all of that. Now what we'll do is go to the worker node, and we can verify this is really an SEV-SNP-enabled machine.
So SEV-SNP is AMD's confidential compute technology. So let's install SPIRE, with the SPIRE CRDs first and then the SPIRE server. This is without any modification; it's the regular SPIRE thing. Although I have made some modifications here: for persistence, what I have done is chosen to use an emptyDir with memory as the backend, not relying on the default. And there is a data store, a SQL data store, behind the scenes. And, yeah, the key manager is also in memory. So let's see if the pods are up. It's up and running. You can see there is no runtime class or anything; this is a regular runC-based thing. We can see from the logs that it's up and running and has issued a bunch of SVIDs. Now let's also register the agent and also the client. This is from the regular quick-start, right, if you are aware of SPIRE. There's nothing different here; it's just that the socket has changed, and we'll try to talk to the local agent. The client is deployed and, yeah, it's the same socket path and everything. And yes, the SVID is issued. Now, this is the regular thing that you would show anybody when you're demonstrating SPIRE. But let's look at the SPIRE server here, right? On the worker node, let's get its PID. What we'll do is take a core dump of this PID. And once we have this core file, we can look at its ASCII representation. So we can search for stuff like "X509" and "private key", for example, and things like that. So basically this is just to show that on a regular machine we can really see what's in memory, and it's not that hard. If somebody has access to the host, yes, they can do whatever. Now, we'll do the same thing again, but this time around we'll use the runtime class called kata-qemu-snp. The diff from the values file, I can show you here: the interesting part is this runtime class. That's the only thing that's changed.
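In Kubernetes terms, that one-line change boils down to setting a runtime class on the pod spec. A minimal sketch, assuming the kata-qemu-snp runtime class installed by the CoCo operator; the pod name and image reference are illustrative, and in the demo this was actually set through the Helm values rather than written by hand:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spire-server-0   # hypothetical pod name
spec:
  # The one-line change: schedule the pod into a Kata confidential VM
  # instead of a runC container. The class name comes from the runtime
  # classes the Confidential Containers operator created.
  runtimeClassName: kata-qemu-snp
  containers:
    - name: spire-server
      image: ghcr.io/spiffe/spire-server   # tag/reference illustrative
```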
Now, you can see the SPIRE server is deployed, and this time it has the runtime class name as the value. And, looking at the logs, it has started issuing SVIDs and everything, so it's working. We do the registration of the agent and the client. Same client file; nothing has changed here. And the client is deployed, and we can see that the SVIDs are issued. Now, again, this time around we go and find where the SPIRE server is, but we're not looking for the spire-server process, we're looking for QEMU, because the server is running in a VM. We get the PID for this VM, store it in an environment variable, do a core dump, and look at this core file. There's no X.509, there is no private key. And it's not like the file is empty; it has a bunch of stuff. It's just that it's not relevant. That was the demo; now moving on. So you could argue, right, that there are limitations to this demo: the DB coexists with the server, or even that the server is running on Kubernetes. But that's the point of this whole talk: you can run SPIRE on Kubernetes, but with encrypted memory. As for the DB coexisting with the server, you could use a highly available database instead; you could use a hosted service for this database. But then, is that hosted service running on confidential compute hardware? Because, depending on the sensitivity of those database entries, that is also something to consider. The other argument is that keys could be stored in a KMS: we could use a KMS and not keep anything in memory. But then what about the KMS credentials? Where are they coming from? Somebody, again, can do the same thing, a memory dump. You don't even have to do a memory dump: if you have access to the host, you can see those secrets for the KMS. And then, is the KMS itself backed by any hardware, like an HSM or confidential compute hardware, again? You know, whatever the KMS's own security is. So, yeah. There are these bunch of things to think about.
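The memory-inspection step of the demo can be mimicked in a few lines. Here two fake byte buffers stand in for the core dumps of the runC case and the Kata/SEV-SNP case, and the string scan is a crude version of what strings(1) piped to grep does; the key material shown is fabricated:

```python
import re

def printable_strings(memory, min_len=8):
    """Roughly what strings(1) does: pull runs of printable ASCII
    out of a raw memory image."""
    return [m.group().decode()
            for m in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, memory)]

def leaks_key_material(memory):
    """Scan the 'core dump' for telltale PEM markers."""
    return any("PRIVATE KEY" in s or "BEGIN CERTIFICATE" in s
               for s in printable_strings(memory))

# runC case: the server's keys sit in host-readable plain-text memory.
plain = b"\x00\x01-----BEGIN PRIVATE KEY-----MIIfakekeybytes\x00"
# Kata + SEV-SNP case: the host only ever sees ciphertext of guest memory.
ciphertext = b"\x9f\x3a\x11\x84\x55\xaa\x02\x7c" * 16

assert leaks_key_material(plain) is True
assert leaks_key_material(ciphertext) is False
```

The point of the real demo is the same: the ciphertext dump is not empty, it is just not useful to an attacker on the host.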
And this talk primarily focuses on the server. What about the agents, right? Do we need security for the agents? We don't assign as much importance to agent security because the blast radius is smaller. So, yeah. To take this discussion further, I invite Matt. Thanks. Yeah, it's great to see the demo. So, there's a couple of further improvements. I guess we realised that this gave us the opportunity to do a whole lot more, and this is just some of the improvements we think we can propose in the respective projects. Number one is really adding plugin support. SPIRE has a great plugin system for a variety of things: for instance, for the data store and for the key managers. And so one thing to add here would be plugin support for something like Azure SQL, and other vendors' databases that run on confidential hardware. This would give the guarantee that the data store backing SPIRE is also protected, because compromise of that could lead to workloads being compromised, or at least the trust domain being compromised. The keys could also come from a KBS; that's the key broker service, if you remember, from the diagram earlier. So, obviously, we can make sure that we're also deriving them from a trusted source that's gone through a process of attestation. There is, in the SPIRE project, work worth noting, and we probably ought to include a link here; we will add it to the slides, in fact. There is actually a bunch of folks that have been working on a node attester for SEV-SNP. That will effectively attest that the SPIRE agent is running on confidential compute hardware. So that's something that is already in the project. It's, I think, experimental. So we can provide a link to that just so you can find it later. There is also scope, we realise, for a workload attester here as well.
So, you might remember earlier in the talk I was talking about workload attestation: introspecting the Linux kernel, finding out information about the process. You could also, in this particular case, build a workload attester, and obviously there's a plugin system here. So you can go and build a plugin, external to SPIRE itself, that could do the process of verifying that the workload is running within CoCo. This would be a particularly good component to go and build, I think, and it's probably something we might go do. If anyone's interested, let us know. And there's also, if you remember from the SPIRE architecture, the server and the agents. The agents themselves also have keys; they have keys for the workloads. The blast radius is more limited, intentionally; that's in the SPIRE threat model. An agent only manages the keys for the workloads that are running on its own node. But there's the opportunity, I believe, to think about running the agents within the enclave as well. So this would probably be a good thing to explore somewhat further too. So, yeah, hopefully this is a good opportunity to go ahead and see how these projects can further help each other. I think we're probably running long on time. So, just to give some takeaways, and we're obviously happy to take some questions. SPIFFE and SPIRE: we learned that SPIFFE is the framework for providing identity to software systems, and SPIRE is the reference implementation. We've learned a little bit about the SPIRE architecture, really at a high level: the SPIRE server, the agents and the workloads, and the need to protect the SPIRE server. Obviously, the blast radius there is significant, so it's important to protect it: run it on its own dedicated hardware. And we've seen that we can use CoCo in order to protect the memory when in use, so the signing keys and the signing operations can be protected, pretty much.
There is an opportunity for some extra plugins. So, as I said, data store plugins and key store plugins that use confidential computing; that's something we'd like to explore. And, as I said, there's further opportunity for more attestation; something like a per-workload agent is something that we believe could be in scope as well. So, these are two open source projects; please, please get involved. There's the links here to the respective projects and some documentation. We'll make the slides available; they've got lots of reference links. So please find the slides, and please let us know your feedback. I think we may or may not have some time for questions, but thank you very much for being here. Feel free to ask, or we're here if you want to come and ask us more directly. You can come to the mic and ask questions here. Hi, yeah. We use enclaves as part of a managed offering. We use Nitro Enclaves, but I think the idea is the same, right? Just make sure what you're running is what you intend to be running. How do you deal with the problem of making sure what you put in the enclave is what you intended to put there in the first place? So, the security of getting to the point where you have an enclave at all. So, Nitro Enclaves work, I think, a bit differently, in that they ensure in software that the admin doesn't have access to the workload or VMs. With confidential compute, it's guaranteed by the CPU, by hardware. So what happens is, in the attestation process, the evidence that is generated is signed by the underlying hardware, right? Like the CPU, by AMD or Intel. And then, when you verify it on the other side, with remote attestation, you check whether the hashes for those applications are exactly what you wanted. And if anything has changed, you see in the attestation report that it has changed. Does that answer your question? I was worried more about the point before you get there... for example, you put SPIRE in the enclave.
How do you know SPIRE is what you put in the enclave? You need to know the hash in advance, right? Also, one disclaimer: I suggested Azure SQL not because I work for Microsoft, but because it's the only SQL server I found out there which runs on confidential compute hardware. So we need more hosted offerings, if people want them, that run on confidential compute hardware. No questions? I think we are out of time anyway. Cool, find us around here. Thank you for attending. Have a good one.