All right, first of all, thank you so much for being here, especially the many maintainers and members who have contributed to SPIFFE and cert-manager and who have enabled us to be here and talk about securing edge workloads with SPIFFE and cert-manager. My name is Sitaram Iyer, and I'm here with Riyaz Mohammad. Hi, y'all. Our job at Jetstack, obviously, is to talk to many of our customers, and one of the things we hear very often is the challenge of securing workloads, especially as they are deployed across multiple clusters and multiple domains. As we started to think about the motivation for this session, we wanted to look at it purely from the perspective of: what does it take to provide a standard way of generating identities for workloads across clusters and, essentially, build trust? That's the simple motivation we started from. Securing workloads, having the ability to generate an identity, and distributing trust across multiple trust domains: that was the challenge we wanted to address. It started with conversations we have had with customers, and specifically with a customer in the financial space. While the picture you see here may look very simple, the challenge that this financial institution had is massive. We will demonstrate, with some of the edge devices we have here, how we addressed it and what it means to secure workloads. What you're seeing in the picture is a bunch of terminals and a set of core services. The simplest way to think about this is that the bank provides a set of services that are consumed by different merchants.
As part of that, the merchant terminal out there generates some kind of a key pair and sends a request across with a JWT; the bank validates it and sends a token back. I'm describing this in a very, very simple form. Over a period of time, this token is used to manage payments, refunds, and things like that. This is the standard operating model for merchants, payment terminals, and core bank services. The challenge really started to surface for this bank when a merchant was off-boarded, when terminals were lost, or when terminals were repurposed. That made it hard for the bank to figure out who was actually using a terminal. It also opened the door to fraud, where a fake merchant, or somebody who had gotten hold of a terminal, would use it to process refunds. As you can imagine, that translated to millions of dollars of loss for this bank. As we thought about this, we said: what if every workload had a unique identity? If you're here in this session, you're probably interested in understanding a little more about SPIFFE and what it can do. If you're thinking about TLS and X.509, you're probably using, or are familiar with, cert-manager. So what we're trying to do is map these what-ifs to the scenarios in which you would use cert-manager, SPIFFE, and a few other things that we're going to talk about in the next 10 or 15 minutes. So, what if this identity was used by the workload to prove it is who it says it is? That is one of the things we wanted to ensure. And then, what if trust was automatically distributed when a terminal is activated?
We talked about these terminals and the way they activate themselves: some kind of bootstrapping process talks to the bank services, and then the terminal is ready to go. The way we were thinking about this, there should be a trust model that lets the services on the back end know exactly what the terminals are and how they work. In the same way, how do you revoke that trust when a merchant is off-boarded? These are some of the what-if challenges we started to look at, purely from the perspective of providing an identity for every single workload and ensuring that each of these workloads can talk to the others in a seamless, mutually authenticated way. And as we started looking at it, we said: of course, SPIFFE. SPIFFE is the answer to many of these things, especially in this context. If you have visited the SPIFFE booth at the Pavilion, or talked to any of the SPIFFE maintainers, great; if you have not, please do, because they'll talk a lot about what is applicable to you and the work you do within your organization. So, what is SPIFFE? A secure identity for every workload, irrespective of where it is running. Put very simply, it allows workloads to mutually authenticate in an easy and reliable way. A very simple definition.
Obviously, we want to go a little deeper to show how it works and what it means, but as a definition, SPIFFE allows you to attach an identity to a workload and, based on that, securely, mutually, and reliably allow access to other workloads as they talk to each other. The last bullet there is important: SPIFFE just graduated within the CNCF, which also means that if you go to the CNCF store you'll find a lot of jackets, t-shirts, and socks. Feel free to buy them; they're nice and they look good. Before I talk about the other project, I want to tie in a little of the basics of SPIFFE. We hear "SPIFFE" a lot, and I'm sure many of you know what it does, but there are several concepts we need to understand before we can say what that secure identity looks like. At the heart of anything you do with SPIFFE is a workload. We keep saying there is a workload that needs an identity, but what exactly is a workload? A workload is basically any software. It could be your application, your database, or a set of services that you run; it could run in a cluster, on a node, in a pod, in a virtual machine, whatever that might be. Any software that functionally does something is a workload. The second piece that is important to understand is the SPIFFE ID. This is something you have probably heard quite a few times: "Oh, if you're using SPIFFE, you need to generate a SPIFFE ID." What does SPIFFE ID even mean? Simply put, it's just a URI that identifies a workload.
Most often, and from the specification's perspective, this URI is represented as spiffe://, then the trust domain, then a slash and some kind of identifier for the workload you want to identify. There is no requirement that the identifier take a particular form. The example you see there represents a trust domain and a workload in a certain namespace, tied to the service account that owns the deployment. That is a SPIFFE ID for a workload deployed within your cluster. The third important concept in the SPIFFE world is the trust domain. A trust domain represents some sort of security boundary: a cluster, a namespace, a virtual machine, a physical server, whatever that might be. Anything you can draw a well-defined box around can be your trust domain. Next is the SPIFFE SVID, the SPIFFE Verifiable Identity Document. I'm building this up to help you see that there are several other things you need to consider. It's one thing to say that I have an ID represented by some kind of URI. What you need to do next is make sure this identifier is attached to a cryptographically verifiable document, which is essentially an X.509 certificate. That is what a SPIFFE SVID is: something that cryptographically proves the identity, because at the end of the day, when you say "I have a workload", that workload needs to be tied back to something cryptographically verifiable. Then there is the SPIFFE Workload API. This is what allows the workload's key to be signed, because there is also the signing process. Somebody has to sign for your workload. Somebody has to decide which CA to use for signing. All of those things have to happen before the SPIFFE SVID is generated.
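To make the SPIFFE ID shape concrete, here is a minimal Python sketch of building and parsing one. The `/ns/<namespace>/sa/<service-account>` path layout follows the namespace-and-service-account example described above; the function names and the `example.org` trust domain are illustrative assumptions, not part of any SPIFFE library.

```python
from typing import Tuple
from urllib.parse import urlparse


def k8s_spiffe_id(trust_domain: str, namespace: str, service_account: str) -> str:
    """Build a SPIFFE ID for a workload identified by namespace and service account.

    The path layout is an assumption taken from the talk's example slide.
    """
    return f"spiffe://{trust_domain}/ns/{namespace}/sa/{service_account}"


def parse_spiffe_id(uri: str) -> Tuple[str, str]:
    """Split a SPIFFE ID into (trust_domain, path), rejecting obviously bad input."""
    parsed = urlparse(uri)
    if parsed.scheme != "spiffe":
        raise ValueError(f"not a SPIFFE ID (scheme is {parsed.scheme!r})")
    if not parsed.netloc:
        raise ValueError("SPIFFE ID has no trust domain")
    if parsed.query or parsed.fragment:
        raise ValueError("SPIFFE ID must not carry a query or fragment")
    return parsed.netloc, parsed.path


if __name__ == "__main__":
    sid = k8s_spiffe_id("example.org", "sandbox", "edge-server")
    print(sid)
    print(parse_spiffe_id(sid))
```

The trust domain sits where a hostname would in an ordinary URL, which is why a generic URI parser is enough for this sketch.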
That is what happens through the Workload API. The last one, which is also important, is the notion of trust bundles. A trust bundle represents, from an X.509 SVID perspective, a collection of root CAs, right? That's what it is. It's synced across namespaces in your cluster. Yeah, basically any CAs that you want your workloads to trust. You could have workload A running in cluster A and workload B running in cluster B, each signed off a completely different CA root. How do you manage the trust between those two clusters, or two namespaces, or two trust domains? Trust bundles are a way of making sure each of those workloads can be set up to trust a certain chain of CAs that you are using. Those are the SPIFFE basics. All of this maps in some way to things you're already doing, whether you're deploying your workload or any other application; the thing to think about from a SPIFFE perspective is understanding these elements that you deal with. Obviously, we go from here to ensuring that everything can be automated: these things are automatically generated and automatically managed. That's what we want to do. From that perspective, the other project I want to talk about is cert-manager. Cert-manager is something many of you use and are familiar with: it allows you to automate, provision, and manage TLS certificates at scale. People use it for their ingresses. People use it to mount certificates into their pods. People use it for anything that requires a TLS certificate to provide secure access to an application.
It's very commonly used, and there are several add-ons. One of the things we talk about very often is that it is not only cert-manager: cert-manager has a lot of supporting add-ons. I usually call them "cert-manager and friends", because these friends allow you to build much more robust capabilities for your security infrastructure within Kubernetes. We'll talk about some of those friends as we get through the rest of the presentation. We are now incubating. This was announced just two or three weeks ago: cert-manager is now an incubating project within the CNCF ecosystem. We have a lot of maintainers here in Detroit, so if you haven't visited the cert-manager booth at the Pavilion, please do, and we can talk a little more about what cert-manager does. In the slide title you will see me adding a few pluses. That's intentional, because I'm building up the add-ons, the friends I spoke about, which give you additional capabilities. So we talked about cert-manager. The cert-manager csi-driver-spiffe is a CSI driver plugin that gives you an automated way of injecting SPIFFE SVIDs into your workloads for them to use. A simple way of thinking about it: any time you need an X.509 certificate that is a SPIFFE SVID mounted in a pod, you just use csi-driver-spiffe. All you do is add some volume attributes to your deployment specification, and a certificate is automatically injected. The second add-on I'm going to talk about is trust-manager. Trust-manager is a project that is part of cert-manager, there to help distribute trust, or trust bundles, so to speak.
If you go back to the definitions we drew up for SPIFFE, we talked about the ability to distribute and manage trust, and trust-manager is one of the projects that allows you to distribute trust across your different trust domains. The third one, also a very important add-on, is what we call approver-policy. If you have used cert-manager, one of the things you have probably seen is that every time a CertificateRequest resource is created, it carries certain conditions: approved, denied, ready. I would say 99% of you who have used cert-manager have seen that approved flag always set to true, because the approval controller built into cert-manager approves by default, which means every certificate request is automatically approved. Whether it becomes ready depends on your CA, the issuer configuration, and all of that, but in the basic configuration, if you create a certificate request, or a certificate that generates one, it will automatically be approved. However, for a SPIFFE SVID there are requirements in the specification. The list you see there is from the SPIFFE specification: every X.509 SVID needs to have certain characteristics. What we provide as part of the CSI driver is an additional controller, a policy approval controller, which ensures that every time an SVID is generated, it complies with the specification defined by SPIFFE. You don't have to author your own policies; this is built into the csi-driver-spiffe plugin. It ensures that every single certificate that's generated complies with the properties defined there.
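The kind of check such an approver performs can be sketched in a few lines of Python. This is a simplified illustration, not the actual csi-driver-spiffe approver: the `LeafCertRequest` type and the key-usage strings are hypothetical, and only a handful of the X.509 SVID rules are modelled (exactly one URI SAN carrying a SPIFFE ID, a leaf that is not a CA, digital signature set, certificate and CRL signing not set).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LeafCertRequest:
    """Hypothetical, flattened view of what a certificate request asks for."""
    uri_sans: List[str]
    is_ca: bool = False
    key_usages: List[str] = field(default_factory=list)


def svid_violations(req: LeafCertRequest) -> List[str]:
    """Return the reasons a request would not be a valid leaf X.509 SVID."""
    problems = []
    if len(req.uri_sans) != 1:
        problems.append("must carry exactly one URI SAN")
    elif not req.uri_sans[0].startswith("spiffe://"):
        problems.append("URI SAN must be a SPIFFE ID")
    if req.is_ca:
        problems.append("leaf SVID must not be a CA")
    if "digital signature" not in req.key_usages:
        problems.append("key usage must include digital signature")
    for forbidden in ("cert sign", "crl sign"):
        if forbidden in req.key_usages:
            problems.append(f"leaf SVID must not set key usage {forbidden!r}")
    return problems


if __name__ == "__main__":
    ok = LeafCertRequest(
        uri_sans=["spiffe://example.org/ns/sandbox/sa/edge-server"],
        key_usages=["digital signature", "key encipherment"],
    )
    # An empty list means this sketch would approve the request.
    print(svid_violations(ok))
```

A request that fails any of these checks would be denied rather than signed, which is the behaviour the talk contrasts with cert-manager's approve-everything default.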
Now for the solution. I spoke about a bunch of things individually, so I want to walk through how they work together. What you really have is a set of components. You see cert-manager there, you see approver-policy, you see trust-manager, and the CSI driver, which I'm showing down here because it's a DaemonSet: it is attached to the node. Every time a pod is scheduled to the node, the CSI driver says: okay, a pod is scheduled, and its deployment specification defines these volumes and these volume mounts, so I need to publish that volume and mount it as part of the deployment. As soon as that process is over, the CSI driver plugin sends a request to cert-manager with a CSR, and that CSR carries the specific properties we just saw on the previous slide. It carries the information about your workload and about the service account attached to it. Just by adding those few lines you see there, the volumes, where you want to mount the volume, and what the volume looks like, using csi-driver-spiffe (six lines or three lines, I can't count, but a few lines), you make sure every pod has a SPIFFE SVID automatically mounted on that volume mount. What happens in this third step is that the CSR was generated, and then cert-manager does the certificate reconciliation: it goes through the process of ensuring the certificate is fulfilled.
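Those few lines of the deployment specification can be sketched as follows, written as a Python dictionary so it stays in one language with the other examples here. The driver name `spiffe.csi.cert-manager.io` and the mount path are what csi-driver-spiffe uses to the best of my knowledge; the container name, image, and the helper function are hypothetical, so treat this as a sketch to compare against the driver's documentation rather than a copy-paste manifest.

```python
import json

# Mount path mentioned in the talk for the injected SVID material.
SPIFFE_MOUNT_PATH = "/var/run/secrets/spiffe.io"


def with_spiffe_volume(container: dict) -> dict:
    """Return a pod spec fragment that mounts a SPIFFE CSI volume into `container`."""
    container = dict(container)  # copy so the caller's dict is left untouched
    mounts = list(container.get("volumeMounts", []))
    mounts.append({"name": "spiffe", "mountPath": SPIFFE_MOUNT_PATH})
    container["volumeMounts"] = mounts
    return {
        "containers": [container],
        "volumes": [
            {
                "name": "spiffe",
                # Assumed driver name; the volume is ephemeral and lives with the pod.
                "csi": {"driver": "spiffe.csi.cert-manager.io", "readOnly": True},
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical container name and image, for illustration only.
    fragment = with_spiffe_volume(
        {"name": "edge-server", "image": "example.invalid/edge-server:dev"}
    )
    print(json.dumps(fragment, indent=2))
```

Because the volume is declared per pod, every replica of a deployment gets its own key and certificate without any Secret object being created.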
As part of that, it also has to make sure that the trust bundle that was created is distributed back. I spoke about the policy: it ensures that nobody is creating a certificate by hand using the issuer tied to your SPIFFE setup, because that issuer is tied back to your organization's CA, which could be backed by the security team's managed HSM or whatever that might be, right? After this, it is basically just mounting the material, the TLS certificate, the private key, and the CA certificate, all the things that need to be available to the pod. Specifically, under /var/run/secrets/spiffe.io you'll have all of these readily available. Once they are mounted, the pod can use that SVID whenever it talks to other applications. So you can imagine that if you have a single deployment with five replicas, every one of them has a unique, cryptographically verifiable X.509 SVID mounted on the volume mount of that specific pod. When those pods talk to other pods, or to other things, every one of them has an identity, and your communication is built on the principle of mutual authentication, mutual TLS. The last step: as we discussed, you deploy, and of course you undeploy or delete pods, because you're probably going through the deployment cycle continuously. When that happens, all of the certificate material associated with that workload is gone. This matters because, as you may be familiar, when you create a Certificate resource using cert-manager, a TLS secret is automatically created.
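A workload could consume the mounted material for mutual TLS roughly as in the Python sketch below, using only the standard `ssl` module. The file names `tls.crt`, `tls.key`, and `ca.crt` are an assumption about what the driver writes under the mount path, and the helper names are hypothetical. Hostname checking is turned off on the client side because an SVID identifies its holder by a URI SAN rather than a DNS name, so the peer's SPIFFE ID is read from the certificate instead.

```python
import os
import ssl
from typing import Optional

SVID_DIR = "/var/run/secrets/spiffe.io"


def svid_paths(svid_dir: str = SVID_DIR) -> dict:
    """Locations of the mounted SVID material (file names are assumed defaults)."""
    return {
        "cert": os.path.join(svid_dir, "tls.crt"),
        "key": os.path.join(svid_dir, "tls.key"),
        "ca": os.path.join(svid_dir, "ca.crt"),
    }


def client_context(svid_dir: str = SVID_DIR) -> ssl.SSLContext:
    """TLS client context that presents our SVID and verifies the server's chain."""
    paths = svid_paths(svid_dir)
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False  # identity is the URI SAN, checked separately
    ctx.load_verify_locations(cafile=paths["ca"])
    ctx.load_cert_chain(certfile=paths["cert"], keyfile=paths["key"])
    return ctx


def server_context(svid_dir: str = SVID_DIR) -> ssl.SSLContext:
    """TLS server context that requires a client certificate (mutual TLS)."""
    paths = svid_paths(svid_dir)
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED
    ctx.load_verify_locations(cafile=paths["ca"])
    ctx.load_cert_chain(certfile=paths["cert"], keyfile=paths["key"])
    return ctx


def peer_spiffe_id(peer_cert: dict) -> Optional[str]:
    """Extract the SPIFFE ID from the dict returned by SSLSocket.getpeercert()."""
    for kind, value in peer_cert.get("subjectAltName", ()):
        if kind == "URI" and value.startswith("spiffe://"):
            return value
    return None
```

After a handshake, either side would call `peer_spiffe_id(sock.getpeercert())` and decide whether that identity is allowed to make the request. Certificates from the driver are short-lived and rotated on disk, so a long-running process would need to reload its context periodically; that is left out of this sketch.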
In the case of the SPIFFE driver, there is no TLS secret, because all of the material required by the pod is injected into the volume mount. When the pod is deleted, it goes away. So there are no TLS secrets to manage, no TLS secrets to keep track of, no TLS secrets to put RBAC on. From a security standpoint, some of the larger financial institutions also have guidelines against creating TLS secrets at all, especially ones that contain private keys. So this addresses the notion of an identity that is issued to a workload when it needs it and where it needs it, so that when workloads communicate, they are all mutually authenticated. That's the high-level view of how all of these things work. There is SPIFFE involved here, cert-manager is involved, csi-driver-spiffe is involved, trust-manager is involved, approver-policy is involved: lots of different projects coming together to help ensure that you are securing your workloads irrespective of where they are running. The idea is that you're not just using one single component; you're using a combination of things to satisfy the security requirements of your workloads. I now want to hand it off to Riyaz to talk about the demo architecture. He's got Raspberry Pis here, he's got all sorts of things, to walk through the architecture of the demo, make it a little bit real, and then we'll come back and wrap up.
All right, thanks, Sitaram. For the demo architecture, we have a cluster running K3s; it's a two-node cluster running on Raspberry Pis. This is where all your client workloads, the edge workloads, are going to run. Then we have another cluster, a six-node GKE cluster running in the cloud.
What you see here in the middle are all the components that Sitaram already talked about: cert-manager, the SPIFFE driver, approver-policy, and trust-manager. These are installed in both the edge cluster and the GKE cluster. So what really happens? Let's look at a server-side microservice deployed on the GKE cluster. The CSI SPIFFE driver sends a CSR; cert-manager receives it as a certificate request. There is a certificate request policy bundled into the policy approver. The policy approver approves the request, cert-manager mints an X.509 SVID, and it is mounted onto the volume of the workload. This happens on both the server and the client, so each one now has its own machine identity. Now, what about trust? How do they talk to each other? That's where the trust bundle comes in. The trust bundle, which holds the CAs, is distributed across both clusters. If you look at the next slide, you will see that both the client's CA and the server's CA are in the trust bundle distributed in the GKE cluster. This is what creates trust. When you want to revoke trust, you remove the CA from the trust bundle in your server cluster. That revokes the trust between the two workloads. So it's pretty simple. You're not going across doing east-west traffic, and you're both enabling trust and revoking it. Now that we have this, I'll go through a demo. I've already installed cert-manager as well as trust-manager using Helm on my GKE cluster. What I'm going to do now is create an intermediate certificate.
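The enable-and-revoke idea can be illustrated with a toy model in Python. This is deliberately not how TLS verification works internally: CAs are represented by plain names rather than certificates, and the `TrustBundle` class and `handle_activation` function are hypothetical. It only shows the decision the server side ends up making once a CA is present in, or removed from, its bundle.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class TrustBundle:
    """Toy stand-in for a trust bundle: the set of CA names the server trusts."""
    trusted_cas: Set[str] = field(default_factory=set)

    def trusts(self, issuer_ca: str) -> bool:
        return issuer_ca in self.trusted_cas

    def revoke(self, issuer_ca: str) -> None:
        # Removing a CA stops trusting every identity issued under it.
        self.trusted_cas.discard(issuer_ca)


def handle_activation(bundle: TrustBundle, peer_issuer_ca: str) -> str:
    """Accept the caller only if its SVID chains to a CA in the bundle."""
    if bundle.trusts(peer_issuer_ca):
        return "edge activated"
    return "device unauthorized"


if __name__ == "__main__":
    bundle = TrustBundle({"server-ca", "client-ca"})
    print(handle_activation(bundle, "client-ca"))
    bundle.revoke("client-ca")
    print(handle_activation(bundle, "client-ca"))
```

Note the granularity this implies: removing a CA cuts off everything issued under it, so how CAs are scoped (per merchant, per site, per cluster) determines what a single revocation affects.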
We are creating the certificate this way just for demo purposes; in practice you would use your PKI infrastructure to issue the certificate, and cert-manager would do that for you. This creates a certificate, with its contents, as well as an issuer that is used to serve the trust domain. So let's do that. You see an issuer there? Now I'm creating the CA certs as a secret and also creating the trust bundle that references both the client and the server CA. This trust bundle, as you'll see, is distributed across all the namespaces: a set of config maps is generated and synced across the namespaces here. When I look at the contents of the trust bundle, I see the CA certificate for the client as well as the one for the server. Now I'm going to deploy the CSI SPIFFE driver, which Sitaram already talked about. It's a DaemonSet. This may take some time because we have to have the approver running first. Okay, so the CSI SPIFFE driver is running on my cluster now, along with the policy approver. Now I'm going to create my application: I'm deploying my edge server application. Let's look at the SVID and the SPIFFE ID of this app. If you look here, an X.509 SVID has been generated, and there's a SPIFFE ID attached to it. Now let's go to the client, which is my Raspberry Pi; I'm switching to that context. Here I've installed cert-manager, trust-manager, the CSI SPIFFE driver, and all of that, which is what you see here. Let me go in and see my CAs. Again, I created my trust already. This has all my trust bundles, synced across my namespaces. And if you look here, it has both the client CA and the server CA. Now I'll deploy my client application. Let's see if it has started; it takes a while to start.
Okay, cool. Now I'm going to test mTLS between my client and my server. I have two methods here that I'm going to call. I'm going to do a port-forward now so that I can access this. What you see here is the client application running on my Raspberry Pi talking to my server application running on the GKE cluster. There's an mTLS handshake happening, and we're getting an "edge activated" message here. I have another one called edge pay, and it says "payment successful". This is what you would see on a terminal when you activate it and start doing payments. Now we're going to update the trust bundle and revoke the trust. Once we do that, any handshake between the client application and the server application should fail. So here I'm updating my trust bundle, and what you can see is that I now have just one CA: I'm removing the trust for my client. These are the logs from the server. We did two activations, and we also invoked a payment, and it said "payment successful". Now, when I try to call my edge pay or my edge activate, I get a message saying "device unauthorized". So it's as simple as updating your trust bundle to unauthorize your client altogether. Excuse me, let me check the logs on this. Can I switch to the presentation? Yeah.
Excellent. Thanks, Riyaz. Hopefully that gives some context on how all of this works. Those two pictures are definitely not ours; I have somebody else's pictures, as you can imagine. I wanted to give a shout-out to a couple of our colleagues who've been working on this for a long time, especially Josh, who is also here.
If you want to spend time talking to him about the other add-ons for cert-manager, he's the one who builds the SPIFFE pieces as part of Jetstack. There's also a much more detailed blog about how to manage workload identities across multiple cloud providers. Feel free to check that out as well. Most importantly, one of the things we've spent time on over the last few months, and really the last couple of years, is reading Solving the Bottom Turtle. If you haven't read this book, I strongly recommend going through it, because it covers a lot of what we just talked about in much more detail. It also goes beyond SPIFFE into the world of SPIRE: what attestors are, how you use technology-specific attestors, and things of that sort. That will tell you the next set of things you can do. I also want to mention, if you are somebody who uses cert-manager: we talk about TLS certificates as digital identities, but if you want a wax-sealed, printed, physical certificate that you can carry with you, come see us. We are actually minting certificates by hand with a wax seal, and I will give one to you to take as a souvenir. So visit us at our booth. So where do we go from here? We showed some of the things that are possible. There are also a lot of things we can work on. One thing we identified that absolutely needs to be done is a model-driven approach. Today you saw us managing trust by creating a resource called Bundle and managing it ourselves. We want to be able to model that via rules and things like that; it's something we've been thinking about.
Trust distribution across trust boundaries is also something we want to automate. Today you can manage it, but you literally have to figure out yourself how to distribute trust across multiple clusters. We are also looking at what it means to have a control plane, which we've been working on, to manage trust across different security boundaries. That's where we are now. There is a QR code to provide your feedback; we appreciate any feedback you might have. And we can take questions if you have any.
How does the cert-manager model for the process you've laid out here interact with doing mTLS through a service mesh, which is another pattern I've seen for doing this? Does this play with that, or is it in place of doing that through a service mesh?
Absolutely. We do have integrations with service meshes. For example, if you're using Istio, there is an add-on called istio-csr that uses cert-manager to provide the workload signing certificate to istiod. That way, any workload that is part of your mesh is signed by the signing certificate managed by cert-manager. That's what istio-csr does. It is an add-on that you install in the cluster where your service mesh is. As you manage your service mesh, you would then not use the default built-in CA mechanism; the mechanics of cert-manager enable you to plug your organization's CA into the signing process. So the CSI driver and csi-driver-spiffe are used more in the context of non-mesh workloads, but for mesh workloads that mechanism is absolutely available.
Some other question there?
Hi, great presentation. My question is, how does it work with multi-cluster or virtual clusters? Have you tested those cases?
Yes. In this case, this was a multi-cluster model. We used two different clusters: one on Google Cloud and one running on Raspberry Pi. One is running right here. Yeah. The idea is that the trust domains span clusters, and we want to be able to distribute and manage trust across trust boundaries. Today, that is done via a trust bundle that you distribute with the CAs that you manage. That connects the trust domains, or the CAs you're managing, across multiple clusters so that they can communicate with each other. So in this example, we were using a multi-cluster model to distribute trust. We'll be here, and I'm happy to take any more questions. Excellent. Thank you so much.