So, hi everyone. It's great to be here at KubeCon, and thanks for sticking around with us. I know it's late on a Friday and everyone just wants to go, but we're here to talk to you about multi-cloud workload identity with SPIFFE. I'm Jake. I'm one of the cert-manager maintainers, interested in all things identity and X.509. And hey, I'm Charlie. I work on the enterprise side of Jetstack's products, and I'm also interested in all things authentication and authorization. We're both senior software engineers at Jetstack, and we're both interested in SPIFFE. We think SPIFFE is really great, and we want to encourage more adoption of the technology; that's why we're doing this talk. During a company hackathon at the end of last year, we worked on a project that used SPIFFE IDs to improve the developer experience for cross-cloud workloads calling cross-cloud APIs, and the work we're presenting today is based on that. So we're going to explain how we think about workload identity and what that means for us, then explain what we built and give a short demo of it. We'll also share a link to our code at the end if you're interested in having a look yourselves. It's a proof of concept, but the code will be there. So, what do we think workload identity is? An identity is a way for a workload to prove who it is, to prove its authenticity to other workloads and to anything else it may be communicating with. We believe a workload should be issued this identity just by existing, just by running in a cluster, just by running anywhere. It shouldn't have to hold any secrets that were given to it beforehand.
It's something it should get just by existing, by running in some environment. Contrast this with what we often see people doing, even with Kubernetes workloads: putting secrets into their deployments, including secret tokens to talk to other services. We believe those aren't identities; an identity is something different, something you have just by existing. The problem, as many of you will be aware, is that when you're trying to get secret tokens into thousands or millions of pods, across many different tenants and many different clusters, it very quickly becomes very challenging to manage at scale. We also believe the best way we have to represent workload identity is with an unforgeable document, and a good solution for that has been an X.509 certificate. An unforgeable document is a document that represents your identity and can't be used by somebody else. It proves who I am and what workload I am, but it's not a document that can be stolen by a malicious workload and replayed to somebody else. X.509 is good at this because the communication involves a challenge: I need the private key that accompanies the certificate to prove who that workload is. We also believe workloads shouldn't need to have or do anything to get their identity. We don't want a workload to need a secret in order to work out who it is, or to have to call somebody; we don't want it to have to do anything. Its identity should simply be available to it without it needing to know where it comes from. So what do we have at the moment? If you're running in a single cloud provider, there's something available to you in a Kubernetes environment that looks like workload identity.
Many of you will be running in GKE, EKS, or AKS clusters, managed Kubernetes offerings. These all have something that looks a bit like the workload identity we've just described: a workload running in one of these clusters can get an identity without needing to know or request anything, and it can use that identity to call APIs in that cloud very easily. We see this part of the problem as solved. What's good about these identities is that they're easily consumed by cloud SDKs. If you're talking to a Google Cloud Storage bucket, the Google Cloud SDK is able to make use of the workload identity, and the workload doesn't need to know it's doing so. It could just as well be using a service account key, but the SDK works that out for the developer, and the developer experience is quite good. Because of that, developers don't need to worry about how they're going to get a service account key or AWS credentials into their workload so it can talk to the things it needs to talk to. The other nice thing about these identities is that they're automatically rotated. When a key is exposed for some reason, maybe somebody commits it to a GitHub repo, nobody needs to go around and update all the deployment scripts, and developers aren't managing these secrets at all. But we see a problem with these identities, and we don't see them as sufficient for the goal of making sure all workloads have an identity: they're proprietary, and they're most useful and most easily used in the cloud the workload is resident in.
It's easy for me to use my GKE workload identity to talk to Google APIs, but although identity federation is available, it's still another thing that needs to be configured before you can call APIs in other clouds. You also can't use this identity everywhere. You can't use it to talk natively to a Postgres database, because Postgres uses a username and password and its own way of authenticating callers. So this identity is one of many, and it's not universally accepted. What's frustrating is that even though we've got something that looks like workload identity, and we even call it workload identity, we're back to the same problem of a workload having and managing many identities. We're back to creating additional secrets and credentials so we can talk to the other services we need: that database, or that random thing that's still left on-prem. And that's a security issue: developers end up managing these secrets in different pipelines, in different ways, potentially insecurely, and in a large multi-tenant environment it's really hard to keep track of it all. Credentials are left lying around and aren't revoked when they're no longer used. The other thing about those extra managed credentials is that they no longer cryptographically identify the workload. They're just another password that, given the right misconfiguration, could be picked up by another workload and used: the identity is actually available to anybody who has the secret. We see that as a big problem. What we would really like to see is a production-ready standard for workload identity, where workloads have a single identity they can use to talk to everything they need to, regardless of where that is.
Whether it's a cloud provider, another open source tool, or a custom workload in your organization, it should all accept the same identity, and we'd love for that to be a standard. I'm now going to pass over to Jake to talk about how we'd like this to look and what we've done in this area. Great, thanks. So you've probably heard about SPIFFE by now, because it's been mentioned a few times in this track. It stands for Secure Production Identity Framework For Everyone. We first got exposed to it four years ago at KubeCon Seattle, and we thought it was just a really good idea. It's the foundation for a cloud-agnostic identity control plane: an open specification, with a reference implementation called SPIRE, for authenticating securely between services over untrusted networks without using passwords or API tokens. If you think back to our first slide, that's the first thing we said. The SPIFFE spec defines the SPIFFE ID and the SPIFFE Verifiable Identity Document, or SVID. These are standardized identities. The SPIFFE ID looks a little like a URL, as you can see underneath, consisting of a trust domain and then an identity in the path components. It's human readable, and it identifies a person or a service. The SVID is a specially crafted X.509 certificate with a very specific format. More importantly, you can use it to cryptographically identify the holder. This is what we believe is the unforgeable document that should represent an identity. And finally, if you're using your SVID to perform a mutual TLS handshake, both entities on each side of the connection can be reasonably sure that the identity on the other side is who they say they are. So yes, it's unforgeable. SPIFFE is an open source CNCF project, and it's been incubating since 2020.
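To make the SPIFFE ID format concrete, here's a minimal parser using only the Go standard library. This is a sketch of the basic shape of an ID as described above, a `spiffe://` scheme, a trust domain, and path components; the ID used is a made-up example, and real implementations such as the go-spiffe library do stricter validation:

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// SPIFFEID holds the two parts of a SPIFFE ID: the trust domain
// (the "host" part of the URI) and the workload path.
type SPIFFEID struct {
	TrustDomain string
	Path        string
}

// ParseSPIFFEID checks the basic shape of a SPIFFE ID: it must be
// a URI with the "spiffe" scheme, a non-empty trust domain, and no
// query string, fragment, or user info.
func ParseSPIFFEID(id string) (*SPIFFEID, error) {
	u, err := url.Parse(id)
	if err != nil {
		return nil, err
	}
	if u.Scheme != "spiffe" {
		return nil, errors.New("scheme must be spiffe")
	}
	if u.Host == "" {
		return nil, errors.New("trust domain must not be empty")
	}
	if u.RawQuery != "" || u.Fragment != "" || u.User != nil {
		return nil, errors.New("query, fragment and user info are not allowed")
	}
	return &SPIFFEID{TrustDomain: strings.ToLower(u.Host), Path: u.Path}, nil
}

func main() {
	id, err := ParseSPIFFEID("spiffe://example.org/ns/default/sa/demo-app")
	if err != nil {
		panic(err)
	}
	fmt.Println(id.TrustDomain) // example.org
	fmt.Println(id.Path)        // /ns/default/sa/demo-app
}
```

The trust domain identifies the issuing authority, which is why mutual TLS verification later hinges on having the right CA bundle for that trust domain.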
It's a collaborative effort based on many engineering-years of work inside big companies you've heard of, which have integrated it for their service identities. So we can take all that knowledge into the CNCF, and we're happy that it won't become proprietary at any point, so we're happy to drive adoption of it. And I think we're not alone in this: lots of people are starting to think about using it. It's getting traction, at these talks at least, and we've seen a few customers at Jetstack talk about it as well. So how do we take SVIDs and make them cloud native? As a maintainer of the cert-manager project, I'll tell you about that too. cert-manager is a CNCF sandbox project dedicated to all kinds of cloud native X.509 certificate management, and it's a kind of umbrella organization encompassing several projects. I'm lucky enough to spend a good amount of my work time being allowed to contribute to it. The first component we need is cert-manager itself. It's a very popular Kubernetes operator; most people are already using it, even if they don't know they are. It can issue X.509 certificates from almost any CA. The project started at Jetstack, where we work, and it was originally designed for easily getting public ACME or Let's Encrypt certificates for your Ingresses, but we support all kinds of internal CAs, and if we don't support yours yet, you can easily write an integration. So you can easily issue X.509 certificates from your internal PKI. The idea of cert-manager is that it will automatically issue and rotate short-lived certificates for you, so you don't need to worry about them. It's a CNCF project; it entered the sandbox in November 2020, and we're going for incubation right now, so hopefully before the next KubeCon we'll be incubated. So we've got cert-manager, and now we need SVIDs, our workload identity. For that we have something called csi-driver-spiffe, which is officially in the SPIFFE project now.
If you scroll down on spiffe.io you'll see it. We said earlier that workloads should not have to do or know anything to get their identity. If you've been in any SPIRE talks, they call this solving the bottom turtle: you don't need to trust some pre-existing secret, you just get your identity from somewhere. A CSI driver is a way of exposing a file system to a pod. It's part of the volume setup of the pod, so it happens before your workload starts, and the kubelet tells us the pod's identity as part of starting it, because by deploying something on Kubernetes you've explicitly told the kubelet: I want you to start this pod. So, pre pod startup, as part of the CSI driver call, we generate an in-memory private key, create a CSR from it, and submit that to cert-manager. The important thing here is that we get an SVID signed by cert-manager, and the private key never leaves the workload. We can also put approval and attestation steps in at this point. We have something called approval-policy, which might integrate with what you're already using for compliance, so you could insert policy here. And then your identity is available in the pod before the workload starts. The final piece of any mutual TLS setup is that you need the trust bundles for your trust domain. So we have a third project in this demo, called trust. We don't actually know if it will keep that name, because it's a bit generic, but it handles trust root distribution and management. How do you verify that the SVID coming to you over mutual TLS is valid? Well, a trust domain is basically a bundle of CA certificates, and trust will distribute them to workloads in the correct context: which workload should trust which trust domains. So we combine all of these, and all of our workloads have an SVID available before they start.
At the beginning we had a third point, which was that workloads should not have to know or do anything to get their identity. So how are we going to solve that? We like SPIFFE a lot and we're trying to drive adoption, but if you look out at the ecosystem right now, most users of SVIDs are just using them to configure mTLS between Envoy instances. That's great for the service meshes that are already using Envoy: they can adopt SVIDs, and all of your service mesh traffic will be completely encrypted and validated. This works with Istio and Kuma and the other projects listed on spiffe.io. But while they're using SPIFFE, it doesn't increase adoption of the SVID, because it's an opaque implementation detail. By default, you talk over unencrypted connections to your Envoy sidecar; Envoy does the encryption and verifies the other end, which is another Envoy, which decrypts the traffic and just passes it to you without any context. So you can't verify who is calling you. This is sad, so we want to increase adoption of SPIFFE, and the only way to do that is to make it easy to use. We want to get to the point where everything uses SVIDs natively. So for our mini demo, we set out to recreate the state-of-the-art cloud provider experience that Charlie talked about, where you just call the cloud APIs and they know who you are, but using a standard identity document, the SPIFFE ID, across different services. The developer experience has to stay the same; it has to be seamless to use. Developers should just continue writing the same code they've already written, call their cloud services, and it just works. So we're going to show you a live demo. I hope it works. I'll quickly show you what it looks like: say you've got your app, and it's talking to your cloud provider and just working. We have built a connector called spiffe-connector.
It's a very unimaginative name. It can verify an incoming SVID and swap it for very short-lived temporary cloud credentials. So you still have cloud credentials, but they're so short-lived that hopefully they can't usefully be stolen. We have a sidecar in the pod that talks to this server over gRPC, and that gRPC connection uses the go-spiffe library to authenticate and validate the incoming SVIDs. We can actually get our SVID from anywhere, because we support SPIRE as well in case you already have it, but we like our CSI driver, so we're using that in this demo. The sidecar will automatically pick up the SVID, talk to the spiffe-connector server, exchange it for very short-lived cloud credentials, and write them to the well-known locations where the cloud SDKs are already expecting to find credentials. Then you just call the cloud APIs and it works. We know you can do something like this already with Kubernetes service account identity federation, but the idea here is that we're using a standardized identity, so it doesn't have to be on Kubernetes. A lot of people struggling with workload identity ask how they get from their on-prem and hybrid environments to the clouds, and this is how it would work. We also believe it's more secure, because your single identity document is a signed X.509 certificate, and if it's being minted from your internal CA, you can check the audit logs and see exactly how many identities have been issued and who should be using them. So, it's time for the live demo. Are you ready? Yeah. Let's see if I can get this to you. There's a big lag, so bear with me. I'm also going to zoom in a bit. Apologies, we should have done this before, but we're doing it now. It's a live demo. Can we read that? A bit more? Good enough? Okay, cool. So, on my laptop I've got our demo running in a kind cluster. We're running those two green components you just saw on the diagram.
We've got the spiffe-connector server, which is issuing the short-lived credentials, and we've got an example workload. We're going to first look at how these two are deployed, and then look at what's running on the cluster. So, let's first look at the spiffe-connector server deployment. The server is just a Deployment, a single replica, but importantly it's configured with an ACL, or a list of ACLs. Here you can see that what we're calling a match principal matches the SPIFFE ID of our example application, so that workload is able to get certain sets of credentials. The first type of credential it can get is a Google IAM service account key, from the Google IAM service account key provider, with an object reference to a service account running in my personal GCP project. It's also able to get AWS credentials by assuming a role via a different provider, an AWS STS AssumeRole provider. The server is also configured with its own SVID: it knows how to find it by looking at a set of mounted files, and that's also set in the server configuration. That's so the workload knows to trust the server; you'll see that in just a second. The Deployment itself is fairly simple. It's just the spiffe-connector server, making use of the config file I've just talked you through. The one other thing the server does have is credentials of its own, so that it's able to go and mint those short-term credentials. In our example we've moved the long-lived credentials away from the developer workloads and into the spiffe-connector server, and that's what you're seeing here. Again, as you can see, the SVID is mounted as a volume, and it's made available to the server using the csi-driver-spiffe integration we have in cert-manager.
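The ACL idea described here, matching the caller's SPIFFE ID against a list of principals, each mapping to the credential providers that caller may use, can be sketched as a small lookup. The types, field names, and provider names below are our own invention for illustration, not the spiffe-connector's actual config schema:

```go
package main

import "fmt"

// ACLEntry maps one SPIFFE ID (the "match principal") to the
// credential providers that workload is allowed to use.
type ACLEntry struct {
	MatchPrincipal string
	Providers      []string
}

// ProvidersFor returns the providers the given SPIFFE ID may use,
// or false if no ACL entry matches.
func ProvidersFor(acl []ACLEntry, spiffeID string) ([]string, bool) {
	for _, e := range acl {
		if e.MatchPrincipal == spiffeID {
			return e.Providers, true
		}
	}
	return nil, false
}

func main() {
	// One entry, mirroring the demo: the example app may fetch a Google
	// service account key and assume an AWS role, nothing else.
	acl := []ACLEntry{{
		MatchPrincipal: "spiffe://example.org/ns/default/sa/demo-app",
		Providers:      []string{"google-iam-service-account-key", "aws-sts-assume-role"},
	}}

	if ps, ok := ProvidersFor(acl, "spiffe://example.org/ns/default/sa/demo-app"); ok {
		fmt.Println(ps)
	}
	if _, ok := ProvidersFor(acl, "spiffe://example.org/ns/default/sa/other"); !ok {
		fmt.Println("no credentials for unknown workloads")
	}
}
```

The important property is that the server decides what a caller may have based solely on the cryptographically verified SPIFFE ID, never on anything the caller merely claims.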
So that's how both the server and the workload get their SPIFFE IDs, and there are also the volumes you would expect for the cloud-provided credentials. Now I'm going to show you the example app, which has a sidecar running next to it, and we'll look at what that configuration looks like too. The application we're running is a simple one that moves files from one cloud to another. You might imagine some other, more complicated multi-cloud example, but that's what it's doing in our case: imagine there's some use case where we want to move files periodically between two different clouds. The configuration the developers have written is here. They expose a service where people can instruct the application on a given port, and it talks between two buckets in the different cloud providers, configured here: these are the two bucket names. It's also configured as a simple Deployment. In this Deployment we have two containers. We have the example application, which runs and serves the app, and it has some volume mounts shared with the sidecar, which it uses to access the short-lived credentials collected by the sidecar. And before you wonder, hey, you said the developer experience would be better, and this looks like something the developer would need to configure themselves: you could imagine that if you were to use something like this, a mutating webhook would inject the configuration or sidecars, as you'll be familiar with from other tools. It's shown explicitly here to help you understand what's going on; a mutating webhook would be rather magic, and I felt it was clearer just to show what it would look like. Alongside the workload is the spiffe-connector sidecar, and that sidecar is configured as follows.
It expects to find a spiffe-connector server at a particular address in the cluster, and it expects that server to present a particular SPIFFE ID back; otherwise it won't talk to it. It's also configured to find its own SVID, and that SVID is presented to the server to identify itself. And I've just had a pop-up on my other screen; I'm just going to skip that. As you can see, it also has the same shared volume mounts, so the spiffe-connector sidecar can write the short-lived credentials into them, and those are then available in the workload container as well. The SDKs in the workload can then call the cloud APIs transparently, without needing to know how they're being authenticated. Now let's have a look at the infrastructure for this, running in the cert-manager namespace. cert-manager itself, along with cert-manager csi-driver-spiffe and the cert-manager trust project, are all running in that namespace to make this possible. As you can see, we've got the cert-manager components here, and they're what allows things like the volume configuration we saw in both the server and the example application to work: when that volume configuration is there, the SVID is automatically made available. Next, let's look at the pods in the spiffe-connector server namespace. We've got one spiffe-connector server running, and we can look at its logs to see what it's been up to. It's loaded its configuration, and it also shows that it's issued some credentials to our workload. Now let's look at the example workload; hopefully its logs are more interesting.
We're hoping to see here that it's requested and received Google and AWS credentials. It says it's starting a credential manager for this particular workload, that it's got both Google Cloud and AWS credentials, and that they've been saved to the respective paths. So that's the end of the terminal demo. As a proof of concept, I want to show you this application actually doing something. I'm going to try; sorry, I'm back in lag mode over here. Let me get this out of full screen and recover the window I thought I was moving over. Here it is. I'll do my best; hopefully it's not too complicated an interface, so if you can't quite work out what's going on, let me know. The workload is a very simple server that moves files from one cloud to another, and you can hopefully see where these files are at the moment: they're in AWS S3. I've got the S3 management console open here, and if the conference Wi-Fi works, I'll be able to show you that the file is actually here by clicking this button. You can see the file is still sitting there in S3, and in my other tab I've got a GCP bucket with no objects in it. I'll prove there are no objects in it by clicking the refresh button again: nothing has shown up. Now, if I go back to our example workload and click move files, it should move the object from one cloud to the other, and all the credentials that allowed that operation to take place were made possible by the SPIFFE ID and the SVID that the workload had. So hopefully we've proved that it's a live demo and that it can work, in some way. I'm going to switch back and try to find my Chrome window for the presentation, and if I go back up here and go to presenter view, I hope it will just continue. Oh no, that's not what we want. We're very prepared; it's okay.
Well, let's try to exit full screen, bring this back over here, tell it what to do, and go to presenter view. We've actually only got one more slide; it was quicker in my head. There you go. Now, back here, let's make this window the right size as well, and we can go to the final slide. So, hopefully what we've shown with this proof of concept is an example use case where SPIFFE IDs are used as first-class citizens. We appreciate that you may have other ways of solving this problem that aren't as awful as we made them sound; what we really wanted to show is that SPIFFE IDs can be useful, that you can use them, and that they can form valid paths and channels of communication between workloads. It's been really interesting: Jake, myself, and my colleagues at Jetstack have been staffing the cert-manager booth this week, and we've had lots of people coming to us and talking about these fantastically complicated systems they have for getting certificates for different hostnames, different workloads, and different clusters, because they're all meant to have the same identity but they're running in all these different places. And it's kind of like: have you thought about using a different kind of identity? Have you thought about using SPIFFE? So the discussions we've had this week have really validated that people are looking for some kind of standard here. We've got all sorts of ideas about what we want to do with this. This is a toy example that Jake, myself, and our colleague Josh, here in the front row, built as part of a company hackathon in a few days. It was the easiest thing: we're all familiar with talking to cloud provider APIs all the time, so it was the obvious thing we could come up with. But what else could you connect? I'd really love to build a Postgres connector that allowed me to talk to Postgres
databases and map SPIFFE IDs to Postgres identities. I'm excited by authorization as well: what could you do if you knew the identity and processed the query, such that certain identities were allowed to access certain tables and operations within the database? You can imagine a world where things accept SPIFFE everywhere. Why should I need a GCP key in the first place? Why can't I just talk SPIFFE straight to GCP, or straight to any cloud provider, or even any API? So yeah, that's it, that's what we wanted to talk about. Thanks very much for listening and for staying around this late. It's been a tiring week for me and Jake, and it's been a tiring week for many of you, I'm sure, so thanks for coming at the very end to listen to us plug SPIFFE for a little bit. Before we go, hopefully we'll have some questions; if you want to chat to us, those are our work emails, we're also on Twitter, and you can find us or chat to us at the end. We'd also like to say a quick thanks to our colleague Josh, who's got the camera in the front row and worked with us on the initial hackathon project; and our colleague Sit-Around is waving at me from the back, thanks Sit-Around. Josh not only worked with us on the hackathon project this came from, but has also been important in building cert-manager and cert-manager csi-driver-spiffe, which is the fundamental infrastructure we see as an important step in making SPIFFE workload identity really work in Kubernetes environments. And we promised there would be code: there's a link. The first link is the spiffe-connector that we've presented, along with the example application; it's all in there. I don't think it's production ready, but you might find it interesting to look at. You can also go and check out csi-driver-spiffe and the cert-manager GitHub generally, or just go and read a bit more about SPIFFE. Come talk to us; we really want to drive a world where SPIFFE is used, and the pragmatic way of doing that is to build more support into other
things. So we'd love to collaborate with you; just come chat to us. We're on the Kubernetes Slack in the cert-manager dev channels, but we're also on the SPIFFE Slack, which is probably a better place for it. Just come chat to us. Thank you. Yeah, thanks. We haven't given any instructions about questions, but I don't know if people have any. We have a microphone here; oh, there's a microphone, might be good to try it. Let's try the microphone. Right, thank you very much for the talk, it's very inspiring. I would like to ask, because maybe I lost that part: how does the server, which may run one per cluster, so to say, acquire these credentials from the cloud provider itself? I understood that the connector connects to the server, but I'm missing the bit where the server gets the credentials. Yeah, so in our toy example the server has a long-lived AWS credential, which is a bit of a fraud, because we just said don't do that. But you can connect it with your own systems; we're moving the credentials to a single point, and in production it would probably be something like Vault. Okay, so basically you would use something that the cloud provider provides. So if, for example, the cloud provider could use X.509 directly, that would be the best, because you wouldn't even need the connector. Right, and you can kind of do it with OIDC federation already, so maybe you would do that as well. All right, thank you. Any other questions? Pass the microphone across. You mentioned integration with the cloud provider; in that case, wouldn't it be that when I look at, for example, the audit logs on my S3 bucket or whatever I'm accessing, I would only see the identity of the SPIFFE server? And also, that server needs access to everything that every workload needs access to. Or is there a way to separate that out, to be able to use those identities outside of the SPIFFE world? Yeah, you would see just the SPIFFE server doing those; you would see the short-lived credential that was issued by the
spiffe-connector server, I suppose. You would have an audit log showing that the spiffe-connector server had issued you a credential, and that the credential was used in that way. But yeah, it's not ideal; it's meant to just be an example use case. You would have to piece it together from both logs and then figure out what was happening. Yeah, I mean, we said already that our ideal situation is that the cloud provider just speaks SPIFFE and we wouldn't need this, but also no one's going to use SPIFFE if you can't use it. Right now you build adoption, get people to notice, and then they start implementing native support. Okay. Have you looked at, for example, Microsoft Azure AD, where you can actually use certificates for authentication of applications? Have you looked at a way of maybe integrating that, to automatically generate those identities so they get in there? I actually haven't looked, but while we've been surveying the ecosystem in general: while most things say you can authenticate with mutual TLS, they'll basically just be parsing the common name for identity, which is not the same as what an SVID looks like. So even if things claim they speak mTLS, it doesn't necessarily mean they'll be able to extract an identity out of the SVID. But we'll look at Microsoft, because we want to support more things and this was just a short demo. Okay, thanks. It's not something we're that familiar with. Any other questions before we finish? Cool. Well, you can always chat to us; we're available on the various Slacks and at our company page, jetstack.io. Thanks very much.