Hello, hello, hello, everybody. Officially day number one, I guess. Day plus one or two, if you've been to some of the co-located events. Thank you for coming. I'm Evan Gilman. I'm a maintainer on both the SPIFFE and SPIRE projects. I've been on those projects since they were started, back in 2017, so five and a half years or so. The title of the talk today is Don't Mind the Gap. We're going to be talking predominantly about how to use SPIFFE identities to authenticate to third-party resources, specifically to the three major cloud providers. There's a few tricks up our sleeve here, and others have kind of clued into the trick over the last year or two, which some of you may have noticed. I'm trying to keep this talk as casual as possible. I've got 20 slides, and it usually runs about 20 minutes. If folks have questions, anything like that, please feel free to raise your hand in the middle of the talk. I'm happy to stop and answer your questions, make it personal. I've got a mic up here, there's a mic back there, or we can shout, or we'll figure it out. So, we'll talk first about the SPIFFE basics. We won't go over all of SPIFFE, but we'll go over the things that are important to understand for the content in this presentation. Similar thing for SPIRE: we'll cover, at a high level, some of the important things you'll need to make sense of the rest of the talk. And then I'll step through three examples of how to do this: AWS, Azure, and GCP. They all leverage the same underlying technology, but each has its little quirks and differences here and there. And then finally, we'll hopefully have plenty of time at the end for Q&A, if nobody raises their hand during my presentation, which I hope they do. So, without further ado, let's dive in. Oops, wrong one. SPIFFE basics.
The first thing to know about SPIFFE is the SPIFFE ID, and I apologize, some people here tend to be newcomers and some people tend not to be, so bear with me. The SPIFFE ID is kind of what it sounds like. It's basically a username for workloads. SPIFFE deals with workload identity; this is the core thing. And as you can see here, it's a structured string. It's structured as a URI, and it's compatible with RFC 3986, the URI RFC, though SPIFFE places some additional limitations on top of that to constrain it, because there's a lot of stuff in that RFC. And there are a few critical parts. The first one is, of course, the scheme, which is spiffe, which just states: this is a SPIFFE ID. The second is this piece we call the trust domain. The trust domain is what I would call a security domain. There's a one-to-one relationship between a trust domain name and a set of identity issuers. So when you look at a trust domain name, you should immediately know: okay, this is coming from this zone, from this set of issuers. And because it's part of the ID, all SPIFFE IDs are qualified by this trust domain name. And then the final piece is the name of the service. This can have multiple levels, the same as a regular URI; you can make it hierarchical, you can make it however you want. The spec doesn't say much about restricting the values of these things, so this is a little bit of a choose-your-own-adventure. But given a SPIFFE ID, you can immediately see which authority issued this identity and what the identity of the workload it's supposed to represent is. And then these SPIFFE IDs get encoded into a document. SPIFFE supports two documents right now.
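To make that structure concrete, here's a minimal sketch of pulling the trust domain and workload path out of a SPIFFE ID. The ID used is a hypothetical example, not one from the slides, and real SPIFFE libraries do much more validation than this:

```python
from urllib.parse import urlparse

def parse_spiffe_id(spiffe_id: str) -> tuple[str, str]:
    """Split a SPIFFE ID into (trust domain, workload path)."""
    uri = urlparse(spiffe_id)
    if uri.scheme != "spiffe":
        raise ValueError(f"not a SPIFFE ID: {spiffe_id}")
    # The authority component is the trust domain; the path names the workload.
    return uri.netloc, uri.path

td, path = parse_spiffe_id("spiffe://example.org/billing/payments")
# td is "example.org", path is "/billing/payments"
```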
It supports an X.509 certificate, and it also supports a JWT token. Yes, "JWT token," like "ATM machine." And there's some spec around how you embed this ID into a cert or into a JWT, where you find it, how you validate it, all that kind of stuff. So that's the SPIFFE ID. And we call that document an SVID, a SPIFFE Verifiable Identity Document. Then we have this thing called the bundle. The bundle is a collection of authority keys. So when I talk about a trust domain, there's some set of authorities in the trust domain that are responsible for issuing identities, and these authorities each have a public-private key pair. The bundle is a collection of those. So you'll have one bundle per trust domain, and the bundle captures all the signing keys that are authoritative for that trust domain. You can see that they're specified as being for this kind of SVID or that kind of SVID. And these can grow: SPIFFE supports root rotation this way, so these can grow and shrink as time goes on. You can introduce a new key in advance, rotate to it, remove the old key, that kind of thing. So these bundles are living and breathing in a way. And then the keys referenced in this bundle are used to sign SVIDs for that trust domain, and that's how the validation happens. So then, when we talk about SPIFFE federation, what does that mean? That's: hey, I want to take identities from one domain, one set of systems, and talk across this boundary to another set of systems in another domain. How does that happen? That's fairly simple: it's just a collection of the bundle data we just looked at. You keep this map and you know: trust domain foo, there's the bundle; trust domain bar, here's the bundle. And when you encounter one of these SVIDs, you can do this lookup.
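That per-trust-domain lookup can be sketched roughly like this. This is illustrative only, with made-up trust domain names, and real implementations carry full bundle objects rather than the placeholder strings used here:

```python
def select_bundle(spiffe_id: str, bundles: dict):
    """Given an SVID's SPIFFE ID and a map of trust domain -> bundle,
    return the bundle that is authoritative for that trust domain."""
    trust_domain = spiffe_id.removeprefix("spiffe://").split("/", 1)[0]
    bundle = bundles.get(trust_domain)
    if bundle is None:
        raise LookupError(f"no federated bundle for trust domain {trust_domain!r}")
    return bundle

bundles = {"foo.example": "<foo signing keys>", "bar.example": "<bar signing keys>"}
select_bundle("spiffe://bar.example/web/frontend", bundles)  # -> "<bar signing keys>"
```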
You can say: okay, this SVID has this SPIFFE ID, therefore it is from this trust domain, and I select the bundle for that trust domain to do the validation. Doing it this way is a lot different from Web PKI, which has a set of keys and certs handed down from on high that are good for anything. SPIFFE delineates this and says: okay, we want authorities to be good only for certain aspects. And this gives the kind of isolation I mentioned earlier, the security domain. If one of these trust domains is compromised, it doesn't automatically translate into another trust domain being compromised, because they're totally separate signing infrastructures with totally separate keys. So we usually see people use these trust domains for things like staging versus production, Coke versus Pepsi, cross-org, cross-environment, that kind of stuff. So that's SPIFFE, or the important parts for this talk at least. We'll talk a little bit about SPIRE for a minute. SPIRE is a software implementation of the SPIFFE specs. So when we talk about certs and trust domains and all this stuff, well, who is signing those certs? Where do I get them from? That's what SPIRE does. SPIRE is written in Go, and it's designed to run in as many environments as possible. SPIFFE is platform agnostic, and SPIRE is, what should I say, the darling implementation of SPIFFE. So inasmuch as SPIFFE is intended to run everywhere, SPIRE is as well. It runs on Linux, runs on Darwin, runs on your BSDs, and as of recently it'll run on Windows too. So these are the high-level components of SPIRE. SPIRE has a server component and an agent component. The server component is the piece that does all the signing; it's got the keys, it's got all that good sensitive stuff.
Then we have these agents. These agents run nominally on a per-node basis, and they're the ones who are actually providing the identities to the workloads. One of the really important things about SPIFFE is that it solves this problem we call secure introduction, or the secret zero problem. We deploy software and it pops up. How does it speak to the first system? How does it begin that first authentication process? A lot of people try to solve this through injection of credentials, baking things into containers, and all kinds of other stuff. SPIFFE doesn't require any of that. It has this concept of what we call a Workload API, and that's what these agents provide. So when a workload is deployed and it pops up, it talks to these agents over this Workload API. The workload doesn't have to have any kind of secret, and it doesn't need a token or anything. It kind of just says: hey, who am I? And then it's the SPIFFE implementation's job, in this case SPIRE's, to figure that out, and the end result of that process is the issuance of an identity. And when we talk about doing this without any kind of secret or token being necessary, you need a way to describe things. You need to declare: hey, Bob is Bob, Alice is Alice. So SPIRE has this concept of registration: in order for a workload to get an identity from SPIRE, it must first be registered. And the registration is basically saying: look, we've got this workload, her name is Alice. She runs generally in this cluster over here. This is her general shape: look, she's this tall, and whatnot. When you see this workload, you know that's Alice, and you give her the Alice identity. Together, this information is what we call a registration, which is effectively that description. What is the workload's name? What does it look like? And where should it be running?
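In SPIRE, a registration like that is created with the `spire-server entry create` command. This is just a sketch with hypothetical names and selectors; the exact selectors you use depend on your platform:

```shell
# "Alice" is registered by naming her SPIFFE ID, the agent (parent) she
# runs under, and the selectors that describe her shape.
spire-server entry create \
    -spiffeID spiffe://example.org/alice \
    -parentID spiffe://example.org/agent/my-cluster \
    -selector k8s:ns:prod \
    -selector k8s:sa:alice
```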
Then the final piece of all this SPIRE component stuff is what we call the SPIRE OIDC Discovery Provider. This is a separate piece of software. It's shipped as part of the SPIRE release as a supporting utility, and it can run next to the SPIRE server or next to an agent, whichever makes more sense for you. Its job is to massage the SPIFFE bundle into something that looks a little bit like OIDC. OIDC is OpenID Connect, a standard that was defined predominantly for user auth. But it's been hijacked a bit, here and in other places too, to also do workload auth. It's a very simple HTTP-based process with regular web TLS protection around it. This OIDC Discovery Provider supports ACME, so you can deploy this thing and it'll come up, grab a Web PKI cert, and serve these keys up for you in a public fashion. And what it does in this massaging, as you see here, is this: the SPIFFE bundle spec is very, very similar to the OIDC key spec, but there are some minor divergences. So what this OIDC Discovery Provider effectively does is munge the bundle and expose it with a Web PKI cert. You can see here we redacted some of the X.509-related key material, and we also redact this particular use field, which we have found to be problematic with some OIDC validators. So this is the final piece to make the whole thing work end to end. In our previous diagram, if we replace the right-hand side with AWS, AWS can stand in here and hit the SPIRE server and grab those keys and do validation the same way that another SPIRE server might. The diagram I have here doesn't show the OIDC Discovery Provider, but it would be running co-located with the SPIRE server, as part of its pod or something like that. And so how exactly does this work? Oopsies.
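The "munging" is roughly this shape: take the keys out of the bundle and strip the members that OIDC validators don't expect. This is a simplified sketch of the idea, not the OIDC Discovery Provider's actual code, and the input bundle below is made up:

```python
def to_oidc_jwks(spiffe_bundle: dict) -> dict:
    """Rewrite a SPIFFE bundle's key set into OIDC-friendly JWKS form by
    dropping the X.509-related members and the problematic `use` field."""
    drop = {"use", "x5c", "x5t", "x5t#S256"}
    return {
        "keys": [
            {k: v for k, v in key.items() if k not in drop}
            for key in spiffe_bundle.get("keys", [])
        ]
    }

bundle = {"keys": [{"kty": "RSA", "kid": "k1", "use": "jwt-svid",
                    "x5c": ["MIIB..."], "n": "...", "e": "AQAB"}]}
to_oidc_jwks(bundle)  # -> {"keys": [{"kty": "RSA", "kid": "k1", "n": "...", "e": "AQAB"}]}
```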
AWS has this concept of a federated web identity. There's a service in AWS called STS, the Security Token Service, and what it enables you to do is go and exchange a credential for an AWS credential. So what we can do using this endpoint is come to it with a SPIFFE credential and exchange the SPIFFE credential for a time-bound, temporary AWS secret access key. And the magic about this is that SPIFFE doesn't require you to store secrets on disk or anything; it does all this stuff at runtime. It doesn't require you to do secrets injection or secrets management. Any time you have to talk to S3, for example, you're going to need an AWS secret access key, and there's a lot of pain around managing these things. Oftentimes they don't expire. People have MFA policies on their accounts, but then they have to get the non-MFA secret access keys for their workloads, because how do you do MFA with a workload? All these kinds of things. So the way we do this is we create this mapping, this trust relationship, in AWS: we set up this web identity provider and point it at the OIDC discovery thing that runs next to your SPIRE server. And once you've set this up, you can create this mapping. So here on the top you can see there's a role ARN. What we're saying is that this particular AWS IAM role is mapped to this particular SPIFFE ID on the bottom right: spiffe://example.com, with an s3-test-app path. And so when somebody hits STS and they've got this particular identity and this particular audience, and this is with the JWT token, by the way, then the caller is entitled to receive temporary credentials for the role described at the top. And this is pretty awesome, because it negates the whole: where do the keys go? Where do they live? I put them in a vault. How do they get injected? How do I rotate them?
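On the AWS side, that trust relationship lives in the IAM role's trust policy. Roughly, it looks like the sketch below; the account ID, provider hostname, audience, and SPIFFE ID are all hypothetical stand-ins for whatever you configured:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.example.com"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "oidc.example.com:aud": "s3-test-app",
        "oidc.example.com:sub": "spiffe://example.com/s3-test-app"
      }
    }
  }]
}
```

The `sub` condition is what pins the role to one SPIFFE ID, and `aud` is the audience the JWT is minted with.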
Would things break if I rotate them? And all this kind of stuff. So this is the basic trick. It is made way cooler by this little project that Square wrote. You'll notice a theme in these integrations: all three of the cloud integrations I'm talking about today have been written and published by the community. We have a really strong, wonderful community around SPIFFE and SPIRE, and people publish these kinds of things all the time, with supporting code to go with them. So I'm standing on the shoulders of giants a bit in parts of this presentation. Square has written and published this tool to help you do this authentication to AWS with a lot less headache. Even though we've already removed the headache of the AWS secret access key, there's more headache to be removed. The way this works is by shipping a little utility that hooks into the standard AWS config file. Basically, all you have to do is drop this little config file on disk, and you also drop the AWS assume-role utility on disk. And so long as your app uses the AWS SDK to talk to S3 or Redshift or whatever else you're using, the logic to hook in, look for this config file, and call out to do all this stuff is already there. So these apps have this AWS SDK, and they can be running anywhere: they can be running on-prem, they can be running in Azure, you name it, they can be running under your desk. So long as they're able to get a SPIFFE credential, they don't actually have to be written with the SPIFFE SDK. They can just use the regular out-of-the-box AWS SDK with this little helper utility. And what this helper utility does is it knows how to talk to the Workload API, grab a SPIFFE identity, go talk to STS, grab the temporary STS credential, and then return it back to the app. So we have a lot of people who have migrated apps out of AWS that are none the wiser, effectively, because none of the app code has to be changed.
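The hook is the standard `credential_process` setting in the AWS config file, which tells the AWS SDK to run an external command to fetch credentials. A hypothetical sketch, where the binary name, flags, and paths are illustrative rather than the tool's exact interface:

```ini
# ~/.aws/config
[profile spiffe-s3]
credential_process = /usr/local/bin/spiffe-aws-assume-role --role-arn arn:aws:iam::123456789012:role/s3-test-app --workload-socket /run/spire/sockets/agent.sock
```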
All they need to do is drop these configs on disk and they're off to the races. So that's pretty cool. I'll show you a little bit about Azure; it's a similar thing. They have a service that you can go to that knows how to do the OIDC swap, so you can bring your SPIFFE JWT there and they can map it back to an Azure identity. There's a blog post on the Identity Digest blog, which is an excellent blog. Uday, I believe, is an identity architect at Microsoft, and he's been there almost a quarter century. He wrote up this really great how-to on using SPIFFE identities to authenticate to Azure, and he also published this diagram, which spells out the flow I was just describing. You have a SPIRE server, you have this OIDC thing, you've got a workload. It comes up, it gets a SPIFFE ID, it goes and talks to this endpoint in Azure, it does this exchange, and it gets back Azure creds. And then with the Azure creds, it can do whatever it wants. Now, that Kubernetes cluster, that big green part there, like I said, can be anywhere. It doesn't have to be in Azure, because the SPIFFE identity is platform agnostic. And then finally, we have GCP, with a similar trick. This is from Christoph Grotz; he's an engineer on a Google Cloud team. Google Professional Services actually published this blog post and this utility, which is similar to the magic that Square published. This utility is designed to let your on-prem instances talk back to GCP without any application modification, the same way the AWS thing does. But rather than using a file and a helper process to call out, this integration turns on a proxy, and this proxy stands in for the Google metadata endpoint: it effectively emulates the answers that the Google metadata endpoint would give you if you were running on Google Cloud.
So, similar to the AWS thing, your software can pop up, and if you're using the GCP SDK, it already knows it can talk to this metadata API. It reaches out and finds this proxy, and this proxy has the smarts to get the SPIFFE identity, go make the exchange with GCP, and bring back your GCP cred. Question. The question was that this diagram presupposes that trust is already established. I think you're asking about that screen from the AWS section, and yes, it does. There's a configuration you need to do in GCP, similar to AWS, where you point it back to your OIDC provider, your SPIRE server. There's a certificate involved there, usually a Web PKI certificate. So this diagram presupposes that that configuration has been done already. The question here is the direction of the request: is it from the cloud provider to the SPIRE server, or from the SPIRE server to the cloud provider? It's the cloud provider to the SPIRE server. The cloud provider periodically will reach out to the SPIRE server, grab the recent public keys, and pull them back; the workload, obviously, reaches into GCP. So you can see, as I talked about earlier, it's all a similar underlying technology that enables this, with different nuances in each one. Different cloud providers decide they want to do this thing differently, and so the way you configure them subtly varies from case to case. However, the underlying mechanism is relatively universally applicable. A lot of software supports this kind of OIDC auth. So today we talked about AWS, Azure, and GCP, but there's a lot of off-the-shelf software and other third-party services out there that also support OIDC auth. I hear about them all the time. And so if you're using something that supports OIDC auth, whether it's off the shelf or it's a service or something like that, there's a good chance that it will be compatible in this way.
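Under the hood, these exchanges follow the OAuth 2.0 token exchange shape. As a sketch, here's roughly what the POST body to GCP's STS endpoint (`https://sts.googleapis.com/v1/token`) looks like when trading a JWT for a Google access token; the token and the workload identity pool audience below are hypothetical:

```python
def build_gcp_exchange_request(jwt_svid: str, audience: str) -> dict:
    """Build the form body for a token-exchange POST: trade a JWT-SVID
    (the subject token) for a GCP access token."""
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "scope": "https://www.googleapis.com/auth/cloud-platform",
        "audience": audience,
        "subject_token": jwt_svid,
    }

req = build_gcp_exchange_request(
    "eyJhbGciOi...",  # the JWT-SVID fetched from the Workload API
    "//iam.googleapis.com/projects/123456/locations/global/"
    "workloadIdentityPools/my-pool/providers/my-spire-oidc",
)
```

The helper utilities do this call (or the AWS/Azure equivalent) for you, which is why the app itself never has to handle a SPIFFE credential directly.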
And so the key takeaways here: number one is that SPIFFE JWTs are OIDC compatible. If you look, there's a little logo, an O with an I, a little circle thing. A lot of software supports OIDC already, and for those that do, there's like a 90, 95% chance that it's just going to work out of the box. SPIFFE and SPIRE are both platform agnostic; they can run anywhere. SPIRE is super, super pluggable, so all the ways that SPIRE needs to rely on underlying platforms are pluggable. There are AWS plugins, there are plugins to interact with hardware TPMs, there are plugins to do all kinds of different stuff. So the real thing here is that a lot of times, access to these services has been gated by where you're running your workload, and that is no longer the case. If you want to use the Google machine learning stuff from inside your data center, you can do that very easily, and you don't have to fool around with any of these keys. And yeah, so that's one weird trick to avoid all the pain around managing these tokens: storing them, encrypting them, injecting them, all this stuff. The SPIFFE identities are all short-lived and fast-rotated, and everything's fully automated from the root down, so this problem kind of just vanishes. If you want to learn more about this stuff, we have, obviously, our GitHub repos. These are a couple of the repos I talked about today: the helper utility for AWS from Square, and also the helper utility from Google Cloud Platform to do SPIFFE auth from anywhere. That's all I have for you today. I'll open up for questions. I'll leave this up for a minute if you just want to take pictures, and then I've got a QR code for talk feedback. Thank you for coming. It's great to see you all. Yeah. Nope. Here you go, buddy. Thank you.
So I'm in the process of trying to cook up a SPIRE deployment for my environment right now, and the challenge I'm encountering is that the list of trust domains keeps growing and growing and growing. And there's also, it appears to me, a tight relationship between trust domain and SPIRE deployment. So let's say I'm going to have 50 trust domains; now I have to manage 50 SPIRE server deployments. Are there any plans to consider consolidating multiple trust domains in one server deployment? I'm fixing one problem but creating an operational burden on the other hand. Is there any advice or thoughts around that? Yes. Well, I have some thoughts; I don't know about advice. Ha ha ha ha. The exploding trust domain problem is a problem that I've seen a number of times, particularly at shops with larger scale. I think this also comes down to a trade-off. There are some tricks you can do. The first thing is that the SPIRE server can be deployed in what we call a nested model, so you can have multiple SPIRE clusters participating in one larger trust domain. Oftentimes people reach for many trust domains because it's the easy thing to do with their model. For example, if I'm just stamping out a bunch of equivalent Kubernetes clusters, the easiest thing to do oftentimes is just to say: okay, well, each one of these is its own trust domain. There are drawbacks with that. This federated auth I talked about earlier, where you look at the trust domain and choose the correct bundle: Envoy supports this natively, and other software supports it natively, but not everything does. NGINX doesn't, and a couple of others. So any time you talk about this federated auth, you're already talking about a cost. The other piece of this is that each one of these bundles has a size.
When you have a lot of them, they get quite large, and there's a lot of network traffic involved, because the agents suck these things down. So we're exploring ways to make this problem better. The largest SPIRE deployments are over a million agents, so they're way, way bigger than the largest Kubernetes deployments. So these things can be done, but we tend to see these really large deployments go for like two, three, four trust domains, and then they lean on nested architectures and other tree-like topologies to achieve the scale and reliability that their deployments need. So I guess my advice would just be: the trust domain thing is kind of a big hammer. Put it in the places where you really need it, and not in the places where it's just the easiest thing to deploy. I hope that makes sense. Without knowing more about your environment, it's hard to make it more concrete. Yeah, would you be opposed to one large trust domain, with the receiver of the credential looking at the latter part of the SPIFFE ID and dissecting that? Like, if you want environmental segmentation, would it be okay to have one large domain, but let's say a receiver is only in a staging environment, and it could look at a bit in the URI and say, oh, this is staging, and then authorize on that? Do you think that's an anti-pattern, according to you? You definitely want to check the workload ID. I mean, there are some services where it's okay to just say: anything from staging. But in general, you really want to check the workload ID.
The things I think about are like NTP, DNS, things that are centralized services; that kind of policy makes sense for those. But I would recommend that the trust domain name should be part of your authorization policy, and so should the workload ID. Thanks. Hey, I have a small question. As I understand it, the SPIRE registration to the SPIFFE ID is a fixed one-to-one mapping: you specify a list of selectors, and that is mapped to a specific SPIFFE ID. I wonder, are there any plans to make this mapping a little bit more flexible and dynamic? For example, it could be as simple as some format substitution, or maybe a limited expression language, so that a single SPIRE registration could map to maybe a hundred or a thousand SPIFFE IDs for a hundred or a thousand workloads. Let me make sure I understood your question. You're asking if we've been thinking about having SPIRE break away from this one-registration-to-one-workload-ID pattern and instead be able to configure SPIRE with a regex or something that would then issue identity from that? Yeah, something like that. Maybe a more concrete example: I have a selector, service name, and the value could be anything, and I want to embed that service value into a SPIFFE ID so that I do not need to repeat that value part. You do not want to repeat what, I'm sorry? The selector, the value of the service name. Yeah, yeah, yeah. One of the problems that we have with this is the security model. SPIRE is designed to survive node compromise. So if any node in the Kubernetes cluster, or any physical node or otherwise, is compromised, and there's an agent running there, the security model is such that you cannot just take control of this agent and mint any identity you want.
And in order to put that security control in place, you have to know which agents are authorized to mint which identities. So it's difficult to say: hey, this agent can mint any identity in the foo or bar namespace, because you still kind of need mappings for that, and oftentimes the agent identity is very disjoint from the namespace of the workload. This is something that I've been thinking about personally: what could we do to avoid this registration overhead? In Kubernetes, we have the SPIRE Controller Manager, which is a controller that can watch deployments as they're placed and will automatically manage all these registration entries for you. And you can annotate or label these deployments with the SPIFFE IDs that you would like them to take on. So some of this overhead is negated in that way, but I think there's more we can do. Right now, some of the core design and security principles of SPIRE make it difficult. I don't know if that's exactly what you were asking, but I feel it's something related to registration management and defining identities for a large number of workloads in advance, this kind of thing. Yeah, thank you. Hey, so in the examples that you showed, it looked like you're effectively swapping your SPIFFE ID for short-lived cloud credentials. So is there any work going on, or any interest in, the cloud providers supporting SVIDs natively, so you wouldn't have to do the dance? I have not heard of this, but you did see that this GCP helper was written by Google Cloud. AWS has also begun to engage: AWS App Mesh has native integration with SPIRE, so you can bring your SPIRE and plug it into AWS App Mesh. So I haven't seen cloud providers directly trying to pick up SPIFFE auth natively. Maybe in the future they will, but we do see them engaging and making this story better. Any more questions?
Thanks everyone. Oh, one more. One more, one more. Sorry, false start, false start. I think you skipped it in the talk, but there is a process in AWS to get it to trust the OIDC provider. So what is the OIDC provider for the SPIFFE identity? How does that connection get established? So you have a SPIRE server, and your OIDC provider, which you can ship in the same pod. The OIDC provider talks to the SPIRE server and gets all the bundle data. Then it turns on a little web server that serves the OIDC discovery document and a few other documents to make OIDC work. It also munges these bundles to be aligned with the format that OIDC validators expect. It speaks ACME and grabs a Web PKI cert. And so when you configure your cloud provider to talk back to this OIDC provider, part of that step is saying: here's the DNS name, and you are expecting Web PKI. Some of the clouds, AWS for example, will say: hey, paste in the thumbprint of the actual cert that you're serving over there, and we'll validate that thing. So that's how the trust between the cloud provider and the OIDC on-prem stuff gets established. All right. Thanks everyone.