Hello, everyone. Good afternoon. I'm Chirag Kapadia and I'm a software engineer at Uber. I have my colleague Ryan Turner with me, who's a maintainer on the SPIRE project, and today we'll be presenting how you can leverage SPIRE for all your production identity needs. This is the agenda for today's session. We'll start with an overview of SPIFFE and SPIRE, then talk about the SPIRE architecture, plugins, and X.509 certificates. We'll then cover a use case for mutual TLS encrypted MySQL connections using SPIRE, along with a demo. So let's get started. What's SPIFFE? It stands for Secure Production Identity Framework For Everyone, which is a set of standards for securely issuing identity to workloads. It consists of three main components. First is the SPIFFE ID, a URI string that uniquely represents the name or ID of a workload. A workload here could be anything: a web server, a database, a microservice, or any single piece of software. The SPIFFE ID URI format is spiffe:// followed by the trust domain and a path. The trust domain represents the trust root of the system, which could, for example, be the domain name of an organization, and the path is unique to a workload. The whole SPIFFE ID string can thereby be used to uniquely identify a workload. Second is the SPIFFE Verifiable Identity Document, also known as the SVID. It is a document that proves the identity of a workload; it carries the SPIFFE ID and is used by a workload to identify itself, for example a workload trying to authenticate to an API on a web server. The currently supported document types are X.509, which is the format for public key certificates, and JWT, which is a JSON web token. And lastly we have the Workload API, a set of API specifications that lets workloads securely fetch SVIDs. Next we'll discuss SPIRE, which is short for SPIFFE Runtime Environment, and which provides a production-ready implementation of the SPIFFE APIs.
Primarily, SPIRE performs two main processes around workloads: workload registration and workload attestation. Registration is a way to tell SPIRE how to identify a workload, and SPIRE uses that information when attesting a workload. Attestation is how SPIRE figures out details about the process requesting an SVID, and it uses those details to find the corresponding registration information in order to issue an SVID. Usually workloads use SPIRE-issued SVIDs to authenticate to other workloads, for example mutual TLS encryption using X.509-SVIDs, or bearer-token-based authentication using JWT-SVIDs. Next we'll discuss some details about the SPIRE architecture and learn more about the components and APIs involved. A SPIRE deployment consists of a SPIRE server and one or more SPIRE agents. The SPIRE server is primarily responsible for managing workload registrations, and it also acts as a signing authority to distribute workload identities to the SPIRE agents. A SPIRE agent runs on every node on which workloads run, and it is primarily responsible for attesting workloads and issuing identity documents to them via the Workload API. A typical workflow in SPIRE can be described in three main steps. We'll start from the top. Firstly, registrations are made to the SPIRE server via the entry API. Each registration entry specifies which selectors a workload must have to be able to fetch an identity; an example of a selector could be a specific label on a Docker container or on a Kubernetes pod. Secondly, each SPIRE agent communicates with the server via the agent API. When an agent starts up for the first time, it authenticates to the server via a process called node attestation, during which the server verifies the identity of the node the agent is running on. Once authenticated, the agent fetches signed workload identities from the server and caches them in memory. And lastly, workloads running on a node communicate with the SPIRE agent via the Workload API.
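For reference, the registration step above is typically done with the spire-server CLI. An entry like the following (the IDs and selectors here are purely illustrative) says that a workload with these Kubernetes namespace and pod-label selectors, attested by an agent under the given parent ID, may fetch this identity:

```
spire-server entry create \
    -parentID spiffe://example.org/k8s-node \
    -spiffeID spiffe://example.org/myservice \
    -selector k8s:ns:default \
    -selector k8s:pod-label:app:myservice
```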
The agent gathers selector information for the workload using the workload attestation process and returns an identity document to the workload from its in-memory cache, or by requesting it from the server. The behavior of the SPIRE server and agent can be customized through a variety of plugins. There are different plugin types supported by both the server and the agents, which makes SPIRE highly extensible and customizable. Each plugin type is a gRPC API interface, and SPIRE has built-in implementations for each type. SPIRE also allows custom implementations of these plugin types by providing a plugin SDK. We're not going to go into detail about each of the plugin types, but I'll highlight a couple of important ones. The node attestor plugin on the agent gathers the node's identity from the underlying platform. For example, if it's an AWS instance, the AWS node attestor plugin fetches the instance identity document from AWS, and the agent then uses that information as proof of identity when authenticating to the server. The node attestor plugin on the server is then used to validate the node's proof-of-identity document presented by the agent. For example, the AWS node attestor plugin on the server would call an AWS API to verify the instance identity document sent by the agent. Another important plugin type is the workload attestor plugin, which is an agent plugin. It allows the agent to find selector information for a specific workload. An example would be the Kubernetes workload attestor plugin, which queries the kubelet to fetch information about the pod, such as the namespace, pod labels, service account, et cetera, and uses it to generate selector information. Next, we'll go into some details about X.509-SVIDs. So what are X.509-SVIDs? They are essentially X.509 certificates that encode information following the SPIFFE specification.
An X.509-SVID contains a hierarchy of certificates, where a leaf certificate represents a workload and signing certificates are the CA certificates used to sign those leaf certificates. These certificates then chain to a trusted root CA, which could, for example, be an upstream authority in SPIRE or the SPIRE server CA certificate itself. An important part of the specification is that all certificates in the SVID contain the SPIFFE ID representing an entity in the URI SAN, that is, the subject alternative name extension in X.509. For example, the signing certificate for the SPIRE server would have its SPIFFE ID in the URI SAN field, and likewise the SPIFFE ID for a workload would be encoded in its leaf certificate. Here's an example of the default leaf certificate format for a SPIRE-issued X.509-SVID. As we discussed earlier, it contains the SPIFFE ID of the workload in the URI SAN field. The issuer has country US and organization SPIFFE. The subject has country US, organization SPIRE, and an identifier string that is unique per workload. It also has digital signature, key encipherment, and key agreement specified in the key usage field, and TLS web server and web client authentication specified in the extended key usage field. Lastly, it has the CA flag set to false, which indicates that it is a leaf certificate and not a certificate authority. That concludes the topics I wanted to cover. Over to Ryan for the remainder of the presentation, followed by a demo. Thanks. I'm Ryan. I work at Uber, and I'm a maintainer on the SPIRE project. Chirag did a nice job of describing the identity documents that SPIRE currently issues for workloads, and now I'd like to cover some challenges that you may face with these identity documents.
A very common use case is when you want to authenticate to some off-the-shelf or open source software that relies on specific certificate attributes for authentication, such as the subject common name or organization fields. Some examples are database systems like Postgres, MySQL, and etcd, which all support authentication with mutual TLS but rely on the subject field, whereas SPIFFE authentication is based around the SPIFFE ID in the URI subject alternative name. Or you may be integrating with software that authenticates workloads using JSON web tokens and requires specific claims other than the standard subject, audience, and issuer claims. Another example where SPIRE-issued certificates may fall short today is revocation use cases. Although SPIRE offers a reliable control plane for issuing short-lived certificates, there may still be cases where you want to issue longer-lived certificates to workloads using SPIRE. Consider a workload that receives an identity on startup but is not always connected to a SPIRE agent, such as in a serverless or edge computing environment. In these cases, being able to integrate SPIRE certificates with existing revocation infrastructure using OCSP or CRLs provides value, because it mitigates the impact of an identity being compromised. Additionally, you may have some workloads in your build pipeline that perform code signing using certificates issued by SPIRE. To restrict code signing privileges to only certain workloads in a trust domain, you may want to set the code signing extended key usage bit in the build workload's certificate. Or perhaps you have more advanced use cases for certificate path validation and want to enforce path validation policies in your SPIRE PKI using the policy constraints extension from X.509.
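Going back to the JSON web token case above: a decoded JWT-SVID payload carries the standard claims, roughly like this (all values illustrative). The tenant_id claim is a hypothetical example of the kind of extra claim such software might require, which a stock SVID won't carry:

```json
{
  "sub": "spiffe://example.org/myservice",
  "aud": ["spiffe://example.org/backend"],
  "iss": "https://spire.example.org",
  "iat": 1712000000,
  "exp": 1712000300,
  "tenant_id": "team-a"
}
```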
So the overall question is: how can we solve these kinds of identity and authentication problems without having to build separate certificate infrastructure? To fill this gap in the project, SPIRE now provides a credential composer plugin interface where you can write custom code that influences some of the fields included in SVIDs signed by the SPIRE server. All X.509-SVIDs and JWT-SVIDs issued by the SPIRE server can be customized using this plugin interface. For X.509-SVIDs, the credential composer plugin interface allows you to customize the subject, the DNS subject alternative names, and any other X.509 v3 extension, except for the URI subject alternative name, which always needs to contain the workload's SPIFFE ID according to the SPIFFE specification. And for JWT-SVIDs, you can set any claim value in the JWT apart from the subject, which likewise needs to bear the SPIFFE ID of the workload per the SPIFFE specification. Now let's see how the credential composer plugin fits into the overall SPIRE architecture by looking at a flow where a SPIRE agent requests a workload SVID to be signed by the SPIRE server. Credential composer plugins are external server plugins, meaning they run as a separate process co-located with the SPIRE server. They implement the credential composer gRPC protobuf interface defined in SPIRE's plugin SDK and host the plugin APIs over a local Unix domain socket accessed by the SPIRE server. The SPIRE server can be configured with multiple credential composer plugins, giving you the flexibility to chain multiple plugins together and decompose your customizations into discrete plugin implementations. When a SPIRE agent requests an SVID from the SPIRE server, before the server signs the SVID it sends a gRPC request to any configured credential composer plugins with the SPIFFE ID for the SVID and the set of other attributes it plans to include in the SVID.
Using this information, and potentially relying on plugin configuration or external data sources, the credential composer plugin decides whether to change the set of SVID attributes that the server will sign. It then returns the final SVID attribute set to the server, and the server completes the signing operation and returns the SVID to the SPIRE agent. Now I'd like to show a demo of a real use case: using SPIRE along with a credential composer plugin to provide certificates to a MySQL database and a stateless service that accesses this database, and using those certificates to set up a mutual TLS connection between the service and the MySQL database. There are two initial problems that we need to solve. One is that MySQL doesn't know how to authenticate the SPIFFE identities issued by SPIRE by default. MySQL authenticates mutual TLS callers based on the subject common name field of the client certificate, but SPIRE doesn't set this field by default. The second problem is that MySQL doesn't know how to talk to the SPIRE agent to fetch an identity. I'll walk through how we can solve both of these problems with a credential composer plugin and an init container. First, to set up the environment, we register the service and the MySQL database in the SPIRE server. Next, the SPIRE agent syncs the entries for the service and the MySQL database from the SPIRE server and requests X.509-SVIDs for them. The SPIRE server calls out to a credential composer plugin implementation we've authored for this demo to determine whether the X.509-SVIDs should have any custom attributes set. The credential composer plugin we've written for this demo has a simple implementation that looks at the path of the SPIFFE ID for the SVID to be signed, and when the SPIFFE ID path matches a configured prefix of /mysql/client, it interprets the final path segment of the SPIFFE ID as the MySQL username.
It then sets this username in the subject common name field of the SVID. The SPIRE server signs and returns the SVIDs to the SPIRE agent, which keeps them in its in-memory cache. The stateless service fetches its identity on startup from the SPIRE agent using the go-spiffe library. The MySQL pod also fetches its identity, using an init container. It persists the certificate, private key, and bundle to an emptyDir tmpfs volume shared with the MySQL server container in the pod. The MySQL server container then starts up, reads its identity from disk, and starts listening over TLS. We've now shown how we provide identities to a MySQL database and a service calling the database, but what happens when the MySQL database's certificate is about to expire? MySQL supports dynamic reloading of TLS configuration by rereading the TLS files from disk, but it doesn't know how to get an updated identity from SPIRE. To solve this problem, we have a sidecar container running alongside the MySQL server in the same pod that keeps a long-lived stream open to the SPIRE agent over the Workload API. This container receives updates from the SPIRE agent when a new identity is signed for the MySQL server. On receiving an updated identity, it saves the new TLS files to the tmpfs volume and triggers a reload of the TLS configuration in MySQL. Let's now see this in action. I'm going to go ahead and switch over to a live demo. Okay. I've gone ahead and pre-built a Kubernetes cluster with all of the components shown in the slide, so let's take a look at the current state of the environment. First, we can see that we have a SPIRE deployment here with a SPIRE server, and we have a three-node cluster that we're using today, where each node in the cluster has a SPIRE agent running. Next, we can take a look at all of the registration entries that we've created in the SPIRE server. We can see that there are four entries. The first entry corresponds to the stateless service.
You'll see that this SPIFFE ID has a path prefix of /mysql/client, which is the path prefix that our credential composer plugin will be looking for in order to determine whether or not the SVID should be treated as a MySQL client certificate. We also have two identities registered for the MySQL pod. One identity corresponds to the MySQL server, and we also have an identity for the sidecar TLS reloader container. You may notice that the TLS reloader container's SPIFFE ID also has the /mysql/client prefix. We need this because we are going to require TLS for all connections to MySQL, so the TLS reloader itself needs to be a user registered in MySQL and use its own client certificate to authenticate to MySQL. Finally, we have a node alias registration that associates all of these workload registrations with our SPIRE agents. Next, I'll show that we have a MySQL container running in this mysql-0 pod. If I describe this pod, we can see that it has three containers set up. There was an init container called tls-bootstrap, which we talked about, which fetched the initial credentials from the SPIRE server and persisted them to the tmpfs volume in the pod. We also have two running containers: the MySQL server container, which you can see is running the MySQL image, and the TLS reload container, which is periodically updating the TLS configuration for the MySQL server. Next, I'll quickly show the MySQL server configuration. You can see that we're requiring TLS for all clients, and we set the TLS certificate, private key, and CA files. These are the files that are written onto the tmpfs volume by the tls-bootstrap and TLS reload containers. Next, I'll show a couple of users that we created in MySQL. I'm going to grab the MySQL log here, and we can see that a couple of CREATE USER statements were run when we set up the environment. One is for the TLS reloader container; this user requires the client to present a certificate with this subject.
As Chirag mentioned earlier, the country US and organization SPIRE are default properties set by SPIRE when it signs X.509-SVIDs, and in the credential composer we have added this TLS reloader common name field for the MySQL TLS reloader user. We also have a user created for the stateless service, which similarly requires a client certificate matching this string. Next, I'll quickly show the implementation of our credential composer plugin. This is the whole implementation here. Essentially, all we do is look at the request sent by the SPIRE server, fetch the SPIFFE ID from the request, and specifically look at the path of that SPIFFE ID. If the path matches a set of configured prefixes, which we've configured to be /mysql/client for this demo, then the plugin interprets the final path component of the SPIFFE ID as the MySQL username, sets that username in the subject common name, and returns the result back to the SPIRE server for signing. So now let's take a look at how MySQL was bootstrapped using the TLS files that were obtained by the init container. I'm going to show the logs from the tls-bootstrap init container, and in the logs we print out all the certificates that we were able to fetch from the SPIRE agent Workload API. You can see that the MySQL server identity was fetched, as well as the TLS reloader identity also assigned to the same pod. And we also have the trust bundle for the SPIRE trust domain of example.org that we're using in this demo. We don't log the private key, obviously, for security reasons, but we do log here that all these files were successfully written to disk. So that's how MySQL gets its initial identity. Now I'll show that we have one pod running in the default namespace, which is our stateless service that is going to exercise this MySQL database using mutual TLS. This service implements two REST APIs.
A GET API on an endpoint called /api/v1/users, which just prints out the contents of a table that we've created in MySQL called users, which is very simple. I'll show what that looks like here. You can see that we've preloaded three users into the database: a user named Alice, a user named Bob, and another user named Carol. We also have a POST API on this service, which allows us to create a user. I'm going to now create a user named David, and we see that the operation was successful. If we run the same GET API again, we now see that the David user is present in the list of users. To prove that this is actually using MySQL and not just printing out previous state saved in the service, we can look at the MySQL logs. Earlier I showed the users that we created in MySQL, one of which was this spire-mysql-client user, which corresponds to the SPIFFE ID of the stateless service that we registered in SPIRE. You'll see that it connected to this database using TLS, and it issued a few queries corresponding to the API calls that we just issued to the service with curl: a SELECT * FROM users, then the INSERT, and then another SELECT * FROM users. So that shows that the service is actually successfully connecting to the database over mutual TLS. Now I'll also show the rotation. In the MySQL pod, we have the TLS reload sidecar container. If we print its logs, we can see that the certificates that were fetched by the tls-bootstrap container are also fetched by the TLS reload container and updated over time as the SPIRE agent pushes new certificates. Let's print out the values of these certificates so we can actually read them. I'm just going to paste in a short function that makes it more convenient to dump these with OpenSSL. First let's look at the MySQL server certificate. You see that the URI subject alternative name matches the registration that we showed earlier in the SPIRE server, and it has a DNS subject alternative name set to the local cluster DNS.
Next let's look at the MySQL client certificate, which is used by the TLS reloader container to authenticate itself to the MySQL server process. You can see that it has a different SPIFFE ID in the URI subject alternative name, with the MySQL client TLS reloader path. You can also see that the subject common name in the certificate was set to this TLS reloader user, which we created in MySQL. Finally, we can do a quick examination of the trust domain bundle, which was saved on disk as well. You can see that the URI subject alternative name contains the trust domain. We can also visually verify that the subject key identifier of this CA matches the authority key identifier of the other certificates for the MySQL server and TLS reloader. You can see this subject key identifier is prefixed with 5FF0, and here we also have the authority key identifier with the same prefix, and also here. So we can see that this is the actual CA. You'd still want to verify the signature, but at least for demo purposes it shows it's the real CA, and it has the CA bit set to true. And then lastly, we can look at the logs for the stateless service to show that it is also getting updated identities. I'll go ahead and print out its certificate, and you can see that it has the spire-mysql-client identity, with the spire-mysql-client username set in the subject common name. Note that in this demo we've created these identities with very short times to live; they have roughly two-minute validity periods, so they've been rotating in the background since we started this demo. This shows it's working stably over time, and not just once. So that concludes our demo. With that, we have some links to the source code if you're interested in more of the details; feel free to peruse that and open any issues or pull requests. I've also added some links to the credential composer plugin interface if you're interested in creating your own implementation for your SPIRE deployment.
And there's more context about this plugin on the GitHub issue linked here if you're interested in seeing more of the discussion around this component. And then, as always, we're available on the SPIFFE Slack. It's a very active community, so feel free to participate; a lot of people ask questions, and there's a lot of great support there as well if you have questions. And here's our GitHub link as well. So that's all. Thank you for your time. Thank you, that was really cool. A quick question: if you use SPIRE with Istio, can those extra fields get passed through to the certificates that get assigned to the sidecars? Yeah, so I haven't personally used the Istio integration, but I do know that Istio supports SPIRE as a CA provider. I don't think there's any restriction on using this plugin with Istio; I think you should be able to use it just like any other SPIRE deployment. Thanks. Thank you for the demo, it looks really cool. A couple of questions. First, for the internal CA, do you have support for signing certificate rotation? Signing certificate rotation? Yeah. Yes, the signing certificate is automatically rotated by SPIRE. There's also a plugin that we didn't really talk about in this presentation called the upstream authority plugin; you can use that plugin to chain your SPIRE PKI to an upstream CA as well. It's an interface just like the one we showed with the credential composer, so you can write your own plugin or use one of the existing plugins. We support, I think, HashiCorp Vault, and you can write the CA file to disk; there are a few options there. So yes, SPIRE does that today. Cool, good to know. Also, how does the agent actually bootstrap to get the initial SVIDs for the workloads? Yeah, so Chirag talked a little bit about the node attestation process; that bootstraps the trust between the SPIRE agent and the SPIRE server.
Once that initial trust is established, the agent gets its own SVID from the server and uses it to create a mutual TLS connection to the server. After that, it periodically syncs all of the identities that are available to that node: it caches the registration entries that are created, and for all those entries it pre-caches X.509-SVIDs signed by the server, so that when workloads are spawned on the host, the SVIDs are usually already cached and available. I see. Cool. Last question: if you look at the architecture, the server is kind of a single point of failure. In a production environment there are probably multiple servers. How do you manage them, and how do you keep them in sync? Yeah, so we support a high availability mode where you can run an active-active configuration. In a production deployment you would definitely want to have multiple instances of the server. They all act as their own independent signing authorities with their own keys, but they can all chain up to the same upstream authority, which goes back to your earlier question. So you can chain them all to the same PKI, but they all run independently of each other, so if one goes down, you hopefully still have other instances available. I see. But how do you keep all the users in sync between them, or do they manage different users? They all share the same database; there's a backing MySQL, Postgres, or SQLite database that they all share. I see. Sorry, one more question. At Uber, as a global infrastructure, do you have a single CA and a single database, or do you have multiple regions where each region has its own server? Yeah. Sorry, we have about one minute left. So Uber has multiple deployments of SPIRE, because we run in multiple public clouds and we run on-prem. We have deployments per our availability zone concept.
So if one zone is unavailable, then we have other instances available. Okay, cool. Thank you. Yep. If we're using SPIRE to mint JWTs, how does the trust bundle get delivered? We have an authentication header, we pass it to the client, so how do they validate it against the trust bundle? Yeah, so the SPIRE server has a bundle endpoint that it exposes as an HTTP endpoint for use cases like this. There's also a component called the OIDC Discovery Provider, provided by the SPIRE repo, which publishes these keys in an OIDC-compliant way as well. Hello. I have a more basic background question. When you chose SPIFFE and SPIRE, what were the other alternatives you considered? Maybe someone could just use a public cloud provider's offerings, or there could be other alternatives, so I just want to know more about the other options you considered and why you specifically chose SPIFFE and SPIRE. Yeah. So for us, we run a pretty complex heterogeneous infrastructure: as I said before, we run in on-prem zones and in public clouds, and we've recently changed public cloud providers. So we really don't want to lock ourselves into a particular public cloud identity implementation, and we also need to be able to authenticate across all those clouds, and we felt SPIFFE and SPIRE were a really valuable tool for that. There are other solutions in this space; cert-manager is one example if you run within a Kubernetes environment, and there are other tools like HashiCorp Vault as a PKI engine. So there are many solutions out there, but we felt this was the most robust and a pretty scalable system. We've been running this stably in production for over four years now, and I can say it has really held up pretty well. Thank you. Cool. And I think with that we're out of time. If you have any other questions, feel free to grab us, but thank you very much.