Okay, I think we can get started. Hey everyone, welcome. We are going to be talking about OIDC and workload identity in Kubernetes. Starting off with a brief intro: I'm Anish Ramasekar, a software engineer at Microsoft. I'm part of the Azure container upstream team working on security projects, and I'm one of the maintainers of Secrets Store CSI Driver, a Kubernetes SIGs project. I'm from Seattle, Washington. Hi everyone, I'm Mashidosh, and I work at Elastic, and I'm also a maintainer of Cluster API Provider Azure. So let's get started. This is the agenda that we are going to cover. We'll give an introduction to workload identity. How many of you know about workload identity so far? Wow, that's great. When I was learning about workload identity, I started by looking into some basics, so we'll also do a refresher on authentication and authorization; it will not hurt us. Then Anish will help us understand workload identity in detail: he'll go into how it works and explain it in the context of Kubernetes, and we have a demo for that. After that, we are going to talk about a couple of usages of workload identity in Kubernetes. So let us get started. Workloads deployed on Kubernetes may require access to external resources on a public cloud. Or a workload in Kubernetes may want to talk to another workload in Kubernetes, and they need to identify each other. Workload identity is a way to authenticate workloads. You can see in this little diagram that I made when I was trying to understand workload identity: let's say you have two pods, pod one and pod two. You can consider these two as workloads, and there is some identity associated with these two pods which will be used to authenticate. Okay, so let's get into some basics, some jargon that we want to demystify here. An entity is something individual.
It could be an individual or it could be a service, and an identity is a set of attributes that can help you identify these entities. Authentication is a way of proving that identity to someone, and authorization tells you what set of actions an entity can take. So we have the traditional model of authentication: you have an app, you have a server, the server has protected resources, and you could enter your username and password to access the resources from the server. That had a lot of drawbacks. You couldn't do more fine-grained tuning of access, and if you wanted to revoke the access, it was terrible. Also, providing your username and password could be risky if you wanted to provide it to a third-party app. And here comes OAuth 2.0. OAuth 2.0 is a standard, based on RFC 6749, that tries to standardize delegated authorization. As you can see, there are four roles written on the slide: resource owner, resource server, client, and authorization server. The resource owner is someone who owns the protected resources. The resource server is the server from which you access those resources. The client is the party to which you, as the resource owner, delegate authority to access the resources. And the authorization server helps you get an access token to access the protected resources. It is also important to keep in mind that OAuth 2.0 does not define any standards around authentication. So let's move to this little diagram that I've made here. You can see the client makes a request via the resource owner, and in step two I've written authentication. OAuth 2.0 does not define anything around authentication, so somehow, let's say, the resource owner authenticates in order to delegate authorization to the client. And now you see the AS, that is the authorization server.
It passes an access token to the client app, and the client app can now present this access token to use the protected resources. So this is the client credentials flow in OAuth 2.0. There are other flows that are more relevant for web applications, but I just wanted to give an example of how the access token mechanics work. Now let's talk about OIDC, which is one of the things we'll touch on when we talk about workload identity. OpenID Connect specifies a set of standards to do authentication. The part of OIDC we'll mostly touch on is ID tokens, which OIDC specifies should be JWTs, and this JWT will have claims, for example email, name, et cetera, to identify the entity. And yes, let's see how OpenID Connect works. In this diagram, again, you have an application and you have an authorization server. The client sends a request, and in the request data you can see it says the `openid profile` scope, which is important when you are sending a request: it indicates that I need an OIDC ID token. The client gets the authorization code, and the server then sends this authorization code to the authorization server. The difference here is that the authorization server now returns two things: an ID token and an access token. The server can now decide to do something based on this ID token, and it will know how to use that ID token because it has a lot of details about the entity. And again, the same process. So how does our previous diagram look if we put OIDC into perspective? This is just for reference here. And now, Anish, help us understand workload identity. Okay. So we had a refresher on OIDC; now, jumping into workload identity in Kubernetes. The first thing we start off with is defining what a workload is. In the Kubernetes context, a workload is an application that's running in a pod. A pod is a set of containers that contains the business logic.
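To make the ID-token mechanics from the OIDC refresher a bit more concrete, here is a minimal sketch of reading the claims out of a JWT using only the Python standard library. The token below is a fabricated, unsigned example; in a real system you must verify the signature against the issuer's published keys before trusting any claim.

```python
import base64
import json

def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT. No signature check is done here!"""
    payload_b64 = token.split(".")[1]
    # base64url decoding requires padding to a multiple of 4
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def b64url(data: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment."""
    return base64.urlsafe_b64encode(json.dumps(data).encode()).rstrip(b"=").decode()

# Fabricated ID token: header.payload.signature (signature omitted for the sketch)
header = b64url({"alg": "RS256", "typ": "JWT"})
payload = b64url({
    "iss": "https://issuer.example.com",  # who issued the token
    "sub": "user-123",                    # who the token is about
    "aud": "my-client",                   # who the token is for
    "email": "jane@example.com",
    "name": "Jane Doe",
})
token = f"{header}.{payload}.sig"

claims = jwt_claims(token)
print(claims["email"], claims["name"])  # → jane@example.com Jane Doe
```

This is exactly the kind of identifying claim set (email, name, issuer, audience) the talk refers to when it says the server "will know how to use that ID token."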
And each pod runs in a namespace, has a unique name, and also has a service account tied to it. So when we look at workload identity in Kubernetes, some use cases come up. The first one is the pod trying to authenticate with the Kubernetes API server. Applications running in pods in the cluster often need to interact with the Kubernetes API server, and for this purpose they use the Kubernetes service account that's tied to the pod. This is crucial for applications that perform actions like scaling pods, accessing Kubernetes secrets, getting data from config maps, or inspecting cluster state. The second most common scenario is authenticating communication between workloads, so workload to workload. For pod-to-pod communication in Kubernetes, this is essential for ensuring the security and trustworthiness of interactions between workloads in the cluster. And the third common use case we see for workload identity is pods running in the cluster that need to access an external protected resource. This external protected resource could be a database, an API service, or any managed service running in a cloud provider that requires authentication and authorization. The first two scenarios have been broadly covered in a couple of other talks and are widely talked about, so for the purpose of this talk we want to focus mainly on the third one: accessing external protected resources using workload identity. When we look at workload identity options, some that come to mind are what you see here on the list. This is in no way the entire list. The first one is the Kubernetes Certificates API. Kubernetes has had support for native certificate provisioning flows for a long time now. This can be done by just creating a certificate signing request.
And then a CSR is used to request that a certificate be signed by a denoted signer, and it can be approved or denied by an approver before it's actually signed. The one caveat is that this requires implementation of an approver, which means there is upfront work that needs to be done in order to consume this in a secure way. The second one is a service mesh. If you're using a service mesh like Istio, Linkerd, or Envoy, these tools often provide built-in features for secure authentication and authorization between services, including mTLS and fine-grained access control. And the third one: SPIFFE is a standard, and SPIRE is an implementation of the SPIFFE APIs that performs node and workload attestation to securely issue SVIDs to workloads, and to verify the SVIDs presented by other workloads. The thing with SPIFFE is that it does workload-to-workload authentication really well, but in addition, the JWTs that are generated can also be used as a workload identity to access external protected resources. Before I talk about the fourth one, Kubernetes service account tokens, one thing to note: of the three options we talked about, service mesh and SPIRE are add-ons that need to be additionally installed in the cluster; they're not something that comes by default. And the Certificates API definitely needs some amount of groundwork to implement an approver. This brings us to the last option on the slide, and the one we're going to be discussing in this talk: Kubernetes service account tokens. Okay, so why Kubernetes service account tokens? The first thing that comes to mind is simplicity. Kubernetes service account tokens are built into the Kubernetes platform, making them the simplest and most straightforward option for managing identity within a Kubernetes cluster.
You don't need to set up any external services or additional components to use them. The second one is native integration: it's tightly integrated, it's conformant, and it's tested. When we talk about Kubernetes service account tokens, typically there are two kinds. One is the default service account token, which has existed in Kubernetes for a very long time. One of the issues with it was that it was automatically generated and stored in a Kubernetes secret for every workload that was created. That was until recently; there's a KEP that has been worked on to stop doing that and move to the more secure projected service account tokens. In terms of other issues, why we say it's not secure: the JWTs are not audience-bound. Basically, any recipient of the default service account JWT can masquerade as the presenter to anyone else. The other issue is that these JWTs were also not time-bound, so they wouldn't expire; their lifetime was basically tied to the Kubernetes service account existing in the cluster. And there was a scalability issue: every time a service account token was generated, a Kubernetes secret was created, and then that secret was mounted into the volume, which means if you had many pods consuming service accounts, you just had a lot of secrets. So in comes the projected service account token. It's a bound service account token that's time-based. These service account tokens are useful for workload-to-workload communication and can be used for accessing external resources. And they're bound service account tokens, so the Kubernetes API server will enforce the required attenuations: the time binding, the audience binding, and so on. Okay, so this is how you, as a pod, can request a projected service account token. On the left side, what you have is a volume config.
Basically, the path there is the file name where the token will be mounted inside the pod. The audience field is configurable, and it's also required, so you can basically say, I want a token for a particular audience. And expirationSeconds denotes how long the issued token should be valid. When we look on the right side, this is what the new service account tokens look like. The audience field there is what was configured in the volume config. Then there's the issuer claim, and if you look at the exp and iat claims, the difference should basically be one hour. In addition to the standard claims available in the JWT, there are also Kubernetes-specific claims, nested under kubernetes.io, that can uniquely identify a service account. There's information about the pod for which this token was generated, and there's also information about the service account for which this token was generated. And it's not just the name but also the UID, so if the same service account is deleted and recreated, the token is no longer valid. Okay, so workload identity federation. We talked about OIDC, we talked about workload identity, what a workload is, and service account tokens. Workload identity federation is what ties all of this together and enables you to access an external protected resource. Workload identity federation allows you to use the existing authentication stack in cloud providers to authenticate and authorize workloads running in a Kubernetes cluster. Federation basically allows you to bridge the gap between the cluster's identity system and the cloud provider's identity system, enabling seamless and secure interaction between them. Workload identity federation follows the OAuth 2.0 token exchange protocol specification.
So you provide a credential from your IdP to a security token service, an STS, which verifies the identity in the credential and returns a federated token in exchange. If you look at the flow diagram, basically we have a Kubernetes workload which sends the projected service account token to an STS. The STS uses the OpenID Connect discovery protocol to get the discovery document from the well-known OpenID configuration endpoint. That discovery doc contains the jwks_uri, which points to the JSON Web Key Set containing the public key of the service account signer, and using that, the STS can validate the authenticity of the token it just got. Then there's also a trust relationship, which basically says: if I have a token from this particular issuer and it matches this subject, then give me a token. The STS performs those validations and returns a federated token that can be used by the app. The next flow diagram is essentially similar to the previous one, but it has two additional components: we have the kubelet on one side and the external protected resource on the other, which covers the end-to-end flow of what we are talking about today. As an example, in this one we are referring to the Microsoft identity platform. In terms of the entire flow, the kubelet generates the token and gives it to the workload as part of pod startup. The pod can exchange that token, so it basically sends it to the identity platform. The identity platform at that point is going to check the trust on the identity, validate the incoming token using the discovery protocol that we talked about, and then it issues an Azure AD access token to the workload. At this point, the workload running in the cluster can send the Azure AD access token and access the resource. An example of that is a pod trying to get a secret from Azure Key Vault, and that brings me to the next slide, which is a demo.
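Before the demo, the exchange step just described can be sketched in a few lines. This is a hedged illustration rather than the demo's actual code: the client ID, scope, and token path are placeholders, and the form fields follow the client-assertion variant of the client credentials grant that the Microsoft identity platform accepts for federated credentials. The resulting body would be POSTed over HTTPS to the tenant's token endpoint.

```python
import urllib.parse

# Assertion type meaning "I am presenting a JWT as my client credential"
ASSERTION_TYPE = "urn:ietf:params:oauth:client-assertion-type:jwt-bearer"

def federation_request_body(client_id: str, projected_token: str, scope: str) -> str:
    """Build the form-encoded body that exchanges a projected service account
    token for a cloud access token (client credentials grant with a client
    assertion)."""
    return urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "scope": scope,
        "client_assertion_type": ASSERTION_TYPE,
        "client_assertion": projected_token,
    })

# Inside a pod, the kubelet mounts the projected token at the configured path,
# e.g. (demo assumption):
#   with open("/var/run/secrets/tokens/azure-identity-token") as f:
#       projected_token = f.read()
body = federation_request_body(
    "11111111-2222-3333-4444-555555555555",  # hypothetical client ID
    "<projected-service-account-token>",     # placeholder for the mounted token
    "https://vault.azure.net/.default",      # Key Vault scope, per the talk's example
)
print(body.split("&")[0])  # → grant_type=client_credentials
```

The STS-side validation (fetching the discovery document, checking the issuer and subject against the configured trust) happens entirely on the identity platform; the workload only has to present the projected token.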
So as part of this demo, what we're going to do is set up a kind cluster the hard way, which involves hosting the issuer URL, the discovery document, and the JSON Web Key Set. After that, I'm going to deploy a demo application that uses workload identity in Azure to get a secret from Key Vault. Is the font visible, or should we zoom in a bit? The first thing we're doing is generating a service account signing key pair that will be used by the kind cluster. This step of setting up an Azure storage account for hosting the issuer URL is something we are doing for the demo, because the OIDC issuer URL and the discovery doc need to be publicly accessible over HTTPS. If you create a cluster with a cloud provider, all of this is handled by them, and they also ensure that the service account signing keys are rotated on a regular basis. Okay, so this is an example of the OpenID discovery doc. As you can see, it calls out the issuer, which will also be present in the token, and the jwks_uri, which is basically the JSON Web Key Set where the public signing keys are. This step is uploading the discovery doc to the storage account so that it's publicly accessible, and once that's done, the next thing we do is verify that we can get it using curl. Okay, so it's accessible. The next thing we do is upload the public key in the JWKS document. For the purpose of this demo, I am using a CLI tool that we built in Azure, called the Azure Workload Identity CLI. What it does is parse an input file which contains the public key, and generate the JWKS document for you. It outputs that to jwks.json, which, as you can see, has a key ID and the public key for the service account. And that is also being uploaded.
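For reference, a discovery document of the kind being hosted here has roughly this shape. The storage URL below is a placeholder rather than the one used in the recording:

```json
{
  "issuer": "https://<storage-account>.blob.core.windows.net/<container>/",
  "jwks_uri": "https://<storage-account>.blob.core.windows.net/<container>/openid/v1/jwks",
  "response_types_supported": ["id_token"],
  "subject_types_supported": ["public"],
  "id_token_signing_alg_values_supported": ["RS256"]
}
```

The issuer value must match the iss claim in the projected tokens exactly, which is why the cluster is later configured with this same URL as its service account issuer.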
And then the next thing we do is verify that this endpoint is also publicly accessible, because Azure AD needs to get access to it. The next thing we're doing is creating the kind cluster. It's a fairly simple config, but the things to note are that we are reusing the service account signing keys we generated at the start, and in addition, we are configuring the service account issuer to be the URL that we just hosted. And that creates the cluster. I'm just going to skip ahead. Okay. So the next thing we do is create an identity on the cloud provider side. In the case of Azure, I'm creating a managed identity to which I'm going to tie the authorization rules. The identity is created, and the next step is setting a policy in Key Vault saying the identity we just created has get permissions for secrets. And this is the trust that I talked about while talking about federation: basically, this command sets up a trust on the cloud provider identity saying, if I give you a JWT that has this particular issuer and this particular subject, then you can trust it and give me a token. As part of the demo, I'm also installing the Azure Workload Identity webhook here, just to make my demo easier, but this is not something that is required. The purpose of the webhook is just to inject the environment variables that I need for the demo, but in addition, it will also add the projected service account token volume that can be used for workload identity. Okay. In terms of the application, basically we're creating a namespace called kubecon-demo, and then we create a service account that can be used by the demo app. The annotation that I'm adding here is specific to Azure; it doesn't apply to every cloud provider.
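As a rough sketch of the resources this step creates: names, namespace, and the client-id value below are placeholders, and the annotation is Azure-specific, as just mentioned. The pod half shows what you would declare yourself if you skipped the webhook:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: workload-identity-sa        # placeholder name
  namespace: kubecon-demo
  annotations:
    azure.workload.identity/client-id: "11111111-2222-3333-4444-555555555555"
---
# Without the mutating webhook, the pod can request the projected token itself:
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
  namespace: kubecon-demo
spec:
  serviceAccountName: workload-identity-sa
  containers:
    - name: app
      image: example.azurecr.io/demo-app:latest   # placeholder image
      volumeMounts:
        - name: azure-identity-token
          mountPath: /var/run/secrets/tokens
          readOnly: true
  volumes:
    - name: azure-identity-token
      projected:
        sources:
          - serviceAccountToken:
              path: azure-identity-token
              audience: api://AzureADTokenExchange   # static audience for Azure workload identity
              expirationSeconds: 3600                # one hour, as shown in the demo
```

The federated identity credential on the Azure side then ties this issuer and the subject `system:serviceaccount:kubecon-demo:workload-identity-sa` to the managed identity.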
And once we do that, the next thing I'm doing is deploying a simple pod, and this pod is implemented using the Azure SDKs to get a secret from Azure Key Vault. All it does is load the projected service account token and exchange it using the workload identity federation flow. Once we do an apply and wait for the pod to be running, we're going to look at three things. The first thing is I want to show you the projected service account token volume that was injected into the pod, so let's look at the volume spec. Towards the end, we see that the Azure identity token is there, its lifetime is basically one hour, and the audience field is a static value that's configured for Azure workload identity. The next thing we're going to do is look at the actual projected service account token that was injected into the pod. I'm using the step CLI here to decode it. The key thing to note is that in there we have the audience field that we had configured, and the issuer is also the service account issuer that was configured for the cluster, and obviously the additional claims. The subject is a concatenation of the namespace and the name. And the last thing we're going to do is check in the logs that the pod is able to access it. So the pod got the secret we were interested in using workload identity. Thanks, Anish. I just wanted to ask one question: you already mentioned this webhook thing, and you mentioned that it's not required. So if I deploy this workload or pod with the configuration to project the service account token and establish the federated credential, we will be good? Yes, the webhook was only for injecting the environment variables used by the Azure SDK. As long as you put the projected service account token volume in the pod, you don't really need the webhook. Awesome. So let's touch upon the other usages in Kubernetes.
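Before moving on, one detail from the demo worth pinning down: the subject claim in a projected service account token follows the `system:serviceaccount:<namespace>:<name>` convention, and this is exactly the string that the federated identity credential matches against. A small sketch, with fabricated claim values (the issuer and UID here are placeholders):

```python
def parse_sa_subject(sub: str) -> tuple[str, str]:
    """Split a Kubernetes service account subject claim into (namespace, name)."""
    prefix, kind, namespace, name = sub.split(":")
    assert (prefix, kind) == ("system", "serviceaccount"), "not a service account subject"
    return namespace, name

# Fabricated claims resembling a decoded projected service account token payload:
claims = {
    "aud": ["api://AzureADTokenExchange"],
    "iss": "https://<storage-account>.blob.core.windows.net/<container>/",
    "sub": "system:serviceaccount:kubecon-demo:workload-identity-sa",
    "kubernetes.io": {
        "namespace": "kubecon-demo",
        "serviceaccount": {
            "name": "workload-identity-sa",
            # UID changes if the service account is deleted and recreated,
            # which is what invalidates old tokens
            "uid": "00000000-0000-0000-0000-000000000000",
        },
    },
}

namespace, name = parse_sa_subject(claims["sub"])
print(namespace, name)  # → kubecon-demo workload-identity-sa
```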
So Cluster API Provider Azure is a project which I contribute to, and it has started to use workload identity; we'll see in more detail how that works there. For folks who don't know what Cluster API Provider Azure is, it's right on the screen: it's a SIG Cluster Lifecycle project that helps you provision Kubernetes clusters on Azure. There are other providers for Google, AWS, and so on. One of the basic ways to give credentials to the Cluster API Provider Azure manager is using service principals, so that the manager is able to create resources on Azure, and that is roughly using usernames and passwords, which is not so great. So we implemented workload identity in CAPZ, following the flow we saw in the demo that Anish gave. On the top you can see the spec of the CAPZ manager, which has a projected service account token, and at the bottom you can see the path where the token gets mounted; it then uses the SDKs to talk to Azure AD to access protected resources. You saw in the demo how Anish used the CLI to create the identity; here I'm giving you another example. You can go onto the Azure portal and create an app registration, which is the equivalent of what was done in the CLI, and once you create that application on Azure, you can establish the federation. We can call this creating the federated identity credentials, and you can see that we put in the cluster issuer URL, the namespace and service account of the workload or pod that is going to use it, the client ID, and the audience. Right? So once we are done with all of this, and this part is more CAPZ-specific, the only thing you now need to do is give the client ID and the tenant ID to that pod, and you should be able to create clusters on Azure, or talk to Azure to access resources. The other usage that we want to talk about is the Secrets Store CSI Driver.
It's a Kubernetes SIG project, and it basically implements the Container Storage Interface. What it does is talk to an external secret store, get the secrets, and mount them in tmpfs. In the demo we looked at an app that was custom-built with the Azure SDK to talk to Key Vault, but if you were using the Secrets Store CSI Driver, you wouldn't have to do any of that: you would have generic code that reads from the file system, and you would just install the driver and the provider, which would use workload identity to talk to your external secret store, get the secrets, and mount them for you. This is a very high-level flow of the Secrets Store CSI Driver; I know it's a lot even for a high level. What happens underneath is that the kubelet will generate the service account token on the node. It does this when a pod gets created and says, hey, I want a volume using the Secrets Store CSI Driver. The kubelet calls the CSI driver over a Unix domain socket, so these tokens are sent as part of the RPC call. At this point, the CSI driver sends those service account tokens to a CSI provider, again over a Unix domain socket. We support providers today like Azure, Google, AWS, and HashiCorp, with many more out there that have been implemented. The CSI provider uses the service account tokens and workload identity to talk to the external secret store and get the secrets, sends them back to the driver, and the driver mounts the secrets in tmpfs. Also, this token request support for CSI drivers was added in Kubernetes 1.20. It's mostly used in Secrets Store, but it could also be used by other CSI drivers, for disk attach and detach for example: instead of using a common identity for the driver, you could rely on the workload's identity for performing operations. Okay, so we wanted to put a meme, and that's why we have this slide towards the end of the talk. We talked about the what and how of workload
identity so far, so we would like to end our talk with some of the whys. One thing is that applications often use a secret or a certificate to access protected resources in a cloud provider. The issue with this is that secrets and certificates pose a security risk. The other thing is they can also expire, and once they expire, that results in downtime. And then the other thing is that managing secrets is just hard. Workload identity federation essentially solves this problem, because you no longer have secrets; in addition, workload identity also lets you assign distinct, fine-grained identities and authorization to each workload in your cluster. These are some resources you can look at if you want to understand service account tokens or workload identity in more detail. We talked in the context of Azure because that's where we contributed and experimented, but workload identity in GKE is also something that's possible, and it's a good read. And we are done with the talk, so thank you very much; if you have any questions, feel free to ask. One more thing: you can send feedback. Question: is there an equivalent to this in AWS? Can I map a role onto a workload? Workload identity is supported by most of the cloud providers; basically, you can map a service account token to one of those. So I played with this a little bit, and one of the core problems I have in Azure is that, I think, I need to create a service account per pod, right? It depends; a namespace is a security boundary, and you would have one service account per workload, essentially. Effectively, to separate workloads, I have to go through that process of creating an individual service account, and now of course I have a thousand service accounts that I then have to manage. Sure, I don't have any more secrets, but I still have to manage those thousand service accounts. I think typically you already have service accounts tied to your workloads anyway, just to have the RBAC tied down, rather than having a default
service account. But within the namespace, you would need to manage a service account per workload. Yes, but on the other end, you would also create a federated identity credential, so you would establish trust. I think your question is around having many FICs; is that the concern? More, do you have a recommendation for a more scalable solution to managing identities, rather than tying them directly to service accounts, in Azure specifically? So I think on the Azure side, typically you would already have a user-assigned managed identity or an Azure AD service principal, and in terms of actual configuration, it comes down to just configuring FICs on that, right? Because you already have service accounts in the cluster, it comes down to managing FICs rather than individual service accounts. And I also know we're working on wildcard support, so you would be able to do something per namespace rather than doing individual ones. Yeah, I'm waiting for the wildcards, but I would love to chat with you. Absolutely, thanks. A question for you: if you're running multi-container pods, say some business app container and then sidecar containers to do something, and you're using workload identity federation, are you aware of any controls so that, if a sidecar was compromised, the compromised sidecar container wouldn't be able to leverage the federation that's been made available for the business app? Yeah, that's a good question. It's granted at the pod level: the thing is, if you look at the claims that are added, the subject is just the service account name and namespace, which is tied to every container in there. So in case of a compromise, you can't get as granular as a single container; it would essentially mean the entire pod is compromised. Do you know if anyone is working on anything like that, to get more granular within the pod? No, but again, I
would love to chat. Yeah, and if you have more questions, we can chat up here.