Welcome to the last session of the last day, which is special because everybody here really, really wants to be here. So thank you for coming. We had a bet going, and I won the bet because so many of you showed up. So I win, you win, we all win. All right, just to get started, as we're asking questions and interacting, remember this is part of the community. We have a code of conduct. It's there if you would like to read it in detail. Treat others well. We also have closed captioning available on the live stream, so for those who are watching, that's how you can access it. We will be having time for questions at the end. For those in the room, there will be microphones; for those watching live, there's a Slack channel on the CNCF Slack, so join that and we'll take a look there as well. And then finally, thank you to the sponsor for the recordings. They're awesome at getting recordings up quick, which we all appreciate for reference. So this is the SIG Auth deep dive. I'm Jordan. This is David. This is Rita. We help lead SIG Auth along with Mike Danese, who couldn't be here today, and Mo Khan, who is, I think, gonna heckle us later during the question and answer period. So first of all, what is SIG Auth? What do we do? We are responsible for the parts of Kubernetes that control authorization and authentication to the API server and components in Kubernetes. This also includes other aspects like auditing and some policy, and then some of the identity aspects that feed into those systems. We have more sub-projects now than we used to. That is good. If you would like to help and you're interested in one of these areas, please check out the SIG Auth page on the community repo and find the people who are working on that. There are many areas where we can use more eyes and more hands; some of those are audit, authenticators, authorizers, and certificates.
So getting identity and then approving identities, requests for identity, encryption at rest, which we'll hear a lot about from Rita today, and then some of the other bits like policy management: Pod Security Policy, which is no more, and Pod Security Admission, which is here, things like that. Along with some external projects like the Secrets Store CSI Driver. So that's kind of an overview of the sub-projects we have. Today, we wanted to walk through the current state of some of the enhancements we've been working on: things that are graduated, things that are in flight, and things that are coming up. So to start, graduated. Many years ago, when you would create a Kubernetes cluster, you'd get a secret for every service account, which would have a long-lived token in it. And this made us sad. So we came up with a better way, which were these ephemeral tokens, but we still kind of had the old secret ones hanging around. And so we've been on a long, slow, careful rollout to get rid of those. The first step was to just stop making the problem worse, stop continuing to create new ones. That turned on by default in 1.24 and graduated in 1.26. So from now on, when you create clusters, you don't get default tokens auto-generated. If you want them, you can still get them, but we're not gonna go spray bad, long-lived credentials into secrets for you. So that graduated. Lots of things in flight. There are other aspects of this token-reduction work that are still ongoing. We're still trying to track when you're using existing ones, and warn and add metrics so you can monitor use of those. And then eventually, eventually, we will have the ability to clean up ones that got created, haven't been used in a long, long time, and are a security risk. So this is still ongoing. This is something that's been asked for a lot: the ability to ask Kubernetes, who am I?
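To make the current token behavior concrete, here's a hedged sketch (the service account name is hypothetical): since 1.24, long-lived Secret-based tokens are only created if you explicitly ask for one by hand.

```yaml
# Only if you really still want a long-lived token: creating a Secret of this
# type asks the control plane to populate it with a service account token.
apiVersion: v1
kind: Secret
metadata:
  name: build-robot-token                        # hypothetical name
  annotations:
    kubernetes.io/service-account.name: build-robot
type: kubernetes.io/service-account-token
```

For most uses, a short-lived token via `kubectl create token build-robot` (the TokenRequest API) is the preferred replacement.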
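As a preview of that who-am-I API (SelfSubjectReview, discussed next): you create an empty object and the server fills in its view of you. A hedged sketch of a response, with a hypothetical user and groups:

```yaml
apiVersion: authentication.k8s.io/v1beta1
kind: SelfSubjectReview
status:
  userInfo:
    username: jane@example.com        # hypothetical user
    groups:
      - system:authenticated
      - developers                    # hypothetical group
```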
Kubernetes authentication is a little bit intricate. It can be a little bit complex. And so an API that actually lets you say, who am I, and have the server tell you who it thinks you are is a useful thing. This graduated to beta in 1.27. It's a pretty straightforward, well-understood API, so we expect it to reach GA in 1.28. And so then you would be able to say, kubectl auth whoami, and it would tell you. With that, I'm gonna hand it off to Rita, who's gonna talk about some of the KMS enhancements. Awesome. Show of hands, who here has used KMS encryption v1? Just kinda, all right, I'm sorry. That's how you know you're getting better. So yeah, before we start talking about v2, I wanna kinda cover some of the enhancements we've added for both v1 and v2. Some things like, hey, for data encryption, we've moved on from AES-CBC to AES-GCM. So starting from 1.25, you actually get writes with AES-GCM by default, and it can also fall back to CBC for reads. And for KMS v2, it defaults to GCM. And for folks who actually have to rotate their keys, now you have dynamic reload, so you no longer need to restart your API server, yay. And that you get in 1.26. And we get this question all the time: why couldn't I encrypt my custom resources? Well, now you can. So in 1.26, you can also encrypt your custom resources. And last but not least, why can't I encrypt everything? So in 1.27, now you can, with wildcard support as well as wildcard groups: you can encrypt everything, or all resources in a group. And here's an example of what the encryption configuration resource looks like, and a friendly panda. All right, so before we talk about v2, let's address the elephant in the room. Why not v1? What's wrong with v1? Well, to start off, for every encryption request that comes through, a new data encryption key is actually generated every time.
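The slide itself isn't reproduced here, but an encryption configuration using the wildcard and KMS v2 features just described might look roughly like this (the plugin name, socket path, and API group are hypothetical):

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
      - '*.mygroup.example.com'       # 1.27+: every resource in a (hypothetical) group
    providers:
      - kms:
          apiVersion: v2              # KMS v2 provider
          name: my-kms-plugin         # hypothetical plugin name
          endpoint: unix:///var/run/kms.sock
          timeout: 3s
      - identity: {}                  # fallback: read data that was never encrypted
```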
So now you might think, wow, that's a lot of key generation, because each new data encryption key means I have to go talk to the KMS provider every single time. And that's exactly what happened. This meant a slow cluster startup time, because every time you restarted the API server, every single request actually went to the remote vault. And furthermore, most external KMSes actually have rate limiting. So on top of the extra hop and the network latency, as well as rate limits, that roughly translates to about 160 milliseconds per request. As you can imagine, this is not something most companies want. And then let's talk about key rotation, right? Earlier I mentioned that before 1.26, you would have had to restart your API server just to pick up changes to the configuration file. It was very manual and error-prone. And also, because we didn't have key IDs in the request itself, you actually didn't know which object used which key. So it was really hard for people to know which key they could actually rotate without putting their workload at risk, right? And also, health check and status: all of that was actually part of the encryption and decryption call. So in order to get the status of your plugin, you actually had to make an encrypt or decrypt call. Why? I don't know. And then we didn't have enough observability, right? Because there was no unique ID doing the correlation, it was really hard to tell how a request flowed through the API server, then the plugin, then the actual remote KMS, right? So all of those were basically the gaps in v1. And here comes v2, right? The idea behind v2 is we don't like those gaps, and we wanna make sure Kubernetes can actually offer something that is more production-ready. So starting alpha in 1.25, beta in 1.27, and targeting stable in 1.29, we're looking at really addressing all of these, bringing all these benefits as part of v2.
So starting from 1.27, instead of generating a data encryption key for every single encryption request, we now actually have data encryption key reuse. What that means is you can reuse the same data encryption key without having to go through that extra hop, right? And that roughly translates to 80 microseconds per request, which sounds a lot better. It also means you no longer have a slow startup for your API server. We also added health check and status as its own API, so you can check the status of the plugin without having to encrypt or decrypt. And observability, right? A new UID was added so that you can correlate between all three actors. And then there's also a new proto format that was introduced for what gets stored in etcd. So when you look at the etcd data, you actually know which key encryption key was used and what the encrypted data encryption key is. And last but not least, because we keep the key ID as part of the status API, you can now rely on that information to help you decide when you can rotate the key. And because of DEK reuse, the plugin can tell the API server, hey, I wanna use a new key encryption key, and that changed key ID tells the API server to go ahead and generate a new DEK. So for more information, go ahead and check out that link; it's just the Kubernetes website. And if you wanna get more involved, please check out the Slack channel. And here's a look at what the actual proto looks like. As you can see, it's prefixed with k8s:enc:kms:v2 and then the plugin name, and then the object itself has the actual encrypted data, as well as the key encryption key ID that was used, as well as the encrypted DEK, and any annotations that you wanna add to help you correlate the data. A picture is worth a thousand words.
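As a rough sketch of that stored format (field names follow the KMS v2 enhancement; treat this as illustrative rather than authoritative), the value written to etcd is the `k8s:enc:kms:v2:<plugin-name>:` prefix followed by a serialized message along these lines:

```proto
// Hedged sketch of the KMS v2 stored object, after the
// "k8s:enc:kms:v2:<plugin-name>:" prefix in etcd.
message EncryptedObject {
  bytes encryptedData = 1;            // resource payload, encrypted with the DEK
  string keyID = 2;                   // key encryption key ID from the remote KMS
  bytes encryptedDEK = 3;             // DEK, encrypted by the remote KEK
  map<string, bytes> annotations = 4; // plugin-defined metadata for correlation
}
```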
And also, it's really hard to explain this stuff unless you've been working on it for a long time. So just to show you graphically: what happens when a user actually creates a new object? The request comes in, and the API server says, hey, do I already have a data encryption key or not? If it already has one, it's gonna use that to encrypt the data, and then it's gonna store the encrypted data in etcd. Now, what happens if the DEK doesn't already exist? When it generates the DEK, it sends an encryption request to the plugin, and the plugin encrypts the DEK with the remote key encryption key that is in the external KMS. And once it encrypts the data encryption key, all of that information is part of the response that gets returned back to the API server. And for a decrypt request, so like a get or a list, a similar thing happens. The API server retrieves the data from etcd, and if the encrypted data encryption key is already in cache, it's gonna use that to decrypt the resource. And if it doesn't already have it, it's gonna use the key encryption key ID to talk to the plugin and retrieve the information from the remote KMS to do the decryption of the data encryption key. See, it's really hard to explain. And again, we've done all the optimization to make sure that, as part of this journey, anything we can cache, anything we can reuse, we'll do it. And then last but not least, we talked about the status request. The status API is used to check the health of the plugin. About every minute, the API server calls the status API to ensure that, hey, is the plugin still healthy? And if not, every 10 seconds or so, it checks why it's unhealthy.
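The flow just described can also be sketched in runnable pseudocode. This is a toy model, not the real implementation: the "encryption" below is a placeholder XOR so the example is self-contained, and all class and method names are invented for illustration.

```python
import secrets

def toy_encrypt(key: bytes, data: bytes) -> bytes:
    # NOT real crypto -- a repeating-key XOR stand-in so the sketch is runnable.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

toy_decrypt = toy_encrypt  # XOR is its own inverse

class RemoteKMS:
    """Stand-in for the external KMS plugin (the expensive network hop)."""
    def __init__(self):
        self.kek = secrets.token_bytes(32)   # remote key encryption key
        self.key_id = "kek-1"
        self.calls = 0                       # counts round trips to the KMS
    def encrypt(self, dek: bytes):
        self.calls += 1
        return self.key_id, toy_encrypt(self.kek, dek)
    def decrypt(self, encrypted_dek: bytes) -> bytes:
        self.calls += 1
        return toy_decrypt(self.kek, encrypted_dek)

class APIServerEnvelope:
    """KMS v2-style envelope encryption with DEK reuse and a DEK cache."""
    def __init__(self, kms: RemoteKMS):
        self.kms = kms
        self.dek = None                      # current DEK for writes
        self.encrypted_dek = None
        self.dek_cache = {}                  # encrypted DEK -> plaintext DEK, for reads
    def encrypt_resource(self, data: bytes) -> dict:
        if self.dek is None:                 # only the first write pays the KMS round trip
            self.dek = secrets.token_bytes(32)
            _key_id, self.encrypted_dek = self.kms.encrypt(self.dek)
            self.dek_cache[self.encrypted_dek] = self.dek
        return {"encryptedData": toy_encrypt(self.dek, data),
                "encryptedDEK": self.encrypted_dek}
    def decrypt_resource(self, obj: dict) -> bytes:
        dek = self.dek_cache.get(obj["encryptedDEK"])
        if dek is None:                      # cache miss: ask the plugin/remote KMS
            dek = self.kms.decrypt(obj["encryptedDEK"])
            self.dek_cache[obj["encryptedDEK"]] = dek
        return toy_decrypt(dek, obj["encryptedData"])

kms = RemoteKMS()
api = APIServerEnvelope(kms)
a = api.encrypt_resource(b"secret-a")
b = api.encrypt_resource(b"secret-b")
assert kms.calls == 1                        # DEK reuse: one KMS call covers both writes
assert api.decrypt_resource(a) == b"secret-a"
assert kms.calls == 1                        # reads served from the DEK cache
```

In these terms, v1's behavior corresponds to minting a fresh DEK (and paying the KMS call) inside every `encrypt_resource` call; v2's reuse is what drops the per-request cost from milliseconds to microseconds.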
And the plugin also returns, hey, here's the current key ID I'm using; and if the key in the remote KMS gets rotated, it's gonna return the new key ID to the API server, and then the API server will use the new key ID to generate a new data encryption key. And with that. Thanks. So one of the features I'm really excited about that's being developed now is cluster trust bundles. For a long time, we've had a certificate signing request API, and for a long time we have punted on how you can actually distribute trust for the certificates that are signed. This is a mechanism to do that, and to distribute whatever other trust you want in your cluster. Shown here is roughly what the API looks like. You have an API resource with a signer name and a trust anchor, which is essentially your CA bundle. And the idea is that you'll be able to look up the signer that you want. So working through a practical example: if you have a certificate signing request and you have a particular signer that is signing it, trusting the kube-apiserver, you can now look up the CA bundle to verify it. Currently, you can do that manually. But coming soon, we will have a way to have a projected volume. That PR is open; it's got some more reviews outstanding on it. But when that merges, you'll be able to have it injected into your pod, and it will let you set up essentially your own schemes. You'll be able to either get ones that already exist, if they get published, or you can create your own signer and say, for my app, I wanna sign requests with this signer, and I'll have some mechanism for doing it. Now, further in the future, there is a plan to actually have a way to identify individual workloads. It's at a pre-KEP stage. I've linked it here in the slides, which should get published, so you can click the link later and read through it. I'm expecting a KEP probably in 1.28, I'd say.
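A hedged sketch of the cluster trust bundle resource roughly as described, with a signer name and a trust anchor (the signer and bundle contents here are hypothetical):

```yaml
apiVersion: certificates.k8s.io/v1alpha1
kind: ClusterTrustBundle
metadata:
  name: example.com:my-signer:ca-v1   # hypothetical; signer-linked names carry a prefix
spec:
  signerName: example.com/my-signer   # hypothetical signer
  trustBundle: |
    -----BEGIN CERTIFICATE-----
    ...PEM-encoded CA certificate(s), the trust anchors...
    -----END CERTIFICATE-----
```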
And I'm really excited for that, because it then tells a whole story about how you can have pods communicate and trust each other in a fairly easy flow. There are a couple of other designs that are still in design stages. Structured authorization config is one that is going to allow us to have multiple webhooks. Right now, you can have a webhook, and you cannot have more than one, and that limits what you're able to do. This is gonna allow us to inject, say, a deny authorizer ahead of RBAC if you want it, and have an allow-based authorizer afterwards. There are some details around how we can have a good failure policy for a deny authorizer that's early in the chain. And if you look here, you can see there's an example of CEL expressions, which are looking like the way to do it. CEL is coming in from API Machinery; if you're following what's happened with CRDs or with validating admission, it'll be familiar in its usage here. And I can see this being really useful as you go into your deployments. Another piece of structured configuration is OIDC. We have limitations today: you can have one OIDC provider. This is gonna let us get to more than one, and it's going to, again, allow dynamic reload, and it will also allow you to do some more advanced mapping steps. And CEL is everywhere, so I should have actually included a link to CEL in general; maybe we can update the slides before I publish them. But it'll allow you to do some more advanced mapping features based on a CEL expression. You're gonna wanna be sure you get it right in your authentication stack. There are some examples here of the things you're gonna be able to do; it'll be mapping and validation sorts of rules. And with that, I'll hand this back over to Jordan. So there's another in-process design that you may have actually heard references to in some of the other talks. I think some of the SIG Network folks were talking about this, and maybe some of the SIG Storage folks as well.
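Returning for a moment to the structured authorization config: since it was still in design at the time, the final shape may differ, but the slide's example was roughly along these lines (the webhook name and CEL expression are hypothetical):

```yaml
apiVersion: apiserver.config.k8s.io/v1alpha1
kind: AuthorizationConfiguration
authorizers:
  - type: Webhook
    name: deny-policy                # hypothetical deny webhook placed ahead of RBAC
    webhook:
      timeout: 3s
      failurePolicy: NoOpinion       # what happens when the webhook is unreachable
      subjectAccessReviewVersion: v1
      matchConditions:
        # CEL: only consult this webhook for requests in the prod namespace
        - expression: "request.resourceAttributes.namespace == 'prod'"
  - type: RBAC                       # the usual allow-based authorizer afterwards
```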
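Similarly for the structured OIDC (authentication) config, a hedged sketch of the kind of CEL-based mappings and validation rules mentioned; the issuer, claims, and prefixes here are hypothetical, and the design was still in flux:

```yaml
apiVersion: apiserver.config.k8s.io/v1alpha1
kind: AuthenticationConfiguration
jwt:
  - issuer:
      url: https://issuer.example.com     # hypothetical OIDC issuer
      audiences:
        - my-cluster
    claimMappings:
      username:
        expression: "'oidc:' + claims.preferred_username"   # CEL mapping
      groups:
        expression: "claims.roles.map(r, 'oidc:' + r)"
    claimValidationRules:
      - expression: "claims.email_verified == true"
        message: email must be verified
```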
So the ability to describe references across namespaces is something that's come up repeatedly over the years, and we've always sort of just said, namespaces are the permission boundary, don't cross namespaces. Which, as a starting point, has not been bad. We've actually gotten a lot of mileage out of that. It's relatively easy to understand, but it does leave gaps in scenarios where you actually have different actors who belong in different namespaces but still want to explicitly grant access. SIG Network came up with an example of this, where a gateway is living in one namespace. So the administrator for that has their own namespace, appropriately. But then you have individual users managing their own services, wanting to provide certificates for use by that gateway, and they're in their own namespace. So how should these two actors collaborate? What SIG Network came up with was the idea of a reference grant. The user with a secret in their own namespace records the intent to grant access to their secret to some other gateway, and they give very specific coordinates: this secret to that gateway. And so this was useful. They implemented it as a custom resource and got a lot of use out of it. And then SIG Storage said, oh, well, we have something similar: we have volume snapshots and persistent volume claims, and we have a similar cross-namespace thing we want to model. Can we use your API? And SIG Network said, well, sure. But it's weird for SIG Storage to use a SIG Network API, and isn't this all SIG Auth's responsibility? And so they showed up with this API and a request to bring it in as a SIG Auth API, which we appreciate; that's great. Two use cases is a great thing to have before you start abstracting something. So we took a look at what they had, and ReferenceGrant is a really good way to capture the intent of the user who wants to give access to a resource.
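As implemented in the Gateway API today, a ReferenceGrant capturing that intent looks roughly like this (the names and namespaces are hypothetical):

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-gateway-to-my-cert     # hypothetical
  namespace: user-ns                 # lives in the namespace that owns the secret
spec:
  from:
    - group: gateway.networking.k8s.io
      kind: Gateway
      namespace: gateway-ns          # hypothetical namespace of the gateway admin
  to:
    - group: ""                      # core API group
      kind: Secret
      name: my-tls-cert              # optional: scope the grant to one specific secret
```

The grant lives in the granting (secret-owning) namespace, which is what keeps the owner of the resource in control of the cross-namespace access.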
But there are a couple of other pieces of the puzzle. There's the controller, the gateway controller, the volume controller: how does it actually get authorized to get this secret or this volume snapshot or whatever the resource is? So there was a question about authorization there. And then finally, how does the controller indicate, when it gets a secret, that it's doing it on behalf of this thing? So what SIG Network came up with in ReferenceGrant was a good piece of that solution. We had some great discussions this week; I'm actually really excited about it. We're looking to expand what they did with ReferenceGrant to also have ways for the controller to indicate what it's making the request on behalf of, and for the administrator to avoid needing to give broad permissions to the controller. So as we bring this into SIG Auth and look to make it a generic thing that lots of controllers can make use of, we're hoping to holistically solve that whole problem: the user's intent to allow, the controller's ability to access, and the administrator's ability to give that controller the most scoped privileges it can have. Do you wanna talk through this? Yeah, here, yeah. And like every other SIG, we have a lot of cross-SIG work. I mean, you just mentioned a bunch of SIG Network. That's true, but we also work very closely with API Machinery; specifically, we mentioned CEL for admission, and then the storage version API, as well as kube-apiserver identity, just to name a few. And last but not least, we wanna give a huge shout-out to all the contributors who have really been helping us drive all these enhancements in SIG Auth; we could not have done this without them. Thank you so much. And in case you're wondering how you can get involved, we have a Slack channel in the Kubernetes Slack, as well as a home page with all the detailed information for how you can get involved and when the meetings are.
We also have a SIG Auth triage meeting every Monday morning, US Pacific time, if you wanna join that. We also have a mailing list. If you have questions or issues, obviously there's GitHub issues. And then we have a bi-weekly meeting on Wednesdays at 11 a.m. Pacific time, where we talk about all the interesting things that we're doing and where you can bring your designs as well. With that, I think that's it. Yeah, thank you. So we did wanna leave some time for questions. I think we've got a couple of mics, so if you have questions, raise your hand and a mic will find you. For the remote folks, we'll be looking on Slack if you have questions there. So let's open it up. Hi, nice presentation. I have a somewhat slightly tangential question. We use EKS a lot. And what we found is that AWS is on one hand good at authentication, and on the other hand very bad, because if you use EKS and you want to do any sort of integration which you would normally do on the control plane, on your control plane nodes, you of course do not have access to your control plane on EKS. So I was wondering, are there any thoughts floating around about interacting with authentication without having access to your control plane, if you want to manipulate or plug in or in any other way influence it? I mean, it would obviously be very hard, because if you could do that, that would also be a security hole, but what do we do? I think Mo seeded the audience. So a couple of things: the ability to have more than one OIDC authenticator, the ability to have more than one authorization webhook, and the ability to respond to changes in that configuration dynamically are building blocks that would be necessary to support something like that. Whether a particular managed provider would actually expose control over that is totally up to them, but those are building blocks that would absolutely support something like that in a much more seamless way. So that's what we're focusing on now.
Those benefit everyone and would enable providers who wanted to give control over that. Yeah, yeah, that was pretty much what I was hoping as well. It's just up to the managed providers, of course, as it always is. Exactly. But yeah, at least we're on the right track there. Thank you. It has to land in Kubernetes first. Yeah, of course, of course. And then it takes about 10 years before it lands in EKS. But at some point it might be possible, which is good news. Uh-oh. Hi. I'm one of the other leads for SIG Auth. Is this on? No. Sure. Can you hear me? Yes. So just a quick thing, on top of what they said. I know there are EKS folks here, but EKS does let you configure OIDC, with all the limitations of the current OIDC support. But back when I was at VMware, I did build a little project called Pinniped that you can run on top of your EKS cluster. So if you wanna bypass any of those restrictions, just run it, and it does. And I wrote it, so I'm pretty sure it's right. So, nice, thanks. I'll also say that it's possible that some providers are currently using the one webhook or the one OIDC integration we support today for their own IAM or their own OIDC. And so it's possible that they would actually be willing to support more, but they're using the one slot already. So for ones that don't want to open it up, they don't have to; for ones that do want to but are currently limited by the one slot, this would help them. All right, thanks for the really good presentation. So one quick question on the kind of configurations that you had, the component configurations: are they files on disk, and are they reloadable? Yeah, they're gonna be files on disk, and they will be hot reloaded. So within a minute, it'll pick it up and then use it for you. Okay, perfect. And are there any plans on, there's cluster role aggregation and that kind of stuff, but, not related at all to the previous, yeah, sorry, there's a second question.
So cluster role aggregation exists, but is there any kind of idea or progress on role aggregation, so not on the cluster-wide scope? I don't have any immediate plans. If you are interested in it, I think we would model it differently. I think we would agree that we would probably model it differently, but if there was a use case for it, maybe. The difference is that we share broad cluster roles that are used inside of each namespace, but if you get down to the level of creating your own role in a namespace, you generally know the ones that you created, and it's local to you, so there wasn't as obvious a need when we made the original design. Okay, cool, thanks. Last session, last day, is anyone still awake? There's a hand. Do you have any thoughts or plans around a v2 secrets API that doesn't expose the secret content in lists? It's not the first time the question's been asked. The hard thing is, it's not just a v2 secrets API, it's a different kind of Kubernetes API, so it's almost a v2 Kubernetes, which is a bigger lift. There aren't any plans right now. Some of the conversations this week were about additional possibilities for things we could authorize on, like restricting lists to particular selectors, field selectors, label selectors. There are abilities to list with particular content types, so you can list and say, I only want metadata; but obviously we don't authorize on the content type you accept, and so if you give someone list permission, they can use the full list or the metadata-only list. If we start talking about additional inputs to authorization, that could be one we consider. Even metadata-only, though, is pretty suspect, since that includes annotations, which often get confidential information serialized into them. So there's not a clear path there; it's not really on our radar right now, yeah. But I really am interested in trying to authorize on some additional fields, so show up at a couple of meetings when we talk about it.
Yeah, let's go to the folks over there first. Hi, thank you, wonderful presentation. I appreciate this list of where people can find you, and I would love to hear from one or more, or each of you: which place do you think would be really awesome for a new contributor to jump in? Maybe it's one of the specific bugs, or one of the documentation gaps, or one of the new features, but I would love to hear where you think a new person could really jump in and do something interesting. I'll start first. I think for a lot of the features that are in stable, that's a really good place to start: to try it and then report bugs, or documentation gaps, because a lot of times we're too close to the solution and it's hard for us to see what's missing. Like, for example, with the KMS v2 stuff, by having other people try it, we realized there were bugs that we didn't think of. And documentation, right, we always need more documentation. So I definitely think stable and implemented KEPs are a great place to start, and also come to the meetings; a lot of times we're looking for feedback as well, right? I'll say that I think some of where you wanna start depends on what you want to do. If you want to come in and report a bug and fix a bug, the triage meeting is a great spot for you. If you wanna come in and say, hey, I really want an integration like this, something on the mailing list with a fairly crisp description of what it is you're doing, or maybe even a prototype showing something sort of like it, is a great place to start, and then from there, maybe the SIG meeting. I'll hand off, I think, to Jordan to add.
I was just gonna add sort of a third category: if you are looking to give feedback on designs while they're still in progress, it's way easier to accommodate feedback before we build a thing than after it's in beta and the shape of it is pretty fixed. And so for some of the things we talked about today, especially things like reference grant that are still early, where we have a couple of use cases but you could imagine many more, please look at the things that are coming up and think about whether they solve a need that you have, or if they're close to solving a need but don't quite, or you're not sure. Jump in during the design and feedback phase; it's way easier to accommodate tweaks, or at least consider use cases we hadn't, at that point. And the spot to do that would probably be the Slack channel, the mailing list, or a PR comment. If you know of a particular design, there's the enhancements repo: any time you see a KEP number, that's an issue number in the enhancements repo, and there should be links from there to the design. If you don't know what you're looking for, yeah, Slack or the mailing list is a good place to start. And every one of the slides has a KEP number on it, and it's linked to that KEP. I think we have time for one more question. Yeah, so another one for me. The trust bundles for the certificates: initially this is intended for issuing certificates, but if you take it the other way around, say you federate two clusters, maybe not really on the cluster level, but perhaps on the Istio level, and you want them to be able to trust each other's CA — Istio, of course, sets up its own CA, because why have one if you can have two — would it be possible to store that CA, with maybe intermediate CAs, as a chain in a trust bundle, and then read that from the other cluster, so that way you can exchange CA information? Yes, that is definitely possible.
And there are also spots, if you go and look at that enhancement, for where we would be referencing cluster trust bundle names, or signers, for use in other APIs. So I would definitely suggest you have a look; it just didn't fit in the margin. Yeah, yeah, the one place where it currently breaks: Istio rewrites probes and health checks, because otherwise the health checker wouldn't trust the CA that is applied by Istio. So what it does is it captures all your traffic except your probes, because otherwise it wouldn't be able to successfully probe your pods. But with this, it would essentially be able to say, well, it's been signed by the CA, and here's the resource representing that CA, and now you can trust it, so you don't need to do any rewrites anymore, which would be neat. If I followed all that verbal description properly, then I think we're thinking along the same lines. And per the earlier comment about it being a lot easier to change this before something goes beta: now is your opportunity. It went out as alpha in 1.27, so go ahead and give that thing a try and let us know. Excellent, thank you. All right, well, thank you all for being here. I'm amazed that you all are here, and I'm happy to see you all. Thank you for coming to the conference. Thank you.