Thank you, everyone, for coming. My name is Rob Scott. I work at Google on GKE networking. Today we're going to be talking about referential authorization and what it might look like in Kubernetes. This is particularly relevant to me because I've spent a lot of time thinking about Ingress and Gateway API and how, unfortunately, insecure many of the implementations of those APIs are. Hopefully this can help.

Hey, everyone. I'm Mo. I work at Microsoft. This talk really appeals to me because it intersects multiple community roles that I hold: I'm a SIG Auth lead, as well as a member of the Kubernetes Security Response Committee. So yeah, I'm really looking forward to this.

So, secrets. I like to keep my secrets to myself, and I hope you do as well. Unfortunately, that's not always the case in Kubernetes. In particular, if you run Ingress controllers today, you give them access to all your secrets, whether you thought about it or not. This is not a good thing. It's actually a really bad state. The isolation between the data plane and control plane of most Ingress controllers is actually pretty weak. When I say control plane, I mean the Go code that is your Kubernetes controller. And your data plane is the actual raw C code that's running your networking stuff. As you can imagine, when you put these things close to each other in ways that aren't necessarily what they were designed for, well, you get CVEs.

So I'm going to pick on ingress-nginx for a little bit here. When you talk about CVEs, you want to find the really good ones, or the really bad ones, that are high severity. We've had seven so far. And I was like, OK, I'll pick the worst one. Oh, I'm sorry. They were all high, so they're all bad. So we'll pick my favorite one because it's just funny. Did you know that you can embed Lua in your Ingress config and have NGINX process it for you? And you know, what could go wrong? It's amazing.
So what if I had some Lua that wanted to read a file on disk? And maybe I wanted to put the contents of that file into a variable. And then maybe I wanted to have an endpoint on my Ingress that just output that nice Lua variable. So now, if I can make an Ingress, I can just, I don't know, echo out the controller's service account token, the one that, by the way, you decided to grant full read access to all secrets. So it's not great. There goes all your multi-tenancy, all your fancy security controls, because you used the Ingress controller. So how do we fix this? We made a new version of ingress-nginx, and we turned that feature off. That's how we fixed it. We just said, don't do that.

So what can you do today without all this fancy stuff that we're going to talk about? Well, Ingress controllers like the Istio one do take great care to separate their control plane, the Kubernetes controller, from their data plane, the actual networking stack. So pick an Ingress implementation that has that separation. That way, even if you do embed Lua config, it doesn't end up causing a CVE. The other thing, and it's not particularly ergonomic, is that you can run multiple instances of your Ingress implementation in isolated namespaces and only grant each one access to that specific namespace. So instead of having one mega Ingress controller, you can have many small ones. If you have a CVE in one, you lose isolation for that one, but not all of them. It's a way of limiting the blast radius for when things go wrong.

So Mo helped paint part of the picture here of how we ended up thinking about referential authorization, but there was also another discussion going on in parallel. We were obviously thinking about the safety limitations of Ingress controllers, but at the same time, we were getting feature requests to cross namespace boundaries. What do I mean by that? I mean the ability to reference something in another namespace.
So for example, in Gateway API, which I'm very familiar with, there was a request to reference a TLS certificate in a different namespace. Let's say your TLS admins, for example, are all running in one namespace. They want to manage their certs in that namespace, but your infrastructure admins or your app admins may want to have their configuration in a different place. And they don't need to see each other's secrets. So why couldn't we just have some kind of cross-namespace mechanism that could do that safely? Then similarly, what if you have all your infrastructure, all your load balancing and routing configuration, in one namespace, and all the applications that you're trying to route to are in different namespaces? What if you could do that safely? That would be really nice. And then along those same lines, SIG Storage was thinking about, hey, what if we could provision a new volume from some other namespace? So you had all these use cases coming together, and this was as we were developing Gateway API, and this was something that we really wanted to support natively in the API.

One of the projects that we looked to for inspiration when we developed Gateway API was Contour. Contour had a concept like this. It allowed you to say via annotation, because everything in Ingress was an annotation, unfortunately, "TLS certificate namespace: prod-certs", and this meant that any reference to a secret was actually to that other namespace. We took that to the next level with Gateway API, and now in Gateway API you can just specify the namespace directly inside the certificate ref. And you may say, well, hold on a second, that doesn't sound like the safest thing ever, because do you really want your Gateway to be able to read any secret in your cluster? Maybe that's not the best. And that's correct, so this on its own is not valid. You have to pair that with the other half of the handshake, is what we call it.
So if one half of the handshake is this reference pointing out, you need something else to say "I trust that reference." So we introduced an API called ReferenceGrant, and that API is pretty straightforward. It says: I, as the owner of this secret, trust references from gateways in the prod-certs namespace, in this case. And this is actually working fairly well. It exists in Gateway API today. It's also used by SIG Storage now, and as we started to see more usage of it, it became clear that maybe we should try to make this a more formal, broader concept than just Gateway API and actually try to bring it into tree.

But this on its own is still full of limitations, and those became clear as we started the discussion of how to take this into tree, into Kubernetes directly. In its current state, it relies on controllers to implement all the logic. So it relies on your controllers implementing this well, and unfortunately, it doesn't do anything to limit the read access of those controllers. By default, the controllers can still see everything, and they're just limiting what they expose or configure. And then controllers still need to really understand how to follow the references. This sounds kind of obvious, right? If you're implementing Gateway API, you know where the references are that you care about. But if you're thinking about implementing an authorizer based on this, you need something that's a little bit more generic so that it could work for any API. And related to that, in the current model, there's no way to differentiate different kinds of references in the same API. You'll notice that the grant says from Gateways to Secrets. What if you're saying from Pods? Pods have a million different object references. Which one are you trying to give access to? It's not as clear as it might seem. So those are the limitations of what we have today. Let's talk about what's next.
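A minimal sketch of the two-sided handshake described above, written in Python. It is loosely modeled on Gateway API's ReferenceGrant but simplified (API groups are omitted), and it shows the controller-side check only, which is exactly the limitation the speakers point out: nothing here restricts what the controller can read. The namespaces `infra` and `prod-certs` and the secret name `acme-tls` are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class GrantFrom:
    kind: str
    namespace: str


@dataclass(frozen=True)
class GrantTo:
    kind: str
    name: Optional[str] = None  # None means "any object of this kind"


@dataclass(frozen=True)
class ReferenceGrant:
    namespace: str  # the grant lives next to the object being referenced
    from_: Tuple[GrantFrom, ...]
    to: Tuple[GrantTo, ...]


def reference_allowed(grants, from_kind, from_ns, to_kind, to_ns, to_name):
    """Return True if the reference may be followed."""
    # Same-namespace references need no grant.
    if from_ns == to_ns:
        return True
    for grant in grants:
        # Only grants in the target's namespace count: the owner must opt in.
        if grant.namespace != to_ns:
            continue
        from_ok = any(
            f.kind == from_kind and f.namespace == from_ns for f in grant.from_
        )
        to_ok = any(
            t.kind == to_kind and t.name in (None, to_name) for t in grant.to
        )
        if from_ok and to_ok:
            return True
    return False


# Hypothetical example: the owner of prod-certs/acme-tls trusts Gateways in infra.
grants = [
    ReferenceGrant(
        namespace="prod-certs",
        from_=(GrantFrom(kind="Gateway", namespace="infra"),),
        to=(GrantTo(kind="Secret", name="acme-tls"),),
    )
]
```

With this data, a Gateway in `infra` may reference `prod-certs/acme-tls`, while a Gateway in any other namespace, or a reference to a different secret in `prod-certs`, is rejected.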
So what are the goals for an API, an authorizer, that can help us fix these issues? We want the authorization access to match what is actually necessary for the controller to work at runtime. You can no longer statically configure broad read access to secrets, for example. The other thing that was important for us is that the users who want to adopt this API, primarily storage-related use cases and networking, have APIs that are already segmented by this concept of a class. So we felt like, at least optionally, we should support that as well, since it's important to those environments. Rob has already discussed the reference grant stuff, so we want some mechanism to extend all of this across namespaces. And probably most important of all, whatever we do here needs to be good enough and easy enough to use that people will actually adopt it.

So we have a few personas in the APIs we're building. We have Alex, the API author. APIs in Kubernetes tend to be CRDs. We have Kai, the controller author, and we have Rohan, the resource owner. And you might be thinking for a second, why are Alex and Kai different individuals? Well, the reality is that in simple projects, they're not. You write your own CRDs, you write the controllers that match, and everything's fine. Gateway API is complex enough that you have effectively the reference CRDs and many, many implementations. That's why we kept these as separate personas.

So what does Alex do? Alex has the core responsibility of understanding the schema, and therefore he can find the references. It's easier to think about this through an example. We have this reference pattern resource, name to be determined. And it is saying that, hey, when you see a Gateway resource at a particular version, here is where to look. Versions matter a lot for finding references because CRD schemas evolve over time.
So you can't just pick a specific place where your references live. But at the v1 version, Gateway API does have a class concept, so you can optionally set how to find the class. And then, how do you find the reference? In this example, we've used JSONPath. We're unlikely to actually use that in production because, well, it's not a well-specified language. But if you squint, you can kind of see that it's going to iterate over the listeners, find the TLS certificate references, and then use that to find the names of the actual secrets. But again, the authorizer doesn't know what kind of resource you're actually referring to. So we also have to tell it, hey, by the way, these are secrets, like Kubernetes Secrets. And then we have the purpose, which is: I want to use this for TLS serving.

Now let's continue on to the controller author. The controller author is the one who's going to write the code, build a Docker image, Helm charts, all the YAML stuff. The thing they understand really well is the identity of the controller. In Kubernetes, we tend to refer to that as the subject. So in this example, we have the Contour service account. And Contour is a gateway controller that supports class, so it is the contour class. And now we want to describe that, hey, I want access to secrets referenced from gateways for the purpose of TLS serving. If we think back to the previous YAML, we can see that these match up: the from, to, and for on both of these match up between the controller author and the API author.

So that brings us to our last persona, and this is Rohan. The previous two personas that Mo talked about are fairly complex, but they're also something that we don't expect many people to have to interact with. There aren't that many people writing controllers or writing APIs. So those are relatively complex, but the resource owner is most people who interact with Kubernetes.
If you're interacting with Kubernetes, you probably own something inside a cluster. And for the most part, you don't need to do anything. But if you want to allow a cross-namespace reference to a resource you own, you can do that with this model. To do that, we have brought back the idea of reference grant. If you remember the reference grant I was showing way back in the beginning, this is very similar, but it matches more closely with the APIs Mo was just showing. So we have a from, and that's going to look familiar: from gateways, to secrets, and not just any secret, but to the Acme TLS secret that I own. And then finally, for the purpose of TLS serving. These all have to match up. So if that Contour consumer from before was implementing this, but for a different purpose, it wouldn't get access. All of these things need to line up: the purpose, the from, and the to.

So that's a lot to take in. I can understand if at this point you're just awfully confused by the whole idea. So let's try to take a high-level view of what these different things would be. There's the reference pattern; that's the API author. That's really just the API author defining where these references are coming from and then optionally, if it's an API that has some form of segmentation, where that segmentation is. Then there's the reference consumer. This is the controller author. So instead of deploying an Ingress controller with read access to all secrets in your namespace, you deploy one of these instead. And that says who to give access to and under which conditions. So my Ingress controller or my gateway controller supports references from gateways to secrets for the purpose of TLS serving, as an example. And then finally, the last thing is reference grant. On the odd chance that you do want to allow cross-namespace references, which is completely optional, you can do that with that resource.
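To make the matching concrete, here is a small Python sketch of how a reference pattern and a reference consumer could combine to yield the set of secrets a controller may read. The class and field names are assumptions (the talk notes the real names are still undecided), plain Python functions stand in for the JSONPath or CEL path expressions, the service account name is hypothetical, and cross-namespace references (the reference grant half) are left out.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class ReferencePattern:  # written once by the API author
    from_kind: str
    to_kind: str
    purpose: str
    class_of: Callable[[dict], str]           # where the class lives
    refs_of: Callable[[dict], Iterable[str]]  # where the references live


@dataclass(frozen=True)
class ReferenceConsumer:  # shipped by the controller author
    subject: str
    class_name: str
    from_kind: str
    to_kind: str
    purpose: str


def gateway_class(gw: dict) -> str:
    return gw["spec"]["gatewayClassName"]


def gateway_cert_names(gw: dict) -> Iterable[str]:
    # Walk spec.listeners[].tls.certificateRefs[].name
    for listener in gw["spec"].get("listeners", []):
        for ref in listener.get("tls", {}).get("certificateRefs", []):
            yield ref["name"]


def readable_secrets(subject, consumers, patterns, objects):
    """Return the (namespace, name) pairs the subject may read."""
    allowed = set()
    for c in consumers:
        if c.subject != subject:
            continue
        for p in patterns:
            # The from, to, and for must all line up.
            if (p.from_kind, p.to_kind, p.purpose) != (c.from_kind, c.to_kind, c.purpose):
                continue
            for obj in objects:
                if obj["kind"] != p.from_kind or p.class_of(obj) != c.class_name:
                    continue
                ns = obj["metadata"]["namespace"]
                allowed.update((ns, name) for name in p.refs_of(obj))
    return allowed


pattern = ReferencePattern("Gateway", "Secret", "tls-serving",
                           gateway_class, gateway_cert_names)
consumer = ReferenceConsumer("system:serviceaccount:projectcontour:contour",
                             "contour", "Gateway", "Secret", "tls-serving")
gateway = {
    "kind": "Gateway",
    "metadata": {"namespace": "demo", "name": "web"},
    "spec": {
        "gatewayClassName": "contour",
        "listeners": [{"tls": {"certificateRefs": [{"name": "acme-tls"}]}}],
    },
}
```

Here the Contour subject ends up with access to `demo/acme-tls` only. A consumer declaring a different purpose or a different class would get nothing, which is the "everything has to line up" property described above.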
So let's take a look at a gateway and how this all fits together. Remember the first persona was Alex, the API author. Alex is really pointing to two things. One, Gateway has a form of segmentation, and that exists at gateway class name, so you're providing a path to that. And second, you're providing a path to where those secret names exist, so that's the foo example com cert. These are complicated JSONPath or CEL paths, but they are things that need to be written once along with the API definition, and then you never need to think about them again. Next is the reference consumer. Instead of bundling RBAC to read all secrets or whatever, you're just bundling RBAC that says, hey, I want to implement this form of references, and specifically only for gateways of class contour in this example. So you're not getting all secrets referenced by any gateway, just those referenced by gateways that are relevant to you. And then finally, let's imagine that this gateway makes a cross-namespace reference. That's where reference grant comes in. If the secret owner wants to allow that reference to exist, they need to create a reference grant to say, I'm completing the other half of the handshake. I trust references from that gateway to my secret.

All right, so with all that said, I have to give complete credit for this next section to Leo. He's right in the front row. This proof of concept would not have happened without him. Part of working through all of this is that you think about these APIs, you put it all on paper, and then you ask, does this even work? Leo actually wrote this all out to make sure that, yeah, this could actually work as an authorizer. So let's do a very simple example of this working in real life. To get this started, we'll just export the controller service account as an environment variable, because we're going to be coming back to this service account a lot.
We're just going to be testing whether or not this specific service account has access, and we're trying to grant it access without any kind of RBAC. So we'll start, before we've done anything else, with kubectl auth can-i: as that service account, can I access this specific secret? And of course, because we haven't done anything, no, we don't have access to it. The next thing you're going to want to do, as an API author, is create that reference pattern. The eagle-eyed among you may notice that the actual YAML I'm using here is not called reference pattern; it's called ClusterReferenceGrant. Unfortunately, we have cycled through at least three different names for this resource so far, and the POC is using this one. It's the longest-lived name so far, but we could very clearly use help with naming. So it's the same thing as reference pattern. It fills the same purpose. We'll go ahead and apply that.

Okay, and then the next step is we need to attach that to something, right? We need to say which controller, which identity, is following that pattern. In this case, we're going to say the demo controller service account, the service account we've been using, is following that pattern. We'll go ahead and apply. And now you say, well, okay, does it work yet? We connected a grant, a pattern, whatever, to a reference consumer. And you might say, well, that's all you need, right? It should work. So we'll do kubectl auth can-i again. And no, we still don't have access. Remember, the whole purpose here is that this is referential authorization. If a reference doesn't exist, you're not going to be granted access. This only works if there's actually a reference. So it's really the minimum viable access a controller would actually need to implement an API. So let's create a gateway. This gateway is going to refer to the secret that we've been testing access to this whole time.
So we'll go ahead and apply this gateway. And with that gateway applied, let's check and see if we have access. And we do, so it works. That part is great. But you may be thinking, well, okay, authorization is only so good if it grants you access; you kind of want the inverse too. If any of this goes away, we need it to immediately revoke that access as well. So let's test that out. We could delete any one of those pieces and the access would be revoked, but let's delete the reference consumer in this case. We'll try that access again. And yeah, no access. So end to end, this works. It is still very much a POC.

Some of you may be saying, oh, okay, yeah, this looks great, let's ship it, let's run with it. Let's go. As you might be able to tell by now, we have a ton of open questions left. We've shown that it can work, but we do have plenty of open questions that we still need to work through. As you can tell, we still have no idea what to call these things. We've been fiddling around with a few names even in this presentation, sorry for that. We want names that actually make sense, so if you have any ideas, we're all ears. Then, how on earth can we make this easier for controllers and users to adopt? Because this is only actually useful if controllers and users can make use of it. Otherwise we'll just default to the easiest path, which is giving your Ingress controllers read access to all secrets, because it works. There is also a real pain point for controllers with this method. For controllers, by default, the easiest thing to do is to watch a long list of things, so all secrets in this namespace or in this cluster. This requires a different approach, which would be a lot of individual watches. There are tools that can help with that, but it is a significant change, and we need to make that a better UX if we expect any controllers to actually adopt it.
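The demo sequence above (denied, still denied, allowed, revoked) can be replayed with a toy in-memory model in Python. This is only an illustration of the idea that access is derived from the current set of objects at decision time; it is not the POC's code, and the class name, service account, and object names are all made up.

```python
class ToyReferentialAuthorizer:
    """Decides access from whatever objects exist right now."""

    def __init__(self):
        self.patterns = set()  # (from_kind, to_kind) pairs that have a pattern
        self.consumers = {}    # subject -> (from_kind, to_kind) it consumes
        self.gateways = {}     # (namespace, name) -> list of secret names

    def can_get_secret(self, subject, namespace, name):
        pair = self.consumers.get(subject)
        # Both a consumer and a matching pattern are required...
        if pair is None or pair not in self.patterns:
            return False
        # ...and so is a live reference to this exact secret.
        return any(
            ns == namespace and name in refs
            for (ns, _), refs in self.gateways.items()
        )


authz = ToyReferentialAuthorizer()
sa = "system:serviceaccount:demo:demo-controller"
results = []

# 1. Nothing configured yet.
results.append(authz.can_get_secret(sa, "demo", "acme-tls"))

# 2. Pattern and consumer exist, but nothing references the secret.
authz.patterns.add(("Gateway", "Secret"))
authz.consumers[sa] = ("Gateway", "Secret")
results.append(authz.can_get_secret(sa, "demo", "acme-tls"))

# 3. A Gateway now references the secret.
authz.gateways[("demo", "web")] = ["acme-tls"]
results.append(authz.can_get_secret(sa, "demo", "acme-tls"))

# 4. Deleting the consumer revokes access immediately.
del authz.consumers[sa]
results.append(authz.can_get_secret(sa, "demo", "acme-tls"))

print(results)
```

Because nothing is cached, removing any one piece (pattern, consumer, or the referencing Gateway) flips the answer back to "no" on the next check.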
And then, kind of a niche question here, but how much interest is there in a namespace-scoped reference consumer? We've focused on this cluster-scoped level of access, but what if your controller really only runs in a single namespace at a time? Are these same patterns useful? Not sure, but that's another question we're still trying to figure out. And then finally (whoops, went back a little too far), the other thing I wanted to cover is: what on earth should this grant access to by default? So far we've just said read access, because that's what we care about in all the use cases we're aware of. But would you want to use the same pattern for write access? If you follow a reference, should that grant create and update? I don't know. So far we're just doing read access, but we're open to additional use cases for this pattern. And then finally, the ultimate open question: should we do something else entirely? And with that, I'll hand it back to Mo.

So the alternative that we're considering is this: if you squint and think about all the pieces that we've talked about so far, and if you were willing to make some changes to the objects that you're trying to grant access to, in particular if you were willing to label them somehow, could you make all of those properties, the fors, the froms, and the tos, into special labels? So you can kind of see this pretend label of, hey, I have a purpose for my gateway and it's TLS. And you could logically extend that to, hey, I have a class and this is the class. This can be attractive because in some ways it feels simpler and easier to explain. So why might we not do this? If you look at all of the design that we talked about with the referential authorizer, it's critical to notice that we are not changing anything about how authorization works in Kubernetes. We are just building an incredibly complex and fancy authorizer, but we are not changing what authorization means.
The second you start talking about something like label selectors, you actually have to change the meaning of authorization in Kubernetes, because there is no way in Kubernetes today to ask an authorizer to make a decision while considering something like label selection. SubjectAccessReview is the core API that defines how authorization works in Kube, and there is no field in there for label selection. So we'd have to add that, and that will obviously percolate through the ecosystem. The other aspect is that, other than list, watch, and deletecollection, the Kubernetes REST API does not have the concept of label selection. So importantly, get does not have the concept of label selection, nor does create or update. We'd have to work through that.

How would we go about labeling things? Obviously you can ask your end users to do it, and that might be okay, but it seems better to have some form of automation around this. So we might end up having to build some pretty fancy controllers to do that. Label selectors are also pretty complex; in particular, you can represent NOT relationships in them. That is probably not a great idea for your authorization stack, because it's kind of hard to reason about a bunch of ANDs and NOTs all mixed together while trying to figure out what you actually intended. And inevitably, once you start talking about touching authorization with label selectors, someone's going to show up and say, I would like field selectors. And for good reason: I want my nodes to be authorized to only field-select on their own pods. That is what they do today, but they're authorized to look at all pods. They aren't restricted, because there isn't a way to restrict that action today. With all these concerns, you might be asking, well, why consider this at all if there are all these negatives?
The core benefit of an approach like this is that it is trivial to implement in a controller, because effectively all you have to do is update your list/watch semantics to still be cluster-wide but with a particular label. You can do that today. You don't get the authorization benefits, but your controller can do that today. You don't need any special libraries. For the stuff that Rob is talking about, we're going to write a custom Go implementation to help you do 10,000 watches and make it efficient and work perfectly and all that. But if you decide not to write your controller in Go, well, then you get to re-implement that entire library as well.

So what can you do until we have any of this? I mentioned this earlier. You really want to have an implementation that has strong separation between its control and data plane. Unfortunately, you can't tell just by looking, so you're almost certainly going to have to talk to someone and figure it out. If you're willing to take the pain of having many small copies of your Ingress controller running in individual namespaces, that's a choice. Please turn off all the Lua config. Just don't do it; find some other way. But in reality, if there is a way to harden that controller or that Ingress implementation, find the maintainers, ask them how to do it, ask them to document it, and deploy it in that way.

Yeah, so this is potentially a big project, and we could use some help. So what's next? Well, first off, if it's not already clear, this is still in early stages. We're bringing this to KubeCon because we really would benefit from feedback sooner rather than later. We want to hear about your use cases, about whether this model is beneficial, or whether something like RBAC label selectors is actually just way better or simpler for you. Feedback is really, really helpful at this stage. This is one of those rare opportunities to meaningfully affect a new authorization model inside Kubernetes.
So if this is something that interests you, it's a great opportunity to get involved. I've linked our KEP number there, 4387. You can find the POC that we demoed earlier; that's the reference grant POC under Kubernetes SIGs. And then also, we're on Slack. Probably the best channel is the SIG Auth authorizers dev channel, but we'd love to get some additional feedback, use cases, et cetera. And if you're interested in getting more involved, whether it's development or whatever, we'd love to have you. And with that, I think we've got time for some questions. Should we run the mics over? There are mics if you stand up on either side there.

Great talk. I actually have a question about the POC. I was wondering, what is the actual policy enforcement point there? Is it at the level of an admission webhook?

Yeah, I completely glossed over that. That's a good question. This is not an admission webhook; it's an authorization webhook.

Thank you.

So you do have to, well, I guess it's not a deny authorizer. So you can put it wherever you want, but you have to not grant broad access in other places.

Thanks for the talk. In your example, the "for" purpose field feels like a one-off match-label case as well. Is it the same idea? Because it's just one label you're allowed to apply.

Yeah, I completely get that. We didn't do a great job of explaining that. But let's imagine you're talking about an API like Pod, an API where there are a lot of different object references. Let's say you had object references in two different fields. They could each have a different purpose. Or as another example, you could have a different purpose for different kinds. Let's say you had the same object reference, but it could reference either a Secret or a ConfigMap. Maybe those would be for different purposes. But it's really just the author of the API who would document and define: these are the purposes that you can connect and use with my API.
And then a controller author would just say, I'm using this specific purpose. This is what I need authorization for. Does that make sense?

Yeah, it's just that it's a new concept, I think.

Yeah. Introducing new concepts. Yeah.

I think it also comes into play when you have the reference grant bits. It helps the controller understand the context of the situation as well. So that's kind of why it's percolated out that way.

Yep.

So with the label version, the alternative of sticking labels on the resources, does that require a label on both sides? So if you're going cross-namespace, do you have to label the request and the object that you're requesting access to? The only reason I ask is that I was in another talk upstairs earlier, and the guy was talking about how, oh, look, you just gain access to write labels, and unfortunately, people use these, and that's a dumb idea. Has that been factored in?

Yeah. So that's a really good question. The label selector alternative works great for the first use case of limiting secret access. It's unclear how it translates to cross-namespace references. It's possible that maybe we build some kind of controller that looks at something like reference grant and translates it to labels and RBAC. Not quite sure yet. But it does a very good job of solving the first problem; it's unclear about cross-namespace so far.

Cool. Thank you.

Cool. I think that's all the questions. Feel free to come up and chat if you have anything else. Thank you.