Hi friends, so excited to be here in person. I'm actually from LA, but I live in Denver now, so I'm pretty stoked. Gonna try to find like some Theron, all right. I'm Filipino. Any Filipino community members in the audience maybe? I don't know, come say hi. Let's check later. Yeah, also my dog is here, finally. First dog at a conference for me. So, I've been pretty involved in the Kubernetes community for a bit, and I used to work with the fine Weaveworks people here, and I mean fine like in the good way, not like the adequate way. So yeah, I was also really involved with the Flux project. So I'm a GitOps experimenter, just like all of you friends, and it's really important for us as practitioners to get together and to build good tools so that we can have the experience with GitOps that we deserve. 'Cause I think that we can all agree that right now, with our platforms and our tooling, we can see the vision of where they take us, but we're not quite there just yet, right? And so today's talk (oh, I work with the MRTON Zoo now) is about the experience that we've had so far building Flux's multi-tenant API using some interesting Kubernetes features. And the reason that I wanna talk about some of these ideas is that we're not finished implementing, but we would really like to see the adoption of some of these techniques, the security models and the mechanisms by which we authorize and authenticate ourselves, in the other projects that are involved in this space as well. That way these benefits don't live only in Flux: the way that we stitch APIs together across namespaces, with multiple objects, could land in other projects too, maybe ones working with gateways and networking, or policy. There's a lot of space here to explore.
So when we're building a Kubernetes controller, we typically have our own registered API types, right? I've got my kind of thing, and then maybe I'm working with some other kinds of things, whatever the domain is that I'm trying to reconcile and automate. And those kinds of things sometimes have child objects, right? So maybe my first kind needs to read and write config maps, and it also needs to read a bunch of credentials from a secret. That's pretty common. Lots of things need secrets so that they can get credentials to access external APIs and then control things. And then my second kind, oh, that also mutates ingresses or whatever. So I have my use cases in mind, I make my API, and I write my controller code. Then that controller needs a service account, and that service account needs a role binding to a role that lists a bunch of these kinds of things, right? I need my own personal kinds of things so that I can update the thing, and then I also need all of my child types of things so that I can go and mutate those or read those. So you can see it's a non-trivial problem, but at least it's well understood, because when we come to the process of writing our controller, we know what we're working on. But in the GitOps controller landscape, say I have a couple of types of things, like webhooks, and sources that represent Git repos or OCI registries or whatever. And then I have my other resources that are actually designed to declare how I wanna mutate the cluster, my applier-type resources, right? With Flux, we have the Kustomization and the HelmRelease; in other projects, you might have an App or an Application, depending on which project you use. And so we still have that same thing where it's like, okay, our controllers need some objects, they gotta go get some secrets, maybe they gotta link these objects together, right?
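To make that concrete, here's a rough sketch of the role a controller's service account gets bound to: full access to its own registered kinds, plus read/write on the child types it reconciles. This is a plain Python rendering of an RBAC Role manifest; the API group, resource names, and the exact verb lists are made up for illustration, not from any real project.

```python
# A sketch of the Role behind a hypothetical controller's role
# binding: rule one covers the controller's own kinds (including
# status updates), rule two covers the child objects it touches.
def controller_role(namespace: str) -> dict:
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": "my-controller", "namespace": namespace},
        "rules": [
            {   # our own registered API types
                "apiGroups": ["example.dev"],
                "resources": ["mykinds", "mykinds/status"],
                "verbs": ["get", "list", "watch", "update", "patch"],
            },
            {   # child objects the controller reads and writes
                "apiGroups": [""],
                "resources": ["configmaps", "secrets"],
                "verbs": ["get", "list", "watch", "create", "update"],
            },
        ],
    }

role = controller_role("apps")
```

The point is just that this list is knowable ahead of time: the person writing the controller knows every kind it will ever touch.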
Like, I can't just apply nothing, I have to apply some sort of source, like a Git repository. That's what we do in Flux, at least. But now I don't know what my controller needs to do. My controller has to be able to create deployments and secrets and maybe certificates, or other kinds of object types. And we don't know that ahead of time for each person that might want to apply some things to the cluster. And then things get a little weird, too, because maybe we're applying some stuff and we're not actually applying things to the same namespace. Or maybe we want to change things like network policies, or add namespaces to the cluster, and those things might be at the top level of the cluster, so they're not even inside of a namespace. So this applier-type relationship, this identity, it needs something more granular, per object. Every time that I want to declare that I want to apply something, over and over again, I can have multiple of these things, and I want to granularly define, for a particular apply object, what RBAC permissions it has. So this idea of the applier identity: what do we do about it in Flux? There are a couple of mechanisms for authentication that we can use to get this applier identity into the controller for the period of time that it's doing just that one thing. First, we have service accounts. And this is really common across all of the projects, right? Regardless of whether you're using Flux or kapp-controller or Argo CD, for a particular group of resources, whatever app you're applying to the cluster, you can just say, hey, when I do this, I want it to use this particular service account. And usually the way that works is the apply controller has RBAC access on its controller service account to fetch some secrets.
And the cool thing about service accounts is they're this very peculiar mechanism, which we'll break apart a little bit. Basically, when the Kubernetes API server sees a service account, it's like, oh, let me go make a token for that thing, and I'll put it into a secret in the same namespace, next to the service account. And so if we have access to those secrets as a controller, then we can go and get the service account secret, and then we can pretend to be it when we talk to the API server. So this is a way for us to change our identity from the controller service account or user, whatever you're using to authenticate the controller to the cluster, to the service account identity, through the token, just by changing the way that we talk to Kubernetes a little bit. But there's a little bit of a problem with this, and I really wanna advocate that we start stepping away from this technique. We have these pieces, right? Service accounts make these SA token secrets, and then we allow our apply controller, when it's reconciling the apply object, to go and get that secret and change the way it talks to Kubernetes, and then it produces, most likely, a deployment, right? It's reconciling some folder from a Git repository, and there are deployments and config maps and all sorts of stuff in there. And we wanna run some pods, because we're using Kubernetes and we want some compute. Well, service accounts are designed for pods. It's a very particular use case, actually. We're sort of abusing a Kubernetes mechanism here when we use SA token secrets inside of a control loop temporarily. They're meant for pods, and every pod has that spec.serviceAccountName field. So if you happen to be deploying resources to the same namespace that you're applying custom resources in, those pods can be defined to accept that service account identity, right? They can take it on.
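The "pretend to be it" part is really just a bearer token swap. Here's a minimal sketch, assuming the SA token secret has already been fetched and follows the `kubernetes.io/service-account-token` secret layout; the token value is a fake placeholder, not a real JWT.

```python
import base64

# Minimal sketch of the identity swap described above: pull the token
# out of a service-account token secret and send it as a bearer token
# on subsequent API server requests instead of the controller's own
# credentials.
def bearer_header_from_sa_secret(secret: dict) -> dict:
    token = base64.b64decode(secret["data"]["token"]).decode()
    return {"Authorization": f"Bearer {token}"}

sa_secret = {
    "type": "kubernetes.io/service-account-token",
    "data": {
        # secret data is base64-encoded at rest; fake value here
        "token": base64.b64encode(b"fake-jwt-for-illustration").decode(),
    },
}
headers = bearer_header_from_sa_secret(sa_secret)
```

Nothing about this is pod-specific, which is exactly the point: the controller is borrowing a mechanism that was designed for pod mounts.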
But this service account identity is likely purpose-built for reconciling your application. It's designed to modify and deploy workloads to the cluster and to run your custom code, or somebody else's. So say you take an app update of somebody's rendered application or something from a third party, and they decide to specify the service account name of the thing that reconciles itself. Well, now that pod has the ability to deploy workloads to your cluster, right? So one solution, if you're using service accounts, is to just always apply to a different namespace. Now there's no service account that's adjacent to the pod in the same namespace, and then you don't have that path for privilege escalation. It is sort of a privilege escalation, because your pods are not really meant to be reconciling themselves, right? If you wanna deploy pods from your pod, you should be using a separate service account. Then there's this other bit. It's a little more nuanced, but we often talk about GitOps because it's really good at helping us bootstrap a cluster from the very beginning of time to our desired state, quickly. And we want that to happen fast, but service account tokens, if you're making a lot of them, require a cryptographic operation inside of the API server to make these JSON web tokens. That takes time, and sometimes it might take a long time if you don't have enough randomness available. So that's another benefit: if you can find a way to not use service accounts, then you don't have this extra mechanism happening behind the scenes, and you also don't need your controllers to be able to see all of your secrets, which is maybe not such a good thing, right? So I wanna talk a little bit about user impersonation. This is a mechanism that we became really curious about in the Flux project, and we started writing up a proposal. It's proposal 582 on the repository, if you wanna go and look at it.
And service accounts are basically just normal users. Every user is just a string that represents it inside of the Kubernetes API. There are some users that are maybe tied to Gmail accounts. There are some users that might be tied to some SAML flow, or some social auth flow to GitHub. And they each have a string with a domain name or some sort of delimiter and namespace that tells you where it came from. And so service accounts, they're just username strings. It's actually possible to impersonate a service account directly, and I can demonstrate that right now. All right, so I myself am authenticating with a certificate to a kubeadm-style cluster, and I can get pods, but I can't get pods as some other user. Like, oh, okay, this API server response here is saying forbidden, some user can't list the resource pods, right? So I have changed my identity, and that's because as a cluster administrator, you have the permission to impersonate anybody. And oftentimes our GitOps controllers do have these very close to superuser-level privileges. So this capability is already here. And if I also tack on a group, system:masters, then suddenly I have privilege again. And remember how I mentioned that you could impersonate a service account? Well, here is me impersonating the pod garbage collector. But you can see that not every service account has that permission, right? The one from the default namespace, that one can't do that. And so if you've never played with this impersonation feature before, even just as a cluster administrator, I would really encourage you to try it out. Super cool. But what can we do with that, right? So if a service account string is kind of just this system:serviceaccount prefix, and then you have the namespace and then the name of the service account, can we copy that? Like, what if we gave our controllers the permission to impersonate users?
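Under the hood, that demo is just HTTP headers: a caller that holds the `impersonate` RBAC verb adds them, and the API server evaluates the request as the named user. `Impersonate-User` and `Impersonate-Group` are Kubernetes' actual header names; the sketch below is a simplified client-side view (multiple groups are really sent as one `Impersonate-Group` header per group, collected in a list here).

```python
# Sketch of impersonation at the HTTP level: the equivalent of
# kubectl's --as and --as-group flags.
def impersonation_headers(user: str, groups=()) -> dict:
    headers = {"Impersonate-User": user}
    if groups:
        # really one Impersonate-Group header per group; a list
        # stands in for the repeated headers here
        headers["Impersonate-Group"] = list(groups)
    return headers

# the requests from the demo, roughly:
forbidden = impersonation_headers("some-user")
admin = impersonation_headers("some-user", ["system:masters"])
gc = impersonation_headers(
    "system:serviceaccount:kube-system:pod-garbage-collector")
```

If the impersonated user has no RBAC grants, the API server answers with the "forbidden, cannot list resource pods" error from the demo; tacking on system:masters restores privilege.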
And then every time the controller impersonated a user while it was operating on an object from a particular namespace, it had some prefix, and then tacked the namespace on in the same exact way, and then it put the name of the user it was trying to be. Then we can get the same kind of security properties as the design of service accounts in Kubernetes, right? Something that's designed to be restricted to a namespace. But without the crypto mechanism, and without having to use these special additional objects that we don't actually want pods to be able to use. And so that's what we can do with our APIs. And we can build any API around this, right? This is not something that's unique to GitOps or to Flux. You can also impersonate groups. Every service account, if you didn't know, actually has groups; there are a couple that get tacked on there. There's system:serviceaccounts, which is every service account across the entire cluster, ever. And then there's the per-namespace group as well. So if you wanted to target a particular set of service accounts from some namespace, or say that you as a cluster administrator wanted to role-bind to all of the service accounts from a team's namespace, you can actually do that with a group. Super cool, right? This is worth mentioning: with role-based access control in Kubernetes, it's very important to know that namespaced role bindings can target service accounts and any username. And so you can actually target service accounts from other namespaces. A lot of people don't do this sort of thing, allowing access to objects in one namespace from another namespace, but it's natively possible already. And that's important for the rest of the proposal. But, like I said, with Flux we put these groups on there as well. So we've been talking a lot about the in-cluster identities, right? Focusing on service accounts and usernames that exist inside the cluster.
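So the naming scheme looks something like this. The service account username format and the two automatic groups are the real Kubernetes conventions; the `gitops:` prefix for a controller-minted applier identity is a made-up example of the idea, not Flux's actual choice.

```python
# Kubernetes' real username format for service accounts, plus the two
# groups every service account automatically carries.
def sa_username(namespace: str, name: str) -> str:
    return f"system:serviceaccount:{namespace}:{name}"

def sa_groups(namespace: str) -> list:
    return ["system:serviceaccounts",               # all SAs, cluster-wide
            f"system:serviceaccounts:{namespace}"]  # per-namespace group

# Copying that shape for an impersonated applier identity: the
# namespace is baked into the username, so any role binding that
# targets it is naturally scoped. "gitops:" is illustrative only.
def applier_username(namespace: str, name: str) -> str:
    return f"gitops:{namespace}:{name}"

u = sa_username("kube-system", "pod-garbage-collector")
```

Because role bindings can target any username string, binding to `applier_username("team-a", "app-1")` gives you the namespace-restricted properties of a service account with no token, no secret, and nothing a pod could mount.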
But we do have the use case with GitOps controllers to actually control what we want our declarations to be on another Kubernetes cluster, in a management-cluster style of topology. And if you have another cluster that you wanna apply to, you can't use this in-cluster applier identity, right? You need a certificate or a token, or you can exec a credential helper, and you would use a kubeconfig for that. So a lot of GitOps projects have a kubeconfig file that they load into the data of a secret. And then you can give that secret to your applier object; in Flux, we have the Kustomization and the HelmRelease. And then, instead of talking to your own Kubernetes cluster with your credentials, you talk to another one, and you can manage objects and garbage collect and health check, all that stuff. And that's cool, except that kubeconfigs are kinda dangerous. They can execute binaries. And also, if you're running a kubeconfig from within the file system of a controller, you could access the controller service account files directly, because they get mounted into the file system. So there are a couple of sanitization recommendations in the proposal here. One is to use a kubeconfig bin directory, so that you can restrict the command path in the command fields of the exec helper section of a kubeconfig. The other is to sanitize file paths for the token field and fields like that. Basically, just reject the kubeconfig if it's trying to do something malicious. The other part of the proposal, which is probably more cerebral but more important, is that we can design APIs that allow very interesting and granular tenancy properties. This is about using policy to allow objects to talk to each other, and to allow access to each other across namespaces. You see, in our GitOps security model, say we have a source-type object, like a Git repo or a registry, and then we have this applier. Well, these things need to hook up, right?
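Those two sanitization checks might look roughly like this. The kubeconfig field names (`exec.command`, `tokenFile`, `client-certificate`, `client-key`) are the real ones from the kubeconfig format, but the allowed bin directory path and the exact set of rejected fields are assumptions for illustration, not the proposal's literal rules.

```python
import os

ALLOWED_BIN_DIR = "/kubeconfig-bin"  # assumed mount of vetted helpers
FORBIDDEN_FIELDS = {"tokenFile", "client-certificate", "client-key"}

# Sketch: reject a kubeconfig user entry if its exec helper escapes
# the allowed bin directory, or if it points at files on the
# controller's own filesystem (where SA tokens are mounted).
def sanitize_kubeconfig(user: dict) -> None:
    exec_cfg = user.get("exec")
    if exec_cfg:
        cmd = os.path.normpath(
            os.path.join(ALLOWED_BIN_DIR, exec_cfg["command"]))
        if os.path.dirname(cmd) != ALLOWED_BIN_DIR:
            raise ValueError("exec command escapes the allowed bin dir")
    for field in FORBIDDEN_FIELDS & user.keys():
        raise ValueError(f"file-based field {field!r} not allowed")

# a well-behaved exec helper passes the check
sanitize_kubeconfig({"exec": {"command": "aws-iam-authenticator"}})
```

Inline fields like `token` and `client-certificate-data` carry their credentials in the secret itself, so they stay allowed; it's only the file-path and arbitrary-binary escape hatches that get closed off.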
Like, they need to link together. And the applier needs to be able to fetch some HTTP resources about the source object that got unpacked from some registry. Well, if these are in the same namespace, just like how pods can mount secrets and use service accounts from the same namespace, that's kind of not a big deal, right? You put the source and the applier together. But not everyone wants to structure the way that they work like that. Some people wanna manage their apps from one namespace, and they wanna have another team that manages the helm repository in another namespace. So how do you get these things to be usable in an intentional relationship, where every applier in the cluster can't access every source in the cluster? Now you have this policy problem. How can I get some tenancy going? One approach to that is to add an access control list, and we're starting to see this in some service- and gateway-style APIs as well. It's the trend that's happening in the industry, and it's a very quick thing to reach for. The way that it works is, say you have your source controller and your apply controller. You have some trusted code sections in there, and when the apply controller fetches a source object, it checks to see: is the object that I'm operating on in a namespace, or in some category or whatever, that is allowed in the access control list of the source object that I just fetched? And you trust the apply controller to then say, if it's not in that access control list, it stops reconciling and it throws an error. And this is great, it's simple, it's easy to accomplish. But I think that this next way is a little bit better. Basically, if you can delegate to the cluster administrator, then you don't have to trust the apply controller to do the right thing. Make sure that the apply controller is always impersonating an identity.
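In code, that trusted-client ACL check is tiny, which is part of why it's so tempting. Here's a sketch; the `spec.accessFrom` field name is loosely modeled on ACL-style APIs and is not the exact Flux schema.

```python
# Sketch of the ACL check the apply controller is trusted to run:
# after fetching the source object, compare our own namespace against
# the source's declared access list. Field names are illustrative.
def acl_allows(source: dict, applier_namespace: str) -> bool:
    acl = source.get("spec", {}).get("accessFrom")
    if acl is None:
        # no ACL declared: only same-namespace access is allowed
        return source["metadata"]["namespace"] == applier_namespace
    return applier_namespace in acl.get("namespaces", [])

git_repo = {
    "metadata": {"namespace": "infra"},
    "spec": {"accessFrom": {"namespaces": ["team-a", "team-b"]}},
}
```

The catch is that nothing enforces this except the apply controller's own code: a buggy or malicious client that skips the check can still fetch the source.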
And then you can role-bind to that identity, so that if the apply controller doesn't have access to that source object, it can't even fetch it. It can't get the information about the HTTP server and the credentials that it needs for getting the source code. And this allows you to have a multi-tenant environment inside of a cluster, right? You can protect the source code from other teams that maybe shouldn't be seeing it. The other thing that you can do here is that instead of asking your cluster admins to do the role binding, which is not that hard, you can actually generate the role binding definitions, and give source controller, or whatever object actually owns the access control list, the ability to create the role bindings for you. So this is a way to get that convenience-style API without putting the onus on the apply controller implementers, and without giving them the trust to implement the actual access control, right? The control stays within source controller's generation of the role binding, and it's enforced by the Kubernetes API server instead. And I think that this is a really good design. The complexity trade-off allows you to do things like take that code that you have inside of the apply controller and distribute it as a command line tool or a library, so that now, since you don't have to trust your client, you can actually give that out and run it on any computer that has access to Kubernetes. That way, you're not just trusting people to follow the rules. You actually get to ask Kubernetes to enforce it, using the API that's designed to do it. But yeah, there's a lot going on in this proposal. I really wanna share these ideas with all of us implementers. I want us users to have an expectation, and to have faith, that when we are using our tools, we can trust them to provide a secure basis for a platform that's enjoyable to use and is extensible, right?
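Here's a sketch of that delegation pattern: the source controller turns the ACL into a RoleBinding in its own namespace, one subject per allowed namespace, and the API server does the enforcement. The role name `source-reader` and the `gitops:<namespace>:applier` identity scheme are illustrative assumptions, not Flux's real names.

```python
# Sketch: generate the RoleBinding from the ACL instead of trusting
# the apply controller to honor it. Each allowed namespace gets a
# subject for that namespace's impersonated applier identity.
def rolebinding_from_acl(source_ns: str, source_name: str,
                         allowed_namespaces: list) -> dict:
    subjects = [
        {"kind": "User",
         "name": f"gitops:{ns}:applier",   # made-up identity scheme
         "apiGroup": "rbac.authorization.k8s.io"}
        for ns in allowed_namespaces
    ]
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{source_name}-access",
                     "namespace": source_ns},
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io",
                    "kind": "Role",
                    "name": "source-reader"},  # assumed read-only role
        "subjects": subjects,
    }

rb = rolebinding_from_acl("infra", "podinfo", ["team-a", "team-b"])
```

An applier from a namespace outside the list impersonates an identity that simply has no binding, so its fetch of the source object fails at the API server, before any controller code gets a say.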
If I have the idea to add something to the command line tool, I wanna be able to do it without revoking the trust of other platform admins everywhere. So, check out the pull request. It's been up for quite a while, and there are still a few typos in there. There are like 66 comments; add your own. I'd really appreciate some feedback. And yeah, let's chat. flux2, number 582. This is my Twitter handle. Hit me up. My DMs are open. Follow me on GitHub. And let's chat later. Cheers.