All right, well, welcome. My name is Jordan Liggitt. I work at Red Hat, on Kubernetes and on OpenShift, and I'm part of the team that helps lead the Kubernetes authentication and authorization efforts. Today I'm going to be talking about how you can use RBAC more effectively to secure your API servers and your Kubernetes APIs. So what is RBAC? RBAC stands for Role-Based Access Control. It's one of the authorization modes that you can use with your Kubernetes API server, and its job is to answer a question for every request that comes into the server: can the subject that is submitting this request perform this verb on this object? But that's a little dry, and it's the last session of the last day, and we're all fried. So I figured maybe we'd start with this question instead: can Bob educate dolphins? Instead of deployments or pods or whatever, I'm going to walk you through, conceptually, how the RBAC design is laid out using that question. So here's a little story. This is Bob. Bob's a pirate. Bob works on a really strict ship, and if you don't ask for permission, you walk the plank. Bob is the first mate of the green ship, and his responsibilities, his permissions, are to help the captain and train the crew. So if we're thinking in terms of roles, we can say his role is first mate, and a role is just a named set of permissions. The first mate role includes the permissions of helping the captain and training the crew. And that role is granted at a particular location: he has that role on the green ship. Now, there are lots of ships. Every ship has a first mate, so it makes sense for the permissions to be uniform across the ships. We're going to define the first mate role globally. It means the same thing across all the ships, but we grant it locally, so Bob can only do first-matey things on the green ship. Defined globally, granted locally. Now, training the crew got boring, so one day Bob asked if he could also train dolphins.
And his captain thought that was a little weird, but whatever. So he said, all right, Bob, I'm going to give you, in addition to your first mate role, the role of dolphin trainer, and you are allowed to educate dolphins. But let's be honest, we're not going to use this role on any other ships. So we're just going to define this role locally; we're going to keep this one to ourselves. And because it's defined locally, it's also only granted locally. So now Bob has two roles, dolphin trainer and first mate, and he can do anything that either of those roles allows. So if we go back to our original question, can Bob educate dolphins on the green ship? The answer is yes. We're almost done with our story, just a little epilogue. Whatever happened to Bob? It turns out he got himself elected pirate king. Must have been his strong sense of porpoise. Thank you. As pirate king, Bob can command the armada, and this is a permission dealing with something global. The armada is a global thing, so the role that allows it has to be global, and it also has to be granted globally. So these are the ways you can think about defining roles globally or locally, and then granting them globally or locally. Enough about Bob. Let's look at an actual API request and the types of questions it would ask of the authorization layer. A request comes into the API server. It's got an HTTP method and a URL and some headers and some other stuff, the body, that other stuff. The first thing we do is parse request attributes out of the method and the URL. You can see from this that the POST method maps to the create verb, and we extract the apps API group, the namespace, and the resource. That set of request attributes becomes the input to the authorizer. The next step is authentication. The authentication layer looks at the request and determines who is making this request. And in this case, there's a bearer token.
So the authentication layer is going to look at that and determine, OK, this request is coming from Bob. And so the request attributes and the user information become the input to the authorizer. For this request, the question we're going to ask is: can Bob, in the group system:authenticated, create apps deployments in this namespace? So let's look at how we would set up RBAC to allow that. This is basically what we want. We want Bob, and yes, Bob's back, to be a deployer in the namespace ns1, and we want that to mean that he can create deployments in the apps API group. So remember our roles. We're going to set up a deployer role, and that deployer role is going to map to this one permission: create apps deployments. Then we want to grant that inside the ns1 namespace. RBAC uses API objects to define this policy. First, we're going to create a Role object. This looks like every other Kubernetes API object you've seen. It lives in the RBAC API group, which is v1 as of Kubernetes 1.8. It has a name; we're just going to call it deployer. And we're going to create it inside the ns1 namespace, because that's where we want to grant these permissions. Remember we said that a role is just a named list of permissions? Well, an RBAC Role has a list of rules, and each rule has the opportunity to match the attributes on an incoming request. This role has one rule, and it matches the attributes on the request that we saw: the create verb is included in this list, the apps API group is included in this list, and the deployments resource is included in this list. So we can pass this to kubectl create, and we have now defined a role inside our namespace with the permissions we want. The next step is to grant this role to Bob, and we do that by binding the role to Bob. So we create a RoleBinding object, no surprise, in the RBAC API group, and we are also creating it inside the ns1 namespace.
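As a sketch, the Role described above might look like this (the deployer name and ns1 namespace are just the example names from the walkthrough):

```yaml
# A namespaced Role granting one permission: create deployments in the apps group.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployer
  namespace: ns1
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["create"]
```

Saving this to a file and running `kubectl create -f` on it defines the role inside ns1.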
And the location of the RoleBinding determines the location of the permissions. So if we created this in a different namespace, we would be trying to grant permissions in a different namespace. But this is the namespace we're in. So that's the first important thing about a binding: where is it? Second important thing: what role is it referencing? This one is going to reference the role we just defined. And the third important thing: who is it binding the role to? You can reference one or more subjects; in this case, it's just one. A subject can be a user, a group, or a service account. In this case, we're binding to a user with the name Bob, and that's going to match the user info coming from the authentication layer. So once we create both of these objects, we have defined a local role inside the ns1 namespace, and we've granted that role, also inside the ns1 namespace, to Bob. Once we do this, Bob will be authorized to submit that API request. And just like it makes sense to reuse roles between namespaces, creating deployments is something that would be useful to be able to grant in other namespaces, and it would be a pain to have to create a copy of this role in every namespace. We can actually define it globally by changing the kind from Role to ClusterRole. All that changes is the kind, and then the kind inside the roleRef. So now we are defining the permissions globally, but still granting them locally. If Alice were in the ns2 namespace, we could create an almost identical RoleBinding inside the ns2 namespace, referencing this cluster-level role. That lets us reuse permissions, maintain policy in one place, and grant it in lots of other places. And finally, if we wanted to give this permission across all namespaces, to let Bob create deployments in any namespace: big surprise, change RoleBinding to ClusterRoleBinding, and now you have granted it globally.
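As a sketch, the binding described above might look like this (same assumed names as before):

```yaml
# Grants the deployer Role to the user Bob, inside the ns1 namespace only.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deployer-binding   # hypothetical binding name
  namespace: ns1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: deployer
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: Bob
```

To define the permissions globally instead, the Role becomes a ClusterRole (with no namespace) and `roleRef.kind` here becomes ClusterRole; to grant globally as well, this object becomes a ClusterRoleBinding with no namespace.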
So to review, there are two API objects you can use to define permissions. The first is a ClusterRole, which defines permissions globally. You would use this if the things you're granting are cluster-scoped, like nodes or persistent volumes; or if you want to reuse a set of permissions across namespaces; or if you want to give cluster-wide access, like letting someone list pods across all namespaces or create deployments in any namespace. The other object you can use to define permissions is a Role object. You would use this if the permissions only cover namespaced resources, like pods, deployments, things like that, and you only want the definition to be valid in a particular namespace. So that's the definition side. On the granting side, two objects: you can grant with a ClusterRoleBinding to grant permissions across the whole cluster. You would do this to grant permissions to cluster-scoped resources, like nodes or persistent volumes, or if you wanted to give cluster-wide access. And you can grant with a RoleBinding object in a particular namespace. So that's the overview of the RBAC API: two ways to define permissions, and two ways to grant them, depending on the scopes you want. So next I'm going to talk about how you go about setting up an RBAC-enabled cluster. First, the easy way. This is a one-step process: use a distribution or installer that does it for you. This is becoming more and more realistic, actually. Most major installers have at least an option for it, and many do it by default. So kube-up, kubeadm, distributions like OpenShift, GKE, Tectonic: many of them have an option, and many of them do it by default. This is actually becoming realistic, which makes me very happy. But if you have your own deployment mechanism, or you want to know what's happening under the covers, this is the mechanics of what you would do. First, you tell the API server to start with RBAC authorization, and congratulations.
You're running an RBAC-enabled cluster. But there's a little more to it than that. When you start an API server this way, on start it automatically creates a set of default roles and role bindings. Now, these aren't magic. You can actually go look in the GitHub repo, and the definitions are there. You can look at them, create them yourself, edit them. The thing that is helpful is that these are auto-maintained: on every start, the API server will recreate these, or update them, adding in new permissions. That lets us go release to release and let these automatic roles continue working. You can opt out of the reconciliation if you want, but it's very handy. There's a set of default bindings to system-prefixed usernames. The thinking is that there's a chicken-and-egg problem with any authorization system: if you start it up and it denies by default, how can the first user grant permission to anyone? So what this does is pin the default roles to a set of system-prefixed usernames, which lets you provision those components to use those usernames, and they automatically have the right roles. If you don't want to use those usernames, you don't have to; you can use your own and set up bindings yourself. But this makes the setup a lot easier, and we've found it to work well with most of the install methods. The one break-glass-in-case-of-emergency type of permission you would need is a bootstrap superuser. So when you are setting up your cluster, you would want a credential with the system:masters group. This is the group that is allowed to do anything to the API server. Typical use is to set up with this, delegate more scoped permissions to other users for normal day-to-day use, and then put this in a very safe place and don't use it daily. This is a break-glass-in-case-of-emergency type of credential. The second level is the control plane components: kube-scheduler, kube-controller-manager, kube-proxy.
If you set these up to use those system-prefixed usernames, they will have RBAC roles that allow the things required by those components. One note on kube-controller-manager: the permissions required by the controller manager give it enough permission to start up all the individual control loops that it manages. There are actually individual roles for each control loop, and passing the `--use-service-account-credentials` flag to the controller manager will tell it to use service accounts for those loops. That's recommended, so that a bug or an exploit in one loop doesn't allow it to escape and accidentally start messing with completely unrelated resources. The goal is the principle of least privilege: each control loop only has the power that it requires. The other nice thing about this is that eventually it could let you run those control loops standalone, in a pod, completely isolated: just run a pod with that control loop's role, and it would be scoped to just the permissions it requires. That brings us to kubelets. Kubelets are interesting. Originally they started with an RBAC-defined role, but as we started thinking about what it looked like to enforce node isolation, RBAC didn't seem like a great fit. So we actually have a dedicated authorizer for nodes, and the recommendation is to use that in combination with RBAC. You can start up the API server telling it to use the Node authorizer and the RBAC authorizer, and if you give your kubelets well-shaped credentials, they will be given permission just to their own pods and their own secrets and their own persistent volumes, and not allowed to mess with other nodes' pods and secrets. That is what's recommended. It interacts well with the node TLS bootstrapping, but you can provision credentials for your nodes manually, or by some other process, if you want to, as long as you make them well-formed, with the `system:node:<nodeName>` username and the `system:nodes` group. All right, so at this point you have your bootstrap superuser that can set up anything they need to set up.
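As a sketch of how those flags might be wired up, here are excerpts of static pod manifests for the two components; the image tags and structure here are illustrative placeholders, not from the talk:

```yaml
# Excerpt of a kube-apiserver static pod spec: Node authorizer first, then RBAC.
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - name: kube-apiserver
    image: k8s.gcr.io/kube-apiserver:v1.9.0   # placeholder image tag
    command:
    - kube-apiserver
    - --authorization-mode=Node,RBAC
    # ... other flags elided ...
---
# Excerpt of a kube-controller-manager static pod spec: per-loop service accounts.
apiVersion: v1
kind: Pod
metadata:
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - name: kube-controller-manager
    image: k8s.gcr.io/kube-controller-manager:v1.9.0   # placeholder image tag
    command:
    - kube-controller-manager
    - --use-service-account-credentials=true
    # ... other flags elided ...
```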
You have your control plane members working, you have your kubelets working. The next step is usually add-ons, right? Network plugins, DNS, the random things that you need to run: metrics, logging, whatever you are running that is accessing the API. What do you do about that? Well, fortunately, many already include RBAC role definitions in their manifests. That's really great to see. It makes it easy to plug these into secured clusters, and even if you're not using RBAC, it's actually very handy to have them enumerate the API permissions that they require. For those that don't, you have a few options. There are some predefined roles. These are pretty broadly scoped. There's the cluster-admin role, which is a superuser role that can do anything to anything. Then we have the admin, edit, and view roles, which are three levels of permissions designed to be granted within a namespace. View is read-only access to everything except secrets, because those can be escalating. Edit gives you write access to all the workload APIs, the things that a normal app developer would need to create: pods, deployments, replica sets, daemon sets, all the workload types of things, services, stuff like that. Admin includes everything edit does, and also includes permission to set up roles and role bindings in the namespace. Then you start getting into access management. If you're looking at what an add-on needs, maybe you can identify that this add-on would fit within the view role; all it's doing is listing services for service discovery. So you could grant one of these existing roles, and the way you can do that is to use some of the kubectl commands that will let you set up the role bindings. Ideally, you would grant the most limited role you could to the service account that the add-on is running as.
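For example, granting the predefined view ClusterRole to an add-on's service account might look like this; the add-ons namespace and service-discovery service account name are made up for illustration:

```yaml
# Grants the predefined view ClusterRole to one add-on's service account,
# scoped to the add-ons namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: service-discovery-view
  namespace: add-ons
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
- kind: ServiceAccount
  name: service-discovery
  namespace: add-ons
```

`kubectl create rolebinding service-discovery-view --clusterrole=view --serviceaccount=add-ons:service-discovery -n add-ons` produces an equivalent object.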
So ideally it has a service account, not the default one; figure out which service account it's using, create a role binding, pick the role that lets it do what it needs to do, and grant it. Yes. So the question was: if you're not using RBAC, but a manifest has RBAC objects, can you still deploy it? The answer is yes, probably. The API is independent of the authorizer. There are some edge cases around escalation detection, where we don't allow creating objects that have permissions you don't already have, and so there are some tricky interactions there. If you're using that bootstrap superuser, yes, you can continue to create anything. If you are not the superuser, it gets a little more complicated. The superuser, the system:masters group, is recognized as a superuser by the API server in general, not just by RBAC. So, if you grant to a specific service account, that's ideal. If everything in the namespace is still using the default service account, you can grant to that. What you typically end up with there is a lot of permissions glommed onto the default service account, and it gets hard to differentiate one app from another. That's also not great from an auditing perspective: if you're trying to figure out who did this thing and you've got 16 applications all running as the default service account, it gets really tricky to know what caused the problem. You can even grant a permission to all service accounts in a namespace: you see we're addressing the group `system:serviceaccounts:` followed by the name of the namespace. This is great if you're deploying several related things in their own namespace; you can grant to all of them at once. And then, we've all done it, sometimes it's necessary: just make the thing a cluster-admin.
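As a sketch, that whole-namespace grant could look like this, using ns1 as a stand-in namespace and the view role as the grant:

```yaml
# Binds the view ClusterRole to every service account in the ns1 namespace
# by addressing the synthetic group system:serviceaccounts:ns1.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: all-serviceaccounts-view   # hypothetical binding name
  namespace: ns1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:serviceaccounts:ns1
```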
Depending on what you're deploying, if this is your credential integration mechanism or your network plugin, you probably trust it very highly already, and so making it a cluster-admin, we've all done it. Ideally, there would be a custom role that perfectly fits the least privilege required by this app. And so now I'm going to talk about ways to build custom roles. Option number one: know every API call that an app makes, or is ever going to make, and then have fun. This describes the last two years of my life, actually. And it wasn't great, so I built this. In Kubernetes 1.8, there are structured audit logs, which mean that we can very easily take a look at who is making requests and all the details about those requests. All the same attributes that feed into the authorizer are available via the audit logs, and all the user info that feeds into the authorizer is available via the audit logs. So this is actually a really great way to build policy automatically. The first thing you can do is enable audit logs, run the application through its paces, have it do everything it's going to do with a service account that is specific to that application, and then process the audit logs and generate a role that encompasses what it did. So I actually built this tool. I call it audit2rbac; I'm from Red Hat, so it's shamelessly stolen from audit2allow from SELinux. And I want to show you a video demo. So I have a cluster running, and I've got the audit log. Let me see this. That's amazing, the pre-canned video died. There we go, okay. So when I list pods, I can see that my API request showed up in the audit logs. Yeah, I'm gonna have to come out here and watch. That's fun. All right, so I'm going to create a job here, and while it's getting going, let's look at what the job is going to do. This is just a bash script that uses kubectl to try to do some things.
So it's going to read a config map, try to delete and update it, look at pods in a couple of namespaces, look at nodes, and then run and clean up a couple of sub-jobs. And if I look at the logs for this thing, it's actually stuck on that first step, which makes sense, because it's running as a service account that has zero permissions. So let's use audit2rbac to set up a role that's going to let this job do what it needs to do. The first thing we can do is run audit2rbac, feeding it the audit log and the service account that we want it to pay attention to. As soon as we run that, it spits out a role, and that role has only a single permission, which is read access to the config map, which is what I'm trying to do. It also, look, no hands, it also has a role binding which grants that role to that service account. So I can actually just pipe that to kubectl apply, and as soon as I do, my job succeeds at that step, and now it's stuck on the next step. So what I'm going to do now is set up a loop that just keeps looking at the log, generating a role, and applying it, and I'm actually going to dump it out to this YAML file so we can see what's going on, but that's just for our benefit. And before I start that, I'm going to set up a diff so we can see changes as they come in. All right, so now we're going to kick off the apply loop and start taking a look, and immediately we start to see create permissions come in. We see delete and update permissions come in. Now, let me zoom out so you can see what's going on, or not see what's going on. It's dealing with batch jobs, so it's different API groups. It's merging and collapsing permissions, so if you start dealing with things with multiple names, it's going to combine those into the minimal rule that covers them.
It's looking at pod logs, so it's dealing with sub-resources, and we're just going to let this thing run, and by the end we have the role that covers it. It actually will differentiate cluster roles and roles, so it'll give you cluster-level permissions and namespaced permissions, and all the bindings required to set that up. And if we wait for a couple of seconds, I think it's about done. Success, done. So, it was pre-recorded, but there was no editing. This is the kind of thing that I would spend a couple of hours building by hand, and now it immediately gives us something that may require a little bit of fix-up. You might look at that and say, oh, well, actually, I want this to be a little broader, or a little narrower, but it gets you like 97% of the way there. So, just to talk about a few of the things this is doing: it does some smart verb expansion. If you list pods, then authorization-wise you've already seen the content of all of those objects. So if you list, and we allow that, we also let you get and watch, because that's not exposing any more information. If you update something, we give you patch and update. These are the kinds of things that encourage best practices, so that in a month, when you figure out, I'm getting a lot of conflicts, I should start using patch, you don't have to come back and update your role. If you could update, we're going to let you patch as well. It does multi-name inference. So if you access a pod named A and a pod named B, it can collapse that and say you can actually access a pod with any name. That's optional; you don't have to do that. But if you're trying to infer roles from usage patterns, exercising the app with a few example names across a few example namespaces kind of indicates: I'm going to be working with arbitrarily named objects. Same thing for multi-namespace.
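To illustrate the verb expansion, a role generated from observed list and update calls might come out looking something like this (illustrative output, not captured from the demo):

```yaml
# Observed: list pods, update a config map. Generated with expanded verbs:
# list implies get and watch; update implies patch.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: generated-role   # hypothetical name
  namespace: ns1         # hypothetical namespace
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["patch", "update"]
```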
If you do something to pods in namespace A, and you do the same thing to pods in namespace B, it actually moves that permission up to a cluster role and says it's granted across namespaces, because you're no longer constrained to a single namespace. Again, it's optional; you don't have to do those expansions if you don't want to. So, a couple of ways you could use this. One is the way we saw here, which is deny by default, watch the audit log, and then apply and allow. That works great; that's the most secure thing to do. Depending on the application's characteristics, your application might be crash-looping every time it's denied. I wrote mine to be tolerant of that and just keep retrying, but that might not work well for your application. An alternative is to run in a test environment, with CI tests that exercise everything your app is going to do, and give it full permission in that environment. Then, after the fact, post-process that audit log, build roles from it, and compare that to the committed roles for that app, and say, oh, you know what, the current build started using more permissions, so I'm going to fail that CI build and open a pull request saying, hey, new permissions, make sure these are okay. So there are lots of other ways you could use this; I'm excited to see how people are going to use it. And with that, I actually want to finish up by talking about a new feature in 1.9, which is aggregated roles. One of the things we're seeing as the extension capabilities around Kubernetes grow: it's easier for people to create custom resources, add in their own API servers, serving their own API types. The question is, how does this interact with policy? When I tell Kubernetes, hey, I have a custom resource foobar, I need to add permissions for this resource to the appropriate roles. And that's what aggregated roles let you do. So I mentioned that there are admin, edit, and view roles that are designed to be granted within namespaces.
As of 1.9, these are aggregated roles, which means that you can create a separate cluster role and label it, and the permissions in that role will get pulled into the appropriate admin, edit, or view role. And if you remove your cluster role, those permissions get yanked out. This makes it really easy to define a cluster role alongside your custom resource definition, and declaratively create a new API type and the associated permissions all at once, and those get folded into the existing roles. And if later you decide that this API type has to go away, you do the same thing: you delete your definition and your role, and those permissions get pulled out. So this is a really powerful way to contribute permissions without having to manage those admin, edit, and view roles yourself. All right, I think we have about five minutes for questions. Happy to answer any questions; if we run over, I'll be out in the hall. This is the last session, what am I, what am I talking about? Thank you guys for coming. I'm surprised you made it up here. Any questions? Yes. So the question was: can you mix namespaced and non-namespaced permissions in a single cluster role? And the answer is no, you would need two definitions, because if you grant the first one globally, you're granting all the permissions defined in it globally. So if you said, I want to edit and delete pods, and you grant that globally, they get all of those permissions globally. So you need two role objects for that. Yes: is there a feature to add a mapping from an enterprise role? You mean like in some external authorization system? Yes, the feature is the Kubernetes API. The idea is that you can actually use a client to program policy down into a Kubernetes cluster. There's not a built-in feature; you would use the API to transfer that. Yes, so the question was: are there special permissions for autoscaling? The horizontal pod autoscaler resources are their own resource, and so you can permission those individually.
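As a sketch, contributing custom-resource permissions into the built-in edit role might look like this; the foobars resource and its API group are extrapolated from the hypothetical example in the talk:

```yaml
# A ClusterRole whose label causes its rules to be aggregated into
# the built-in edit ClusterRole (and, through it, into admin as well).
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: foobar-edit   # hypothetical name
  labels:
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
rules:
- apiGroups: ["example.com"]   # hypothetical API group for the foobar resource
  resources: ["foobars"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
```

Deleting this ClusterRole removes the foobars permissions from the edit role again, which is what makes the add-and-remove lifecycle declarative.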
You could allow or disallow a user from changing horizontal pod autoscaler resources. The HPA controller uses scale sub-resources on deployments and replica sets. We didn't really talk about sub-resources, but sub-resources are a way of giving you a different endpoint for a restricted subset of operations. So the HPA controller will work with scale sub-resources, and typically only it is allowed to address those. Controllers also update status sub-resources, and the API server can permission those differently and only pay attention to status content when it comes in via a status sub-resource. I'm not sure if that answers the question, but the two points of permissioning are different resources, like the HPA resource itself, and then different sub-resources. Yes, okay, so the question was: are audit logs too verbose to run in production? And the answer is, it depends. There's actually a really expressive policy for describing what you want to show up in the audit log. You can scope events to a particular user, or a particular resource, or a particular set of resources, and define what verbosity you want for each of those. So you could say, I only want metadata for this user, or I only want metadata for these resources, or I want you to ignore node update calls completely, because those happen 5,000 times every 10 seconds. You have a lot of control over what you put in the audit logs. Yes: how does RBAC work with multiple clusters? There are a couple of options. One is the program-down, sort of federation, model, where you define it here and then it gets synced down to the standalone clusters. That gives you the ability to coast if the central one is unreachable. The other option is to actively delegate, and this is actually the model that's used for aggregated API servers. The webhook authorization mode is compatible with pointing it at another Kubernetes cluster.
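The kind of audit policy described above might be sketched like this; the specific rules are illustrative, not from the talk, and the policy file is handed to the API server via its `--audit-policy-file` flag:

```yaml
# Sketch of an audit policy: drop noisy node status updates, record only
# metadata for secrets, and log request bodies for everything else.
apiVersion: audit.k8s.io/v1   # audit.k8s.io/v1beta1 on older clusters
kind: Policy
rules:
- level: None                 # ignore node update calls completely
  userGroups: ["system:nodes"]
  verbs: ["update", "patch"]
  resources:
  - group: ""
    resources: ["nodes", "nodes/status"]
- level: Metadata             # who, what, and when, but no request bodies
  resources:
  - group: ""
    resources: ["secrets"]
- level: Request              # everything else: metadata plus request body
```

Rules are evaluated in order and the first match wins, which is why the narrow exclusions come before the catch-all.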
And so if you have an extension API server, those typically set themselves up to use webhook authorization checks against the central Kubernetes API server. So you can have RBAC policy defined here, and all the other clusters using webhooks. You can either program down, or you can delegate up, depending on your tolerance for the performance characteristics of the central one, and what you want to happen when there's an outage. Yes: no, so if you were paying attention, all of the attributes that go to the authorizer come from the URL, and on a create API request, the name is not in the URL. So the name is not available for the create permission. What we typically have seen done is having a central sort of namespace provisioner, where you ask it, make a namespace for me, and it creates the namespace, gives you a role in it, and then hands it to you. The name not being available as part of create does add challenges. We are well aware; we're trying to think of ways to address it, but yeah, that's an ongoing effort. The question was: are there plans to add conditions to role specifications? It's been discussed. We want to be really careful about not breaking delegation cases. The more input there is to the authorization decision, the harder it gets to accurately reconstruct all of the context input for a delegated check. So there's definitely interest in it. I think that is something we're currently talking about: whether it makes sense to do in RBAC proper, or as a layer, or some combination of the two. There are no projects I'm aware of. I mean, the audit2rbac tool actually does a simulated check internally. The way it works is it looks at the roles it has set up so far, looks at the request that was audit-logged, and says, would this have allowed that? And if not, then go add it in. So it's not difficult to reuse what's there to run those simulations. Yes: are there any tools that create users and assign roles?
So the Kubernetes philosophy around authentication is to try to stay relatively neutral and not have users be a first-class concept, because everybody's authentication is a special snowflake. So there aren't APIs around user management. As such, when you're talking about RBAC, if the user authority is over here and the authorization decisions are over there, no, there's not a particularly strong connection between the two. Time for one more, and then I'll stick around as long as people want, but yeah. All right, I appreciate it. Thank you guys.