Hi everyone, my name's Gerald Nunn. I am the GitOps Technical Marketing Manager at Red Hat. And today we're going to be talking about managing multi-tenancy, hopefully somewhat efficiently, and walking through some of the ways to do it, the pros and cons and challenges that we have encountered, and what we tend to recommend to our customers internally. So let's get through it. So from an agenda point of view, we'll do a quick overview of multi-tenancy, what it means. I'm not going through a canonical description of it, but just some things I want to highlight. And then we're going to go through some challenges and recommendations in terms of doing multi-tenancy with Argo CD, how it works, some of the things that people run into and how we look at handling it. And then a little bit about the future, particularly about things that I'm excited about in terms of addressing multi-tenancy, some of the gaps that are in Argo CD right now for that particular feature set. So let's start with the overview. So from a multi-tenancy perspective, there's a variety of different ways of looking at it. I mean, in a pure definition, anything is multi-tenant that has more than one user on it, even if you have no security, no safety measures around it, right? And that's really kind of the "none" layer, and it's typically what people don't want to do from a multi-tenancy perspective. But then the next two tiers, I really see kind of different use cases for them depending on how your organization is structured and the level of trust that you have amongst your users and between your users, right? So from a safety perspective, you might be looking at multi-tenancy not so much from a security angle. I just don't want teams stepping on each other's fingers. I don't want team A to overwrite a deployment or a resource that team B deployed, right?
That's really more of a safety question than "I need to protect my resources and make sure I don't have a nefarious actor go in and do bad things"; that's more the secure side. And I'm highlighting this because when I talk through the challenges and recommendations, a lot of it is going to need that kind of context, because there are certain things you might want to do if you're only looking for safety that you don't want to do if you're looking for security. From a building-block perspective, there are really three key components, from my perspective, in Argo CD for doing multi-tenancy. There's role-based access control, managing who can do what within an Argo CD. There are projects, in terms of managing the organization of the applications and application sets that you're delivering. And there are scopes, cluster scope and namespace scope, that allow you to manage the privileges that Argo CD itself has within that cluster. And just a very brief overview of each one in turn. So for role-based access control, Argo CD really has two levels of RBAC involved. There is the RBAC at the Argo level, which is what the user sees when they interact with Argo CD through the UI, the API or the CLI. And then there is the underlying Kubernetes RBAC that the Argo controller uses when it does things on the cluster, when it wants to deploy things. And those are two very important things. The Argo CD RBAC is defined at the global level in a ConfigMap. And also important to note: if you define the default policy and you give users permissions in that default policy, if you allow them to do things, you cannot later revoke those permissions. So you need to keep that in mind. But you can deny and then allow; that is allowed. So from a project perspective, projects, aka app projects, in Argo CD are really just a logical grouping of applications, and they allow you to organize things within an Argo CD environment for your tenants.
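Before moving on to projects, here's a minimal sketch of what that global RBAC ConfigMap could look like. The role and group names here are made up for illustration; the point is the direction of the grants: an explicit deny-all default with allows layered on top works, whereas granting permissions in the default policy and trying to take them away later does not.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-rbac-cm
  namespace: argocd
data:
  # Deny-all default: anything a user can do must be granted explicitly.
  # (Grants placed in the default policy itself could never be revoked
  # for individual users later, so keep the default empty or denying.)
  policy.default: role:none
  policy.csv: |
    p, role:none, *, *, */*, deny
    # Team members get explicit allows layered on top of the deny default
    p, role:team-a, applications, get, team-a/*, allow
    p, role:team-a, applications, sync, team-a/*, allow
    g, team-a-group, role:team-a
```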
They really restrict what your applications can do. You can blacklist resources. You can manage what things can and cannot be deployed by the applications that belong to that particular project. You can also assign project-specific roles, which becomes very important from a multi-tenancy point of view, as we'll see later on. Now, app projects in my experience are typically aligned along team boundaries. So team A might have their own app project, or an application might have its own app project. And similarly, team B would have a completely different app project, with different sets of permissions and roles in each one. Argo CD scopes: I think this is kind of an underappreciated thing with Argo CD that maybe is not as well known as it should be. There are actually two scopes in Argo CD, cluster scope and namespace scope. When you deploy Argo CD with a cluster scope, essentially Argo can see everything in that cluster. It has complete cluster-wide permissions, at least from a view perspective. It gets get, watch and list type permissions; it can see everything. When you deploy at namespace scope, it can only see things in that namespace, or those namespaces that you give it permission to. It cannot see things cluster-wide. So the downside of cluster scope is you've potentially got a lot of privileges when Argo is running in that mode. The downside of namespace scope is that, because of the way Argo is currently architected, it sets up watches per namespace when you deploy in that mode, so it doesn't really scale particularly effectively. You can get to tens of namespaces managed by namespace scope, but you can't get beyond that. If you start getting into hundreds, you're gonna have a lot of grief when you try to run in that mode. All right, so challenges and recommendations. So the first challenge is really around privilege escalation.
So in Argo CD, as we see in this diagram, I've got, say, three different users, they're all interacting with Argo CD, and there are three different clusters bound to that Argo CD that it's interacting with. And each of those clusters essentially has a bearer token linked to a service account that gives Argo a particular set of permissions on that cluster to do what it wants. And the problem you run into with privilege escalation is essentially that the privilege you have to give Argo CD is whatever the highest-privileged tenant requires, right? So that potentially means that the other tenants can do privilege escalation, in terms of doing things in that cluster that you don't want them to do, because they're all sharing that same underlying Kubernetes role. So, recommendations for this. I'm a big fan of separating my use cases. I do not like to have my cluster-configuration Argo CD shared with my tenants. Because the cluster-configuration Argo CD is essentially gonna have close to, if not, cluster-admin type privileges. And while you can certainly mitigate that, and we'll talk about that in a bit, from a safety point of view, an isolation point of view, I like to keep those two things completely separate. So I like to at least have those two separate Argo instances on that basis. Similarly, going along those lines, I like to separate my Argo instances. So the one on the left is where you've got no isolation. I've got one Argo, it does everything. It does cluster configuration, it manages all the teams. It's one instance that does it all. There's no isolation or safety with that. I'm not a big fan of this approach. But again, if you're going for a safety model more than a security model, it might be acceptable, right? Depending on what you're looking for. Partial isolation is really where, as I mentioned earlier, you separate those use cases out.
I have my cluster-configuration Argo CD that does what it needs to do from a cluster-config perspective. Then I have a separate instance that's for the teams and has much more restricted rights than that cluster-configuration one. All the teams can potentially use that one instance, if, again, that meets your particular needs as an organization in terms of what you're looking for between safety and security. And you can go farther than that, have more granularity, and start sharding out more. I say "sharding", that's probably a bad term, it's overloaded in Argo terms. Have different instances of Argo CD that the different teams use, and get much more isolation that way. The downside of that, of course, is that the level of management required to do that is much higher, right? As a platform team, now I'm managing a lot more Argos and trying to keep track of them all. So another recommendation, to mitigate the underlying Kubernetes privileges that you're having to grant, i.e. the highest level that meets whatever the highest tenant needs: you can mitigate that to some extent in configuration. So in app projects, you can blacklist resources. You can essentially say, I want to blacklist all the cluster-scoped resources, right? Let's say you have to have a shared instance for your tenants, and it has to be cluster scope because it's managing a lot of different namespaces and things and namespace scope won't scale enough; you can blacklist all that stuff out and essentially mitigate the privilege escalation issue that way. The other option you have, if you have to use cluster scope, is to do resource inclusions or exclusions. This is global to Argo CD, so you can't do that on a per-tenant basis. But what you can essentially tell Argo is, you know what, I've got these resources that I want you to completely ignore. Don't put any watches on them, don't do anything with them, you're not allowed to touch them.
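As a rough sketch, an exclusion like that lives in the argocd-cm ConfigMap. The specific kinds here are just illustrative; excluding cluster RBAC objects is one plausible way to blunt escalation on a shared instance:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Tell Argo CD to never watch or manage these resource kinds,
  # on any cluster it targets (this applies globally, not per tenant)
  resource.exclusions: |
    - apiGroups:
        - rbac.authorization.k8s.io
      kinds:
        - ClusterRole
        - ClusterRoleBinding
      clusters:
        - "*"
```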
So even though the underlying Kubernetes service account still has that privilege, you're telling Argo to completely ignore that resource, so that if a tenant tries to use it, Argo will ignore it and thereby mitigate that problem. But it is just mitigation at the end of the day, right? The other challenge that comes up for me a lot is declarative management of applications and application sets in Argo CD when you're trying to do multi-tenancy. So a big issue, and sometimes people don't realize this, is that in a multi-tenant, traditional Argo, where the applications, app projects and application sets all live in the Argo CD installation namespace, you cannot allow users to manage that YAML directly, either directly on-cluster or through Git. Because as soon as you do, they can bind their application to any app project they want and inherit whatever security is in that particular app project, right? So that is a problem for me, because, hey, we're doing GitOps. Why do I wanna do things imperatively? Creating things through the UI or the CLI just has a bad smell to me versus doing things declaratively, and that's what I wanna do. So in terms of ways to meet this challenge: first of all, don't ignore it. We should be doing applications and application sets declaratively. You can potentially use a policy engine. I've played around with Kyverno to do this, to essentially enforce applications getting particular projects. The problem with this is it doesn't really scale very well, because you have to know a lot about the user in order to actually do it. So it really wasn't something that worked out well for me, and I don't recommend it. But if somebody's done this successfully, I'd love to hear from you. In the "question mark, consider it" category, I've got applications and application sets in any namespace. If you're not familiar with this feature, I'll be talking about it in the future section.
It allows you to have applications in namespaces other than the Argo CD one, and then teams can declaratively manage that stuff. However, this is in beta right now. So for organizations that have an "I can't use beta software" policy, this might be a no-go for you. At Red Hat, for example, we consider these features tech preview, so we don't support them in production right now. But we do have an issue open upstream in Argo CD to potentially promote applications in any namespace to stable. And hopefully that will happen. Application sets in any namespace going stable will be a while yet. And then the last two, in terms of recommended options: you can manage declarative creation of applications and application sets by using a standard Git workflow, having pull requests, right? Essentially, your application teams create new applications or modify applications in Git, create a PR, the platform team reviews the PR and says, yeah, that looks good to me, nobody's changing anything behind my back on this front. You can also do, as I mentioned earlier, the per-team instances of Argo CD. That can work out really well. There is a model I like to affectionately refer to, and I see Michael Crenshaw's in the audience, as the Intuit model. Yeah, Michael. Michael's one of the great maintainers of Argo CD. The Intuit model, where essentially, if you've got application teams, you wanna give them different instances. You can give each instance access to only the clusters that that specific team needs. And so they essentially get a single pane of glass for their applications, but they're still isolated to a particular instance. So it's quite useful for that. All right, so from an RBAC recommendations point of view: do separate your global roles and your team roles, right?
So if you start down the Argo CD route and you start putting all your roles in the argocd-rbac-cm ConfigMap, you'll probably find at some point that this gets really unwieldy pretty quickly, depending on the number of tenants you're dealing with. If you have hundreds or thousands, that's a lot of RBAC in there, and trying to keep it all straight and know what's going on can be very cumbersome. So I like isolating all of the RBAC for the teams into app projects, right? And that's really where, from my perspective, it belongs. So global roles go in argocd-rbac-cm; team roles go in app projects. I'm a big fan of global projects. That's another feature that's maybe not as well used in Argo CD as it should be. So what a global project allows you to do, and I've got more on it in the next slide, is essentially have an inheritance model among your app projects. So you can declare, for example, a blacklist once and have all of your tenant projects inherit from it. And then if, for example, you deploy a new operator or a new custom resource definition and you want to blacklist that, you only have to change it once, right? You don't need to go through 30 different app projects and change it everywhere. This next one is something I'm kind of interested in getting feedback on: configuring your default access as an explicit deny versus an empty role. So typically what you see in Argo CD for the default policy is quote-quote, right? An empty string: no role, you don't get anything. It's great, it works, but it's not very explicit in terms of intent. And again, if you have auditing requirements, sometimes auditors kind of balk at that and go, well, that's not really being very prescriptive in terms of what you're trying to do here, right?
So I like to set up a deny-all role, just to be very explicit about what I'm doing, so that if I get somebody new on the team who doesn't know Argo well, they don't get confused by this quote-quote thing and try to change it or do something funny with it, right? And as I mentioned earlier, while with the default role, if you give it permissions, say you give the default policy read-only, you can't take those permissions away, you can do the opposite: you can give it a deny role and then add allow permissions, and that works just fine. And then the last one, another one I'm interested in hearing from folks about, is using the existing Kubernetes aggregated namespace roles. So in Kubernetes with RBAC, there are these default out-of-the-box roles: admin, edit, view. I like using those for my Argo CD service account for the application controller. The reason being, and this is maybe something that's very specific to OpenShift, but in OpenShift, when we deploy new operators, those roles automatically get updated, right? So leveraging those roles really allows me to reduce my maintenance burden as new things get onboarded in the cluster. Do I have to go and hand-bomb updates to my cluster role bindings or my namespace role bindings in order to support those new things I've brought in? So again, very interested in hearing from folks on that: is that something you're interested in doing, or are doing, or is it a bad idea because you found X, Y or Z? All right, so here's an example of a global app project. As I mentioned earlier, typically across all of your tenant app projects there are some commonalities that you're going to want to have, right? You can, for example, say I want to blacklist all cluster resources, and I want to blacklist certain namespace-scoped resources; in this case, I've got ResourceQuota, LimitRange and some OpenShift-specific stuff in there as well.
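A rough sketch of what such a global project could look like. The project name and label key are made up; the wiring in argocd-cm follows the upstream global-projects setting, where a label selector decides which tenant projects inherit:

```yaml
# The shared "global" project that tenant projects inherit from
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: global
  namespace: argocd
spec:
  # No cluster-scoped resources for any inheriting tenant project
  clusterResourceBlacklist:
    - group: "*"
      kind: "*"
  # Keep tenants away from quota and limit tuning in their namespaces
  namespaceResourceBlacklist:
    - group: ""
      kind: ResourceQuota
    - group: ""
      kind: LimitRange
---
# argocd-cm wires up the inheritance: any AppProject matching the
# label selector picks up the restrictions of the global project
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  globalProjects: |
    - labelSelector:
        matchExpressions:
          - key: project-type
            operator: In
            values:
              - tenant
      projectNames:
        - global
```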
Obviously your use case will vary from that perspective. But essentially with the global project, as I mentioned, you can inherit this in your other app projects and pick up this information. Another thing that's quite useful there is sync windows. You can specify a sync window in your global project, so if you're taking your cluster down, or you're doing something with Argo CD that affects everybody, you can define just the one sync window there and everybody will get it automatically. And the inheritance is configured via the standard Kubernetes match-expression syntax, so it's a pretty straightforward feature to use. But like I said, it's an underappreciated feature, in my opinion, in Argo CD. So here I've got an app project example that I'm using in one of my setups. So I've got the global app project; you can see the label there that's essentially selecting it. I don't have the example of the match expression, but it's pretty straightforward. Destinations are the normal thing for an app project, i.e. where can tenants deploy applications? In this case, it's just the local cluster in a particular namespace. I actually have a much longer list, but I shortened it to fit on the slide. And then after that, I like to define a standard set of roles for those tenants. So I typically do admin, user and pipeline roles as my out-of-the-box tenant roles. But again, you can customize this on a tenant-by-tenant basis, depending on what your particular needs and use cases are, right? So the admin role, for example, is allowed to do anything with applications in this project. They can create new applications, they can delete applications, whatever they need to do. The user role, on the other hand, I'm specifically only allowing to get applications, so they can see the list of applications, and to sync them. That's it. That's all they can do.
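As a sketch of a tenant project along those lines, with made-up names and the role policies trimmed down; the label is assumed to be whatever your global-projects selector matches on:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: team-a
  namespace: argocd
  labels:
    project-type: tenant   # hypothetical label matched by the global project
spec:
  sourceRepos:
    - "*"
  # Where this team's applications are allowed to deploy
  destinations:
    - server: https://kubernetes.default.svc
      namespace: team-a
  roles:
    - name: admin
      description: Full control of applications in this project
      policies:
        - p, proj:team-a:admin, applications, *, team-a/*, allow
      groups:
        - team-a-admins
    - name: user
      description: View and sync only
      policies:
        - p, proj:team-a:user, applications, get, team-a/*, allow
        - p, proj:team-a:user, applications, sync, team-a/*, allow
      groups:
        - team-a-users
    - name: pipeline
      description: Same rights as user today, kept separate for CI
      policies:
        - p, proj:team-a:pipeline, applications, get, team-a/*, allow
        - p, proj:team-a:pipeline, applications, sync, team-a/*, allow
```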
So that user, for example, if they notice there's an application that's out of sync or degraded, and they've fixed it and want to sync it, they can go ahead and do that. But they have limited rights in terms of doing anything else in that scope. And then the pipeline role: where that comes into play, and I'm gonna talk about this a little bit in my keynote, is that you're always gonna have CI, continuous integration, interacting with Argo CD. And you're gonna need to have an account for that CI to interact with, and give it a set of permissions, right? So for the pipeline role, I've given it the same permissions that the user role gets: get and sync. But I do separate those roles, because sometimes you do wanna differentiate them and give them different permissions, depending on the use case that you're trying to address. All right, so the future. The future is glorious. So I'm a really big fan of this feature, applications and application sets in any namespace, because for me it solves a lot of the challenges that exist out of the box with Argo CD right now in terms of multi-tenancy. And I cannot wait for this thing to become stable, as Michael knows. So essentially, as I mentioned earlier, it allows you to deploy applications in any namespace, and that allows those users to declaratively manage those applications without having the platform team oversee it as they would with a Git repo and PRs. And essentially the way it works is that you define an app project, and in that project you say, you're managing these namespaces. And as soon as an application gets dropped into one of those namespaces, it automatically inherits that app project. That app project stays in the Argo CD namespace, and the platform team is managing that app project. So users do not have the opportunity to change that project, because it's an automatic binding. There's nothing to change.
And they can't actually monkey around with anything in the Argo namespace, because they don't have any access to it. They only have access to the namespaces that you've designated for those particular applications. The one downside of this feature, and I haven't really dug into too much why that is, is that it does require a cluster-scoped instance. You can't use this with a namespace-scoped instance. And just to reiterate, your teams must have none to limited access to Argo CD, which is what you really want from a multi-tenancy perspective anyway. You don't want your teams doing things in the Argo CD namespace. That's a bad idea. For privilege escalation, this is the other component, from my perspective, in terms of dealing with and solving some of the challenges around Argo CD. I don't think there's really been much concrete work on this, and again, Michael can keep me honest on this afterwards, but I'm very interested in impersonation, which is a Kubernetes feature, for the service accounts that the application controller uses. And what that allows you to do is essentially set things up so that you have multiple service accounts, and then Argo CD could automatically impersonate a particular user or team and select the right service account with the right privileges for that particular team. So that user cannot then do privilege escalation to get a broader set of privileges out of it. In line with that, another thing I found quite interesting when I was researching this is something that comes up once in a while: well, you could just declare multiple clusters in Argo CD with different tokens and then kind of do it that way, right? The problem with that is it doesn't actually work. I think under the hood, Argo CD is actually referencing things by the server API endpoint, which is the same for all of them. And you get kind of a round-robin effect, from my understanding, which means you're never quite sure which credentials you're gonna end up with for any particular application.
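A hedged sketch of enabling applications in any namespace, with made-up namespace and project names (and, as noted above, this needs a cluster-scoped install):

```yaml
# argocd-cmd-params-cm: opt specific namespaces into hosting Applications
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
data:
  application.namespaces: team-a,team-b
---
# The platform-managed AppProject binds those namespaces: any
# Application created in team-a is automatically governed by it,
# and the tenant has no access to the project object itself
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: team-a
  namespace: argocd
spec:
  sourceNamespaces:
    - team-a
  sourceRepos:
    - "*"
  destinations:
    - server: https://kubernetes.default.svc
      namespace: team-a
```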
And then the other one that's quite interesting to me is respecting RBAC for resource exclusion. So if, for example, in the Kubernetes RBAC you've given the controller a certain set of permissions, and the controller tries to go out and watch a resource anyway and it fails, it could just treat that as a resource exclusion and stop trying, right? It shouldn't cause any problems or stop things from deploying. It should just work happily in that way. And that, for me, is another interesting feature that I'd like to see happen. The last one here is not one I'm super familiar with, I'll be upfront with you folks, but from a multiple-team-instance perspective it's quite interesting to me: there's a thing called Argo CD Core, which I actually didn't know too much about until a month or two ago, when one of our lead developers, Jann Fischer, mentioned it to me. And essentially what it is, is it's much more focused on just using Kubernetes RBAC versus Argo RBAC. It's a very stripped-down version of Argo CD, and I think it has a lot of potential from the point of view of addressing the proliferation of instances, particularly if you can somehow manage them more broadly, right? And provide a single pane of glass on top of that. And at Red Hat we have certain products for that. I'm not here to do a sales pitch, though, but there are other products out there that will manage Argo as a single pane of glass that could maybe work with this as well, and that would be quite interesting. But the idea of a stripped-down Argo, consuming fewer resources, but being able to run more of them for your tenants, I think is quite interesting as another way to manage that privilege escalation. All right, so that brings me to the end with three minutes and 50 seconds left. So any questions? You mean the applications in any namespace? Oh, app projects, those are not beta, they're there now. You should be using them now if you're doing multi-tenancy, in my opinion. It's an important feature.
There's also another session doing a deep dive on app projects from the New York Times folks, and they're very smart people; I've seen them present before and they do a great job. I'd highly recommend checking that out if you wanna see more about app projects. Yeah, in my opinion, you can manage without them if you only have, like, two tenants; hey, do what you want type of thing, right? But in the real world, where you've got hundreds or thousands of tenants, you need to have some organization around it, and app projects are really what provides that capability. Oh, thank you. Yeah, that's what's gonna be my keynote. Yeah, so wait 15 minutes and I'll be talking about it. But yes, definitely there are a few challenges I've run into from an app-management perspective. That's one of them, but really, for me, the challenge is about being able to look at things more holistically, and test holistically, when you're doing this stuff, which in turn is maybe driving a need for a higher-level primitive than applications. I'm kind of previewing my keynote a bit, but we'll talk about it in the keynote. It's only five minutes, but I'm happy to talk about it after. No problem. Pardon me? Oh, the slides should be available through the website, I believe; I've uploaded them, so they should be there if there's a spot to download them. If not, let me know and I'll figure out where you can get them from, or whether they get sent out after the fact. Yeah, I haven't done too much with the repo credentials, because we're all open source and I just put everything up on Git in public repos and never need to worry about the credentials, to be honest. There is a feature in app projects where you can essentially allow teams to self-service the repository credentials. Oh, okay, yeah. Yeah, I'm not too familiar with that, but yeah. It might be a good question for that New York Times session as well, since they're doing a deep dive on that stuff. Okay, that's it for me. Thanks very much, everyone.
Have a great rest of ArgoCon.