Bonjour tout le monde. I hope I pronounced that correctly. Anyway, hello everybody. Welcome to our session on Unlocking the Power of Multitenancy in Argo CD, a journey with Expedia Group's compute platform. Before I dive in, a quick round of introductions. I'm Anjul Jain, a software development engineer at Expedia Group in Gurgaon, India. I've been with Expedia for about two years now, and I'm super excited to share my learnings and the incredible work my team has done on Argo. It's also my first time outside India, and oh boy, do you walk here. My legs are literally hurting, but you gotta do what you gotta do, right? I also have with me my super cool, super talented colleague and co-speaker, Rajat. Hello everyone, and thank you, Anjul, for your kind words. My name is Rajat Porwal. I've been working with Expedia for more than two and a half years, and this is my first talk outside India, so I'm quite excited. Over to Anjul to kick off the talk. Thank you. Let's get down to business. A quick look at the agenda for today: we'll talk about the EG Runtime platform (Runtime is also the name of our team), try to demystify multitenancy in Kubernetes, take a look at the internal developer platform flow, then walk through the EG Argo CD setup, its multitenant foundation, some of its advantages, and best practices, and at the end we'll leave the floor open for questions. First, a little glimpse into some of the applications that we use, the scale of our clusters, and how we help our developers through this platform. We believe in smart design and using open source tools whenever we can. We're also open to commercial tools if they help our developers succeed. Our focus is on making a developer's life easier with a reliable setup for deploying and managing their software. Think of us as the backstage crew following the DevOps way: devs get to control their apps and resources easily while we handle the technical details.
The platform makes sure their work gets to the right place and keeps them updated on how things are going across the board. It's all about making their tech life simpler and more consistent. A quick look at some numbers: we have about 600-plus clusters running on AWS EKS. Some of them are now running on Argo CD, and some are still on legacy management tools. We have 20k-plus nodes and about 200k-plus pods, hosting about 11k applications as of now. So let's talk about multi-tenancy in Kubernetes. Multi-tenancy is like living in a shared apartment building. In our tech world, it means multiple users or teams sharing the same computing space while keeping their data and settings separate. Think of it as having different apartments in the same building. Now you might ask why we even bother with multi-tenancy. It's like living in a smart city: optimizing the use of resources, simplifying the tech infrastructure, and boosting collaboration and agility among users and teams. And there you have it, a tech-savvy urban playground. Now, we have two flavors here: soft and hard multi-tenancy. Taking the same example, soft multi-tenancy is like a friendly neighborhood within a big company. Different teams peacefully coexist, focusing on preventing accidents rather than causing them, much like neighbors looking out for each other. On the other hand, hard multi-tenancy is the high-security building for external services. It's like living in a city where not everyone knows each other. Here the focus is on securing and isolating each tenant, as there might be bad actors trying to exploit the system. Let's dive into the intricacies of multi-tenancy in Kubernetes, breaking down two key approaches that help us manage this shared digital space efficiently. First, we have the namespace per tenant. Think of this as giving each tenant their own room in our shared apartment building.
In Kubernetes, it means allocating a separate namespace for each tenant. However, managing this requires careful setup of RBAC and constant monitoring to ensure everyone is playing by the rules. Next, we have the virtual control plane per tenant. Imagine a shared apartment building with multiple floors, where each floor has its own virtual control center. In Kubernetes terms, this means deploying multiple virtual clusters or control planes on the same physical cluster. Managing these virtual control planes introduces overhead, and there's potential for resource contention, because each virtual cluster competes for resources in the shared physical cluster. So in our multi-tenant Kubernetes world, it's like deciding whether each tenant gets their own room or their own floor with a dedicated control center. Each approach has its perks; it's all about finding the right balance. Now let's take a quick look at the EG internal developer platform. You may want to take some notes. We start with an EG developer who onboards their application using Backstage; we have created some default templates for them. All of their source code is hosted on GitHub: their chart definitions, Dockerfiles, everything. Then comes the continuous integration part, where we use Jenkins and GitHub Actions to run the actual builds. All their applications are deployed using Spinnaker, which manages the deployment pipelines. Then, of course, we have a personal touch: an API created by our team. This RPC API is the entry point for deploying apps to the platform. It takes some information about the app and hands it off to Argo CD, which then creates the resources on our EKS clusters. And nothing gets overlooked: we have monitoring and observability tools set up as well. Now you might wonder what our Argo setup looks like. I'll give you a little glimpse and then hand it over to Rajat for a deep dive.
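As a minimal sketch of the namespace-per-tenant approach described earlier, a tenant might get its own namespace plus a resource quota and an RBAC binding. All names and limits below are hypothetical, not Expedia's actual configuration:

```yaml
# Hypothetical tenant namespace with a resource quota and an RBAC binding.
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-a-quota
  namespace: tenant-a
spec:
  hard:
    requests.cpu: "10"        # illustrative limits only
    requests.memory: 20Gi
    pods: "50"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tenant-a-admins
  namespace: tenant-a
subjects:
- kind: Group
  name: example-org:tenant-a  # placeholder identity-provider group
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: admin                 # built-in role, scoped to this namespace via RoleBinding
  apiGroup: rbac.authorization.k8s.io
```

The RoleBinding grants the built-in `admin` ClusterRole only within `tenant-a`, which is what keeps tenants from stepping on each other's resources.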
So picture this: at the heart of our operations, we have the Argo control plane cluster running on Amazon EKS. This mastermind runs the show, orchestrating different EKS clusters thanks to some specific IAM roles we have set up. It's the brain behind the scenes, ensuring a smooth and secure performance for all our applications across the board. We have one regional captain in each AWS region, so an instance in a particular region handles releases to its own region only. To maintain the resiliency of Argo, the control plane resources, such as our API, also run in multiple regions. Handing it over to Rajat. Thank you, Anjul, for the quick insight into multi-tenancy, the internal developer platform, and our control plane setup within EG. For a multi-tenant setup, isolation is the key. We have to make sure our tenants are isolated, safe, and secure. For this, we'll discuss the multi-tenant foundation. There are three major building blocks for multi-tenancy in Argo: scope, RBAC, and projects. Scope is basically the way you deploy Argo CD. If we deploy Argo CD as cluster-scoped, it can watch resources across all namespaces. But if we install Argo as namespace-scoped, it will only watch resources in that particular namespace. That also means we can run multiple instances of Argo within a single Kubernetes cluster. With this approach, we can isolate each tenant with its own instance, running in parallel in a single cluster: more isolation and more security. Next is RBAC, role-based access control, by which we can set up roles, define policies, and tie those policies to certain users or user groups. This way, users or service accounts can perform only the actions they are intended to; they cannot do anything outside their defined scope.
RBAC is quite an important feature in a multi-tenancy setup: we can define roles for each tenant's users or application developers so they can perform only certain actions, and only on their own applications. And then projects. Projects are basically a logical isolation of applications within Argo. Multiple applications that are similar in characteristics or nature can be grouped together as part of a single project. At the project level, we can define restrictions that apply to all the applications within that project, and sometimes we need to grant additional permissions to the applications of a certain project; the project is where we define that too. A bit more about the scopes. There are two major setups: the cluster-scoped Argo setup and the namespace-scoped Argo setup. In the cluster-scoped setup, there is only a single instance of Argo running, and it deploys applications across all the target clusters. The benefit of this approach is that with only a single instance, installation and maintenance are quite easy, and we get a single UI for seeing all the applications in our system. The downside is that if that main instance goes down, the whole system is down and application deployments are blocked. On the other side, we have the namespace-scoped Argo setup, where we run multiple Argo instances in a single cluster, in different namespaces. Each instance works as the single entry point for one tenant, with one or more target clusters connecting to that tenant's instance. This way there is more isolation, but with more running instances, the overhead of maintaining Argo is higher.
If we scale to a very large number of instances, a lot of complexity comes with the maintenance. But the benefit of this approach is that there is no single point of failure: if one instance goes down, the other instances still serve the other tenants' applications. One last thing: Argo documents this scoping, but I think it is a very underestimated feature that not many people talk about. If we run Argo with the namespace-scoped setup, we can isolate resources per instance, giving the Argo system more security. Now let's talk about RBAC in Argo and Kubernetes. There are mainly two layers of RBAC within Argo. In the first layer, service accounts and users interact with the Argo CD control plane, via the Argo API, the CLI, or the Argo UI. For a user to interact with or perform operations on the Argo control plane, they need certain permissions. By default, Argo is very unrestricted in nature; it ships with a default RBAC where anybody can do anything, but we can create roles and permissions and tie them to the different application owners and teams. In the second layer of RBAC, Argo CD interacts with the target clusters. For Argo CD to perform actions or create resources, it needs a Role or ClusterRole created within the target cluster. Here we should define only the minimum privileges Argo needs to create or delete resources, and not grant anything extra, to keep the target cluster safe and secure. So how can we set up RBAC within the Argo system? There are mainly two places where we can define it. The first is at the global level, where we define roles and policies that apply to all the applications in the cluster.
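As a sketch, global roles and policies of this kind live in Argo CD's `argocd-rbac-cm` ConfigMap as `policy.csv` entries. The role, project, and group names below are placeholders for illustration:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-rbac-cm
  namespace: argocd
data:
  # The default role applied to users with no matching policy.
  # An empty value ("") denies by default; role:readonly is a built-in role.
  policy.default: role:readonly
  policy.csv: |
    # custom role: may only sync applications in the team-a project
    p, role:team-a-deployer, applications, sync, team-a/*, allow
    # bind an SSO group (placeholder name) to that role
    g, example-org:team-a, role:team-a-deployer
```

Each `p` line is a policy (subject, resource, action, object, effect) and each `g` line maps a user or group onto a role.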
The argocd-rbac-cm ConfigMap is that place. Here we can define roles and policies and attach those policies to groups or service accounts. Argo comes with two built-in roles by default: a read-only role and an admin role. But it is quite recommended to create custom roles according to your needs. The second place where we can define RBAC is the AppProject. In the AppProject, we can define roles and policies and tie those policies to groups. The advantage of defining RBAC at the project level is that we can escalate certain permissions that apply to the applications within that AppProject, for teams that need something extra. For example, on this screen, I have given sync-only permissions to the applications that are part of my project. This way, we can grant additional permissions to applications within a single AppProject. Now let's talk about the second layer of RBAC, where Argo CD has to interact with the target cluster. There are multiple ways Argo can do this, but in the AWS ecosystem, an IAM role is the best option: it provides more security and it is native to AWS. A quick glance at how things work: we create IAM roles in the source account and the target account. Argo CD assumes the role in the source account and then assumes the role in the target cluster's account. From there, AWS identity mapping comes into the picture, and the IAM role is tied to a Kubernetes cluster role. Here we have to make sure the target cluster role follows the least-privilege principle, granting only the permissions Argo needs. Now let's move on to projects.
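Before moving on, the IAM-based cluster connection just described can be sketched as Argo CD's declarative cluster Secret with an `awsAuthConfig` block. The cluster name, endpoint, account ID, and role ARN below are all placeholders:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: target-eks-cluster
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster   # marks this Secret as a cluster definition
type: Opaque
stringData:
  name: target-eks-cluster
  server: https://EXAMPLE.gr7.us-west-2.eks.amazonaws.com   # placeholder endpoint
  config: |
    {
      "awsAuthConfig": {
        "clusterName": "target-eks-cluster",
        "roleARN": "arn:aws:iam::111111111111:role/argocd-deployer"
      },
      "tlsClientConfig": {
        "caData": "<base64-encoded-ca-cert>"
      }
    }
```

With this in place, Argo CD obtains EKS credentials by assuming the given IAM role, and that role's mapping inside the target cluster decides which Kubernetes permissions Argo actually gets.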
As I mentioned, with projects we can isolate applications within Argo and set restrictions and permissions that apply to all the applications in a project. An Argo project provides other features too: we can restrict what can be deployed to our target clusters. For this, we can define a list of trusted repositories. We can also define the destination clusters and namespaces where the applications may be deployed. We can give allow or deny lists of Kubernetes resource kinds, specifying what Argo should and should not be permitted to create. Lastly, we can define RBAC, which we touched upon earlier: RBAC defined at the project level applies across all the applications within that single project. So how do we use these features within an Argo project? Under the Argo CD AppProject spec, we have the sourceRepos section. Inside sourceRepos, we list the repositories we want to allow (or block): the repositories from which we want Argo to pull or fetch changes. Second, we have the destinations field, where we give the list of clusters and namespaces where we actually want to allow Argo to deploy the applications. Next, we have the clusterResourceWhitelist and blacklist options, along with their namespace-scoped counterparts, where we define which cluster-scoped and namespace-scoped resources may be created; this is the allow-and-deny feature, so we can permit certain resources and deny others. And in the roles section, we can define the RBAC.
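Putting those AppProject fields together, a tenant project might look roughly like this. The repository, cluster, and group names are made up for illustration:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: tenant-project
  namespace: argocd
spec:
  sourceRepos:                       # only these repos may be deployed from
  - https://github.com/example-org/*
  destinations:                      # only these cluster/namespace pairs
  - server: https://kubernetes.default.svc
    namespace: tenant-*
  clusterResourceWhitelist: []       # empty list: no cluster-scoped resources allowed
  namespaceResourceBlacklist:        # deny specific namespaced kinds
  - group: ""
    kind: ResourceQuota
  roles:
  - name: sync-only                  # project-level role: sync, nothing else
    policies:
    - p, proj:tenant-project:sync-only, applications, sync, tenant-project/*, allow
    groups:
    - example-org:team-a             # placeholder SSO group
```

The empty `clusterResourceWhitelist` is what blocks cluster-wide resources for tenant workloads, matching the practice described later in the talk.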
In the roles section, we define roles and policies and tie those policies to certain groups. So how can we configure these projects? Similar to Argo RBAC, Argo provides a feature called global projects. Think of a global project as a main or umbrella project where you define the standard set of restrictions and permissions that apply across all the projects in Argo. To set this up, we first create a single AppProject; inside it we define all the policies, restrictions, destination lists, and everything else. Then we reference this project's name in the argocd-cm ConfigMap, under the globalProjects option. Once the project name is set there, that project functions as a global project within the Argo system. For an application-specific project to inherit the permissions from the global project, we also define a label selector in that ConfigMap: if the label is present on an application-specific project, that AppProject inherits the permissions from the global project. This way, any global project's permissions can be inherited by application-specific projects. Now I'll talk about the default AppProject. Argo, by default, comes with a default AppProject, and it is very unrestricted in nature. Anybody who uses the default AppProject has no restrictions and can deploy to any cluster. So we have to make sure we restrict this default AppProject, and we should not allow any applications to refer to it. So what are we doing with app projects within EG?
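The global-project wiring described a moment ago goes in the argocd-cm ConfigMap. Here is a sketch with hypothetical label and project names:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  globalProjects: |
    # Any AppProject carrying the matching label inherits the restrictions
    # defined in the "global-tenant-project" AppProject.
    - labelSelector:
        matchExpressions:
        - key: example.org/project-group   # placeholder label key
          operator: In
          values:
          - tenant
      projectName: global-tenant-project
```

An application-specific AppProject then only needs the matching label (here, `example.org/project-group: tenant`) in its metadata to inherit the global restrictions.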
First, we define a global AppProject and enforce that our teams, and any application-specific project, inherit its permissions. Then we have a dedicated project for runtime components and runtime admins, and dedicated projects for the other tenant workloads. By default, the tenant workloads project has no permission to create cluster-wide resources. If any developers or users want to create additional cluster-wide resources, we have a separate workflow for that; they have to run the workflow to get the additional permissions, and it must be reviewed by the platform team within EG. Lastly, I will discuss the app of ApplicationSets pattern. Before I do, note that this is an admin feature; we have to make sure it is accessible to admins only. As admins, we want certain applications to be created as soon as clusters come up: certain add-ons, maybe Splunk, anything like that. For this, we can create a single app of ApplicationSets: one application holds all the other ApplicationSets within itself. Once we deploy that one application, it creates the ApplicationSets, those ApplicationSets in turn create the applications, and the applications eventually get deployed to the target clusters. That's it for the app-of-app-sets pattern. I'll hand over to Anjul for the advantages and the rest. Thank you. Okay, let's talk about some Argo CD multi-tenant advantages. Sharing is caring, right? Argo CD's multi-tenancy helps use compute power wisely, saving costs by making sure everybody gets what they need from the same tech resources. Scaling made easy: it smoothly adjusts to handle more or less work, ensuring things run smoothly no matter how busy it gets.
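Going back to the app-of-ApplicationSets pattern just described, the parent application might be sketched like this, where the repo URL, path, and project name are placeholders:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cluster-addons
  namespace: argocd
spec:
  project: admin-project             # admin-only project, per the talk
  source:
    repoURL: https://github.com/example-org/platform-addons.git  # placeholder repo
    targetRevision: main
    path: applicationsets            # directory of ApplicationSet manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd                # ApplicationSets must live in the Argo CD namespace
  syncPolicy:
    automated:
      prune: true                    # keep child ApplicationSets in lockstep with Git
```

Syncing this one Application applies every ApplicationSet manifest in the `applicationsets` directory, and each ApplicationSet then generates the child applications for the target clusters.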
We keep an eye on everything, and Argo CD lets us easily watch over and track what's happening, making it simple to understand and fix any issues. Stay safe and separate: it ensures each user or team has their own space, keeping things secure and making sure no one messes with each other's stuff. Easy management: Argo CD makes it simple to control and manage everything from one place, ensuring everyone follows the rules. Lastly, it works everywhere: whether you're using public or private cloud, Argo CD's multi-tenancy works smoothly in each tech setup. Now I'll share some smart moves with you, in case you consider moving to Argo after this session or could use some suggestions (or maybe have some for us). Argo CD offers deployment flexibility options. First, we have centralized Argo CD: as the name suggests, a single centralized Argo CD instance manages the deployments across multiple Kubernetes clusters. This approach suits organizations with a limited number of clusters and a desire for centralized control and visibility over the entire deployment process. Next, we have an Argo CD instance per cluster. In this configuration, each Kubernetes cluster has its own dedicated Argo CD instance. This setup is useful when clusters are operated independently and there's a need to isolate the deployment control and resources for each cluster. Last, we have a combination of both: a mix of centralized and decentralized Argo CD instances. This allows a balance between centralized and decentralized management. A quick example: you might have a centralized Argo CD instance managing deployments for critical applications that span multiple clusters, while each individual cluster has its own Argo CD instance for managing its cluster-specific applications.
In Argo CD, configuring the default access to deny means users have no access by default and must be explicitly granted permissions. Conversely, you may set the default to a role that grants basic access, such as the built-in read-only role; this obviously requires careful management to avoid unintended privileges. Choose based on your security needs and the principle of least privilege. I know it's lunchtime, but I'll quickly go through this last slide: a few more best practices for you. Establish a global AppProject for shared resources, ensuring consistent control across all tenant projects by whitelisting or blacklisting the necessary items. All tenant applications should inherit permissions from the global AppProject, maintaining uniform access rules for a seamless multi-tenant experience. Always maintain a clear separation by having distinct AppProjects for administrators and tenants; this ensures the right levels of control. Implement robust security measures for connected Git repositories to prevent potential threats from malicious code or unauthorized changes. Consider blocking cluster-wide resources in most projects; this enhances security by limiting unnecessary access and potential misuse. Lastly, consider fine-tuning your RBAC with a focus on projects, so that permissions align with the specific needs of each tenant, promoting a secure and controlled environment. Thank you, everyone. You have been a gracious audience. A huge shout-out to KubeCon, CloudNativeCon, the co-located events, and Expedia Group for giving us this platform. We're open to questions; if you have any, we'll try our best. Thank you.