Hello, everybody. Good morning. My name is Michael Coyle. I'm the Global Container Lead for Accenture's AWS Business Group. It's a pleasure to be here with you today to talk about designing the multi-tenancy strategy for your organization. With me I have Naveen from Rafay Systems, my colleague and esteemed partner. Today we're going to get into what multi-tenancy is and what it means for your enterprise organization. As the Global Container Lead, I work with numerous large enterprises, Fortune 100 companies, many in the financial services and insurance industries. These organizations are looking for specific things: they're looking to save money, to manage their resources effectively, and to reduce complexity.

So, use cases for multi-tenancy. Let's get into that a little bit. What are the things we consider? First, does your organization trust its neighbors? That is, you have applications running in a dedicated cluster with numerous components: shared services, APIs, front-end components. You trust your neighbors. That's one of the reasons we see our enterprise customers using multi-tenant clusters. What are your data requirements? If your data is stored externally, say in RDS, and you're not keeping stateful sets in the cluster, your data privacy concerns inside the tenants are reduced. Are your teams small? Reducing complexity helps development and operations teams keep their overhead to a minimum: there isn't a bunch of different clusters that need to be upgraded, or a bunch of different DevOps pipelines matched to all those clusters. Cost is a significant factor, right? How many clusters do you need to manage? How many different types of nodes do you have? How many different applications are being deployed to these clusters? Is there a need for effective resource utilization? Maximizing your available resources in a multi-tenant cluster allows better cost management. What about regulatory or audit compliance? Do you have concerns about data privacy? If you have a consumer-facing application that's public on the Internet, there's a security concern right there. And do your workloads require only minimal customization? Or do you need different kernel settings, direct access to the cluster API, or other cluster-internal operations?

Let's get into hard versus soft multi-tenancy and the security concerns around each. Soft multi-tenancy is typically what we see with our enterprise customers. They trust their neighbors, and their applications reflect that. They don't have to worry about other tenants probing and attacking their application layer, so they can focus more on uptime and preventing issues and outages. Then you have hard multi-tenancy, where you have public-facing applications. This is appropriate for service providers offering a service to public consumers: adversarial tenants, different types of users with no relation to each other. Those tenants could sign up for your service and probe around to see what's going on inside the cluster and whether the other tenants have any vulnerabilities. Providers like that are consistently exposed to bad actors trying to exploit their systems, so they focus on securing systems that isolate each tenant from threat vectors.
But these terms are a little vague, and they cover a broad spectrum, right? There's no firm definition of hard versus soft multi-tenancy. So, some of the patterns we have. Namespaces on shared clusters: namespaces are created on demand for application or platform teams, and an application or platform team may own a single application or multiple applications and microservices. Shared services are a typical pattern as well; they're consumed by individual application owners and are typically seen on multi-tenant clusters. Workspaces on shared clusters: a self-service experience for application teams, with the ability to create namespaces within specific constraints. Dedicated node pools within a shared cluster, for guaranteed resource allocation: stateful apps, potentially noisy neighbors, isolated platform stacks. And then we have virtual clusters. Virtual clusters are kind of neat, right? They suit applications that need cluster-wide privileges for development and validation. They have separate API servers and data stores, so all objects within the virtual cluster are isolated, and you can still have a shared platform stack for certificates, policy enforcement, ingress, secrets management, and monitoring. You can also do simulations: testing new tools, ingress controllers, or alpha Kubernetes distributions. With that, I'm going to hand this over to Naveen and he's going to take over.

Thanks, Michael. What I'm going to do is cover some of the architectural and operational considerations when you're implementing multi-tenancy. I'm going to talk more about the concepts rather than get into a specific solution, and I'm also going to call out some of the common tools available to solve each of the implementation challenges you'll face with multi-tenancy.

The first one is the host cluster. A cluster could be running on premises or in the cloud, so you have to build the multi-tenancy stack in such a manner that the solution works across any cluster type. The last thing you want is to build something custom to a particular environment, then realize you have to migrate clusters to another environment and rebuild the stack all over again. So this is an important consideration to keep in mind.

The second thing is self-service. A couple of speakers have already mentioned this. An important aspect to think about is what kind of self-service you want to implement. Do you want to implement self-service at all? And if yes, what shape and form should it take? As an example, you have to think about the self-service interface: is it going to be Backstage? Is it going to be a CMDB? And if you're going to provide namespaces as a service, how are your users going to create namespaces? How are you going to make sure they don't have cluster-wide privileges? Those are the aspects you need to think about when implementing multi-tenancy alongside a self-service model; there's a sketch of what that can look like below.
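To make the namespace-as-a-service idea concrete, here is a minimal sketch of what a platform pipeline might stamp out per tenant, assuming your identity provider asserts group membership. The team name, labels, and group are all hypothetical; the point is that a RoleBinding to the built-in edit ClusterRole gives the team access inside its own namespace without granting any cluster-wide privileges.

```yaml
# A minimal namespace-as-a-service sketch, assuming a platform pipeline
# stamps these out per team. All names and labels are hypothetical.
apiVersion: v1
kind: Namespace
metadata:
  name: team-payments-dev
  labels:
    tenant: team-payments        # used later for chargeback and policy
    environment: dev
---
# Scope the team's permissions to its own namespace: binding the built-in
# "edit" ClusterRole through a namespaced RoleBinding grants nothing
# cluster-wide.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-payments-edit
  namespace: team-payments-dev
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: team-payments          # group asserted by your identity provider
```

A nice side effect of the namespaced binding is lifecycle: delete the namespace and the grant goes away with it, so nothing stale lingers on the cluster.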
The next one is access control. At a very high level, you can distill it into three pieces. The first piece is how you wire up your identity provider, because your identity provider is your source of truth. Depending on the group associations a particular user has, you want to make sure the user gets the corresponding permissions and access on the cluster. This is a critical piece, so you have to think about how you're going to integrate your identity provider. The second piece is managing the Kubernetes RBAC itself. What kind of privileges do you want these users to have? As an example, for a developer you may want to provide full access to a test cluster, but for production environments maybe you only want to allow them to get, list, watch, and delete pods, and not do anything else. How do you do that at scale, across the many different clusters you have? And there's the lifecycle associated with it: when a user changes group membership, or is no longer part of the company, you want to make sure nothing stale is left on those clusters. The third piece is access itself. How do you enable your users to securely access your clusters? They could be working from home; they could be working out of a cafe. How do you make sure they can access clusters across your on-prem environments and the cloud in a consistent manner? So, at a very high level, three things: wiring up your identity provider, managing the Kubernetes RBAC itself at scale, and enabling secure access. A CNCF project that helps you do this is called Paralus. It makes it easy to manage RBAC at scale across many clusters and also enables a secure access model.

The next thing to think about is tenant isolation. I feel this topic has been beaten to death already. Namespaces are not isolated by default in Kubernetes, so as and when you create namespaces, you have to figure out how to attach the required network policies. And if a particular team or application has multiple namespaces, you want to make sure those namespaces can communicate with each other. A general problem we see: every cluster has a bunch of what I call system namespaces, where your security tooling, observability tooling, and so on may be running. You want the user namespaces where applications run to communicate with those, but the user namespaces themselves need to be isolated from each other. So you have to think about isolation in conjunction with all the other policies you want to implement for the cluster and arrive at a standard mechanism for doing it; there's a sketch of one below. Some of you may also need to provide evidence, when auditors ask, that the necessary controls are in place across your clusters, so another important thing to consider is having those mechanisms in place. And the last thing is resource quotas. These are absolutely important so that application teams don't step on each other, and you don't have a problem where one rogue team or application ends up disturbing the other applications sharing the cluster. Net-net: make sure you have the necessary network policies in place, make sure you can capture evidence that the necessary controls exist, and always remember to enforce resource quotas. These are critical when you implement a multi-tenant model for your application teams.
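Here is a minimal sketch of those two pieces, isolation and quotas, applied to each user namespace as it is created. The label conventions and quota numbers are hypothetical; the network policy denies ingress by default while still admitting traffic from the tenant's own namespaces and from shared system namespaces.

```yaml
# A minimal per-tenant isolation sketch. Label selectors and quota numbers
# are hypothetical conventions, not anything prescribed.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tenant-isolation
  namespace: team-payments-dev
spec:
  podSelector: {}                  # applies to every pod in the namespace
  policyTypes: ["Ingress"]         # egress is left open in this sketch
  ingress:
    - from:
        # Allow traffic from the tenant's own namespaces...
        - namespaceSelector:
            matchLabels:
              tenant: team-payments
        # ...and from shared "system" namespaces (security, observability),
        # assuming those namespaces carry this label.
        - namespaceSelector:
            matchLabels:
              role: system
---
# Cap what the tenant can consume so one rogue team can't starve neighbors.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota
  namespace: team-payments-dev
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "50"
```

Because both objects are stamped out by the same mechanism that creates the namespace, pointing an auditor at that mechanism plus the live objects is usually the easiest way to evidence the controls.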
The next one is policy enforcement. Of course, there are certain policies you want to enforce from a security and operational-efficiency standpoint, but for multi-tenant clusters you also have to think about the ecosystem. An example: say you have a security tool, and the security tool flags a container as vulnerable. One of the things the security team will want to do is find out which team or application it belongs to. Usually they look at the namespace, and if you have the necessary namespace labels configured appropriately, it's easy for them to close the loop, to figure out who to reach out to in order to remediate the vulnerability. Another use case is chargebacks. Doing chargebacks on the basis of namespace labels becomes a lot easier, especially if applications have namespaces across multiple clusters, or you want to figure out the cost of both your test and production environments when they're spread all over the place and you need a consistent way to determine how much a particular application costs. Another example: other tools can rely on namespace labels to enable role-based access. You could have an observability tool that looks at the namespace labels a user has access to and grants access to logs accordingly. These are some of the ecosystem considerations you have to think about, because in a dedicated-cluster model you can basically say, hey, as a user you have access to everything on the cluster, but in a multi-tenant model you have to think about every single ecosystem solution that affects how users operate; there's a sketch of enforcing such a labeling convention below.

The next thing is policy violations. You could be using OPA Gatekeeper or Kyverno. How do you get centralized visibility, and how do you make sure your end users, the developers, can see those violations, but only the violations against their own namespaces? Because if they can't, they have no clue what's going on, and those of you on the platform team will have to follow up with them to remediate violations against resources already on the cluster, or, if actions are getting blocked, hand-hold them and tell them what they need to do. So it's very important to think about how you're going to expose information on violations to your end users when you implement policies. And the last thing is that you want to shift things left as much as possible, because you don't want to find out about issues at deployment time.
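As one example of such a guardrail, here is a minimal sketch using Kyverno that rejects any namespace created without the ownership label the rest of the ecosystem keys off. The label key "tenant" is a hypothetical convention, and an OPA Gatekeeper constraint could express the same rule.

```yaml
# A minimal Kyverno sketch: refuse namespaces that lack the ownership label
# used for security triage, chargeback, and log access. The "tenant" key is
# a hypothetical convention.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-tenant-label
spec:
  validationFailureAction: Enforce   # block creation instead of just auditing
  rules:
    - name: check-tenant-label
      match:
        any:
          - resources:
              kinds: ["Namespace"]
      validate:
        message: "Every namespace needs a 'tenant' label so ownership and chargeback lookups work."
        pattern:
          metadata:
            labels:
              tenant: "?*"           # any non-empty value
```

Running the same policy in your CI pipeline against manifests before they reach the cluster is one way to shift this check left, so teams see the failure at build time rather than at deployment.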
The next topic is chargeback, showback, and cost optimization. This is very critical, because the primary driver for implementing multi-tenancy is cost reduction, right? And if you don't double down and further ensure that the applications are running in a cost-efficient manner, then all the work you've done, all the other necessary controls you had to put in place, isn't really worth it. From a chargeback and showback standpoint, a prerequisite is to have a Prometheus stack or something along those lines to collect granular utilization metrics from the cluster. Once you have that, you have to think about how you're going to construct your chargeback models. Is it going to be based on namespaces? On a combination of namespace and cluster labels? That's the first thing you'll have to figure out, and you have to do it at the beginning, even before implementing multi-tenancy; otherwise you'll be force-fitting solutions later on. The second piece is unallocated resources. Obviously, those are costs someone has to pay; ultimately the numbers need to add up. So how do you share the cost of unallocated resources in your clusters across the many teams and applications sharing the cluster? That's another thing to think about. You're also going to have common services. I mentioned security tooling and observability tooling; there's a cost to running those tools, and you have to figure out a model for sharing those costs as well. And lastly, you may want to consolidate non-Kubernetes costs, personnel costs, et cetera. So from a chargeback and showback standpoint there are multiple steps involved, and you need to understand and appreciate all of these nuances as you implement your chargeback or showback.

In terms of optimization, there are two different parameters, or vectors, you need to consider. One is cluster efficiency, and the second is namespace or application efficiency. Take the first one, cluster efficiency. What does it really mean? You're basically looking at how much unallocated capacity you have in the cluster; the more unallocated capacity, the less efficient the cluster. Then you have to figure out what to do about it. The answer could be reducing the number of nodes, switching to a different node instance type, or implementing tooling such as Karpenter. It could be anything, but as a starting point you need a good picture of the utilization metrics for your clusters so you can figure out what you need to do.

Once you've taken care of the cluster efficiency piece and have a process in place, the second thing to think about is namespace and application efficiency. Here you have a different way of measuring it: how much is being allocated via resource requests, and what is the actual utilization? If utilization as a percentage of requests is a low number, you have a problem; there's a sketch of that measurement below. This is a very common problem we've seen in customer environments: application teams arrive at a number, and usually it's a very suboptimal number. So how do you put a process in place where application teams have access to their utilization metrics and are expected to go course-correct? Or do you need other solutions, whether it's HPA, VPA, or anything else? One of the other major considerations is making sure you can go back in time, because you don't want to make resizing decisions based on data from the last five minutes, an hour, or even a day. You want to go back a couple of weeks and look at the metrics over a period of time before arriving at a sensible number for resource requests and limits. These are some of the aspects you have to think about, and it has to be a recurring, ongoing process. The first time you do it, you'll probably see the biggest benefit, but it's important to keep the process going afterward, because you don't want to reach a point where everything works efficiently and then things break down. You have to make sure there are systems and processes in place to keep your clusters running efficiently.
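As a sketch of that "utilization versus requests" measurement, assuming you run the Prometheus Operator with kube-state-metrics and cAdvisor metrics scraped, recording rules like these compute the ratio per namespace over a week of history rather than the last few minutes. The rule names and the one-week window are my own choices, not anything prescribed.

```yaml
# A minimal sketch of the "utilization as a fraction of requests" signal,
# assuming kube-state-metrics and cAdvisor metrics are being scraped by the
# Prometheus Operator. Rule names and the 1w window are hypothetical.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: namespace-efficiency
  namespace: monitoring            # hypothetical placement
spec:
  groups:
    - name: namespace-efficiency
      rules:
        # CPU actually used per namespace, averaged over a week, divided by
        # what the teams requested. Values far below 1 mean over-requesting.
        - record: namespace:cpu_utilization_vs_requests:ratio_1w
          expr: |
            sum by (namespace) (rate(container_cpu_usage_seconds_total{container!=""}[1w]))
            /
            sum by (namespace) (kube_pod_container_resource_requests{resource="cpu"})
        # The same signal for memory, using the working-set size.
        - record: namespace:memory_utilization_vs_requests:ratio_1w
          expr: |
            sum by (namespace) (avg_over_time(container_memory_working_set_bytes{container!=""}[1w]))
            /
            sum by (namespace) (kube_pod_container_resource_requests{resource="memory"})
```

Exposing these per-namespace ratios back to the application teams, on a dashboard scoped by the same namespace labels, is one way to run the course-correction loop described above.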
That was it. We're open to questions now.

You mentioned a tool for RBAC management across multiple clusters. What was the name of that tool for access control? Yes, it's Paralus, P-A-R-A-L-U-S. Okay. Thank you. Any other questions? No? Thank you all so much.