Cluster API works very well in private cloud-like environments. In this session, we'll share our experience of making Cluster API fit into multi-tenant cloud environments, and how we modified the standard Cluster API usage pattern to fit our cloud environment. Lastly, we'll share the details of how we built a Kubernetes engine that delivers self-managing clusters in a self-service manner in multi-tenant cloud environments, with Cluster API at its core. With that, let's get started.

For the first third of the session, we'll be talking about current usage patterns of Cluster API and some of the issues that we saw in multi-tenant cloud environments. Arun will be going through that; basically, we set the problem statement. Then I will cover building a Kubernetes engine with self-managing clusters. And lastly, we'll touch on fleet management operations.

Before we delve into any of the details, I'd like to give you an overview of the multi-tenant cloud environment we are tackling here, which is our product, VMware Cloud Director. This is the cloud, and it is an infrastructure-as-a-service platform. The cloud provider sells infrastructure resources to their tenants. The tenants here are not individual end users; they are enterprise-level tenant organizations, and there could be thousands of tenant organizations in a cloud environment like this. These tenant organizations are strictly isolated from each other in terms of compute, storage, and networking, so they have very strict tenant boundaries. Now the expectation is that users from one of these organizations would come in and request a Kubernetes cluster, and somehow these Kubernetes clusters need to be manufactured and delivered to them in a self-service manner. That's the goal at large.

As for the personas we are dealing with, we'll be mentioning a few of them in the upcoming slides. The cloud provider is the one that oversees the entire cloud. The tenant org admin oversees a tenant organization. And tenant org users are the end users who want the Kubernetes clusters.

With that context set, as I mentioned before, the goal here is to deliver Kubernetes clusters in a self-service manner in this kind of multi-tenant cloud environment, with Cluster API as the underlying technology. One of the early observations we made while investigating this was that a few of the personas we just described, like the cloud provider and the tenant organization administrator, need not necessarily be Cluster API savvy. We had to keep this in mind when we were coming up with proposals and evaluating pros and cons.

A quick background on Cluster API: we expect the audience to have a fair understanding of what Cluster API is for this session, so I'm going to go through this quickly. Cluster API is a Kubernetes SIG Cluster Lifecycle subproject that simplifies lifecycle management of clusters. It offers a lot of benefits in terms of simplified usage, ease of development, faster feature delivery, and most importantly, the ecosystem behind it. So yes, we love Cluster API. Cluster API expects a parent cluster to already be present in order for end users to create and manage workload clusters. A Kubernetes cluster with the Cluster API components installed is called a management cluster.
The idea here is that a user would come in and run commands against it to create workload clusters. I want to highlight two points on this slide. Number one, this user is expected to have access to the management cluster. It could be low-privileged access, but the user needs access to the management cluster. The second point is that the Cluster API installation in this management cluster needs access to the control plane endpoint of these workload clusters in order to manage them. Just keep these two points in mind. Next, Arun will go over a few of the Cluster API usage patterns, and we'll see how these two points can become a challenge in a multi-tenant cloud environment. OK, so with that, I'm going to play the recording about Cluster API patterns.

In this section of the talk, we'll talk about Cluster API patterns used in the industry, and how we actually ship Kubernetes cluster creation as a service using Cluster API. One of the simplest patterns is Cluster API usage in a private cloud. Suppose you have a team of 10 engineers who want to create workload clusters, and they want to share one management cluster. What they would do is assign one of the engineers as the management cluster administrator, and each engineer who wants to create a cluster gets namespace access to this particular management cluster. They issue a request for their workload cluster, and they get a workload cluster. The management cluster administrator ensures that the management cluster is upgraded, updated with CVE fixes, and so on. So it's a simple collaborative exercise.

How does cluster creation work in the case of a private cloud? This is something that can work in the general case as well; it's the standard Cluster API use case. There's an org management cluster and an org management cluster administrator. A user wants to create a cluster. They ask for access to the management cluster and they get a namespace. Then they issue a cluster creation request with the infra token, and they get a workload cluster. Then they fetch the admin kubeconfig of the workload cluster using the same namespace, and they get the admin kubeconfig. Now they are set and ready to go.

That said, the problem statement gets expanded in the case of a multi-tenant cloud. In a multi-tenant cloud, you could have multiple organizations, say org1 and org2, and each organization has multiple users. Now each set of users wants to use a set of management clusters that create workload clusters in their own tenant org. Tenant org1 users would use management clusters to create workload clusters in org1; likewise, org2 users would use the management clusters only for org2 and create workload clusters in org2. There needs to be a management cluster administrator who ensures that the two sets of management clusters are isolated. You can see that there's an amorphous cloud of management clusters, and this administrator ensures that there is isolation, that there is fairness (the requests of org1 should not swamp the requests of org2), and that these management clusters are always available.
So there needs to be one or more management cluster administrator personas who maintain all of these management clusters and ensure that they are highly available, backed up and restored in case of failures, always isolated, and all the other common aspects of distributed computing. Now, if you take any management cluster that is used by multiple workload clusters, the management cluster can be considered a single point of failure from a distributed computing perspective. Likewise, we are currently talking about isolation within a management cluster using namespaces, which is good for some set of use cases, but let us consider what happens if there's a network partition. In the case of Cluster API, networking is king: management clusters and workload clusters need to have a network connection all of the time. If there's a network partition, the workload cluster cannot be upgraded, updated, or patched, so it essentially is not serviceable anymore. Likewise, what happens if the management cluster is compromised? In that case, every single workload cluster attached to that management cluster is compromised. All of the secrets, all of the tokens in every single one of those workload clusters are lost, and they need to be recovered in what is, we can say, not an easy way.

That said, let us now look at two patterns and see how they could fit into these use cases in the multi-tenant cloud. One simple pattern is that there is one global management cluster. All of the workload clusters talk to this global management cluster, and all of the users get workload clusters through it. A simple pattern: one management cluster to rule them all. There is only the overhead of one global management cluster. However, we cannot have strict network isolation, which is required by many customers of VMware Cloud Director. For example, this one global management cluster needs cross-tenant access to all of the workload clusters, and that is not easy to do; it is doable, but with a lot of infra help. Likewise, fairness and quota: what if org1 has 1,000 clusters and org2 has five clusters? How will Cluster API ensure that the requests made to the same management cluster are treated fairly, and that one org does not swamp the other? Note that org1 and org2 can be competitors. Resource management: suppose org1 has 10 cluster autoscalers and org2 has one. How do we ensure that org1 gets the right bill and org2 gets the right bill? How do we ensure that this management cluster is always highly available to all of the organizations? And the final problem is that many tenants do not like to trust an infrastructure component that sits outside their tenancy and is shared with other, possibly competing, tenants.

So the obvious next extension to this is to have one organization management cluster per organization: org1 has its own management cluster, org2 has its own management cluster. This is a good solution. Tenants don't need to trust an external management cluster. There is more flexibility in the shape of the cluster, so each user can talk to their own management cluster administrator; these users can say they want Cilium or Antrea, and those users can say they want Calico, and they can get clusters with their own CNI and things like that. The overhead of the management cluster is delegated to the org itself.
So it's very easy to do resource management, and strict isolation is available at the tenancy level, because there's a tenant boundary and everything is based within the tenant. However, we run into the issue of scale. There can be thousands of organizations in VMware Cloud Director, and we are saying that the addition of a new organization means the creation of a new management cluster, and of a new management cluster administrator who manages it within that organization. So there needs to be one management cluster administrator per org, which means we are scaling the personas along with the infrastructure. We also don't get strict isolation at the user level, which is not a big problem, but it is a problem for some use cases. And as you can see, the management cluster is still a single point of failure. In the remainder of the talk, Sahiti will discuss the various solutions we thought about for this problem and how they fit into our infrastructure. Thanks, folks.

Okay, so let's do a quick recap of what Arun just went over. He went over a couple of Cluster API usage patterns, and out of those, two stand out. Number one, Cluster API usage in a private cloud-like environment, where we have a single management cluster and multiple workload clusters under it. That pattern works very well, especially in clouds and teams of smaller size. The next one is Cluster API usage in a multi-tenant cloud platform, where a dedicated management cluster is expected to be present in each tenant organization. In that case, all of the seemingly minor issues we just went over begin to look much bigger, and that's because of the scale we are talking about. A few of the challenges: user-level multi-tenancy within a management cluster is a challenge in itself, even with a single management cluster. Now imagine a thousand organizations with a thousand management clusters; if admins are supposed to implement this user-level multi-tenancy for each management cluster, it's going to be a challenge. Similarly, for production-grade workload clusters, we want to guard the management cluster at all times, so we need to worry about backup, restore, and a bunch of other issues that we have already seen.

With all this in mind, we revised our problem statement. The original problem statement is bullet number one here: build a Kubernetes engine that delivers Kubernetes clusters in a self-service manner with Cluster API. We added numbers two and three as conditions: to ensure there is not much cost to the providers and to the tenant organizations, that is, to avoid the burden of managing management clusters, and to reduce the need for cloud admins and tenant admins to be Cluster API savvy. The primary responsibility of these cloud providers and tenant administrators is to provide the infrastructure-as-a-service platform, not necessarily a platform-as-a-service like a Kubernetes engine.

Okay, so now let's say: no management clusters. Let's imagine there's no management cluster layer, neither at the cloud layer nor at the individual tenant organization layer.
Instead, write a simple program, which we call a lightweight, stateless Kubernetes engine, that manufactures these Kubernetes clusters somehow, and then make these Kubernetes clusters a bit more capable so that they can manage themselves, which we call self-managing clusters. So on this slide we just talked about two things: one, the Kubernetes engine, and two, self-managing clusters. First, let's look at what a self-managing cluster is, then I'll double-click on the Kubernetes engine part as well.

So what is a self-managing cluster? This diagram represents the traditional hierarchical structure between a management and a workload cluster. The idea is that a user would come in and run a command against the management cluster; as a result, a workload cluster gets created, and the associated workload cluster records are created in the etcd database of the management cluster. Now, in order to make this workload cluster self-managing, we need two steps. Number one, install Cluster API in it. Number two, move those records from the parent to the child, and for that we can leverage the clusterctl move command. Once these records are moved from parent to child, the Cluster API instance sitting in the workload cluster can see its own records, and that's what actually makes it operate on itself. That makes it a self-managing cluster, and the parent-to-child relationship is broken. Now the user can directly access this workload cluster and run commands against it to resize, upgrade, or perform any kind of update operation. So we have got our self-managing cluster, and we can get rid of the management cluster. The concept of a self-managing cluster is not something new that we invented; it has been there, and Cluster API supports it. The novelty here lies in the fact that we have productized this concept of self-managing clusters to build a Kubernetes engine that delivers these clusters in a self-service manner.

Okay, now let's put all of this together and see the end-to-end workflow in action. In other words, what we are trying to do here is Cluster API as a service on a multi-tenant cloud. Let's go back to the original picture: we have this thin Kubernetes engine layer, and the idea is that one of the users from a tenant org would come in and request a cluster. The Kubernetes engine processes the request, and the first thing it does is create a bootstrap cluster. It creates a lightweight virtual machine inside that tenant org's space, installs the kind software, which is used to create the bootstrap cluster, installs Cluster API on it, then creates the workload cluster by applying the CAPI YAML, then installs Cluster API on the workload cluster, and then moves the records from the bootstrap management cluster to the workload cluster. All of this is orchestrated by the Kubernetes engine. Once the workload cluster is made self-managing, the Kubernetes engine gets rid of the bootstrap cluster, which is why we call it an ephemeral management cluster. At this point, we have the self-managing cluster ready, and it's delivered to the end user. Likewise, other users can request more clusters, and these self-managing clusters are delivered to them in a self-service manner; a rough sketch of this orchestration flow is shown below.
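To make that flow a bit more concrete, here is a minimal sketch of how such an engine step could drive the bootstrap-and-move sequence by shelling out to the standard tooling. This is only an illustration under assumptions, not our actual product code: it assumes the kind, clusterctl, and kubectl binaries are available where the engine runs, the infrastructure provider name and the manifest and kubeconfig paths are placeholders, and the waiting/readiness logic is elided.

```go
// Sketch: manufacture a self-managing cluster via an ephemeral kind bootstrap cluster.
// Assumes kind, clusterctl, and kubectl are on PATH; all names here are illustrative.
package main

import (
	"fmt"
	"os/exec"
)

func run(name string, args ...string) error {
	out, err := exec.Command(name, args...).CombinedOutput()
	fmt.Printf("$ %s %v\n%s\n", name, args, out)
	return err
}

func manufactureSelfManagingCluster(workloadManifest, workloadKubeconfig string) error {
	// 1. Create the ephemeral bootstrap (management) cluster with kind.
	if err := run("kind", "create", "cluster", "--name", "bootstrap",
		"--kubeconfig", "bootstrap.kubeconfig"); err != nil {
		return err
	}
	// Always tear the bootstrap cluster down; it is ephemeral by design.
	defer run("kind", "delete", "cluster", "--name", "bootstrap")

	// 2. Install Cluster API (core plus an infra provider) on the bootstrap cluster.
	if err := run("clusterctl", "init", "--kubeconfig", "bootstrap.kubeconfig",
		"--infrastructure", "vcd"); err != nil { // provider name is an assumption
		return err
	}
	// 3. Apply the CAPI YAML describing the workload cluster.
	if err := run("kubectl", "--kubeconfig", "bootstrap.kubeconfig",
		"apply", "-f", workloadManifest); err != nil {
		return err
	}
	// (The real flow would wait for the control plane to become ready and fetch
	// the workload cluster's admin kubeconfig into workloadKubeconfig here.)

	// 4. Install Cluster API on the workload cluster itself.
	if err := run("clusterctl", "init", "--kubeconfig", workloadKubeconfig,
		"--infrastructure", "vcd"); err != nil {
		return err
	}
	// 5. Move the CAPI records from parent (bootstrap) to child (workload),
	//    which is the step that makes the workload cluster self-managing.
	return run("clusterctl", "move", "--kubeconfig", "bootstrap.kubeconfig",
		"--to-kubeconfig", workloadKubeconfig)
}

func main() {
	if err := manufactureSelfManagingCluster("cluster.yaml", "workload.kubeconfig"); err != nil {
		fmt.Println("error:", err)
	}
}
```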
Okay, now let us take a deeper look. Actually, sorry, before we go to the next slide, I have one more point to share here. This is the composition of the self-managing cluster. The Kubernetes engine also ensures that a few essential packages like the CNI, CPI, and CSI are installed on it, plus a custom operator, which I'll mention in a bit. Another point is that for cluster deletion, the same process as cluster creation needs to be repeated: create a bootstrap cluster and move the records, except this time we move them back from the child to the parent. The reason is that a cluster cannot delete itself in a clean manner; if a user directly issues a delete command to the self-managing cluster, things may not work properly. That's why we opted for this approach.

Okay, with that, let's go ahead and discuss updates, upgrades, and fleet management, and how all of these operations work on self-managing clusters. Point 1A we have already discussed: once we have the self-managing cluster, if the user is Cluster API savvy enough, they can go directly to the cluster, run kubectl apply, and do whatever they want: resize, update, upgrade. Now, in our cloud environment there are several other use cases we had to handle. As a novice Kubernetes user, all I want to do is click a bunch of UI buttons to resize or upgrade the cluster. There can also exist personas where somebody creates Kubernetes clusters and hands them over to someone else who does not have enough privileges to create Kubernetes clusters, so this use case is valid and important for us; that's one. Now let's imagine we have, say, 100 self-managing clusters floating in the cloud environment. As a cloud provider, how do I upgrade, for example, the Cluster API version on all of these self-managing clusters? We cannot expect the cloud provider to go to each individual Kubernetes cluster to upgrade the Cluster API version, or for that matter any application. As a cloud provider or a tenant user, how do I install my own applications, if I want to, without accessing the Kubernetes cluster? And how do I apply network policies or set a container registry on this group of clusters? For all of these use cases except 1A, the user just wants to express their intent, and somehow this user intent is supposed to be propagated down to the Kubernetes cluster. So let's see how we can do that.

How do we propagate the user intent to the target cluster? There are two conventional models: push and pull. Let's see how the push model works. We have already written a simple program we call the Kubernetes engine, so we might as well leverage that: we can have the Kubernetes engine push the user intent to all the Kubernetes clusters. This might actually work well in most cloud environments, but in our case these tenant organizations have strict tenant boundaries, networking boundaries, and it's not that easy for us to punch firewall holes there. In order to push, the Kubernetes engine needs access to the control plane endpoint of these workload clusters, which may not necessarily be the case. So this doesn't work for us, but it may work in other cloud environments. Now let's look at the other option we have, the pull model. The idea here is that we write a simple operator that subscribes to various forms of user intent and periodically pulls and applies them onto its own cluster; a minimal sketch of such a loop is shown below.
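As an illustration of the pull model only, here is a minimal sketch of what such an intent-pulling loop could look like. This is a hypothetical example, not the operator we ship: the intent endpoint URL, the polling interval, and the use of kubectl apply are all assumptions made for the sketch; the real operator can consume intent from an API resource, an object store, or a shared file instead.

```go
// Sketch of a pull-model agent: periodically fetch the desired state ("user intent")
// from an endpoint the cluster can reach, and apply it to the local cluster.
// The endpoint, interval, and reliance on kubectl are illustrative assumptions.
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"os/exec"
	"time"
)

// Placeholder endpoint serving the desired state for this cluster.
const intentURL = "https://intent.example.internal/clusters/my-cluster/desired.yaml"

// fetchIntent pulls the current user intent (a bundle of Kubernetes manifests).
func fetchIntent() ([]byte, error) {
	resp, err := http.Get(intentURL)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// applyIntent applies the fetched manifests to the cluster this agent runs in.
func applyIntent(manifest []byte) error {
	cmd := exec.Command("kubectl", "apply", "-f", "-")
	cmd.Stdin = bytes.NewReader(manifest)
	out, err := cmd.CombinedOutput()
	fmt.Printf("%s", out)
	return err
}

func main() {
	// Poll periodically; only outbound connectivity from inside the tenant is needed.
	for range time.Tick(5 * time.Minute) {
		manifest, err := fetchIntent()
		if err != nil {
			fmt.Println("fetch failed:", err)
			continue
		}
		if err := applyIntent(manifest); err != nil {
			fmt.Println("apply failed:", err)
		}
	}
}
```

The key property of this design is that only outbound connectivity from inside the tenant boundary is required; nothing outside the organization ever needs to reach the cluster's control plane endpoint.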
This actually works out well for us, even in strictly isolated multi-tenant environments, and we went with this option to build our Kubernetes engine. The operator can be installed at cluster creation time: we can have the Kubernetes engine install this custom operator while creating the self-managing cluster.

Okay, all that said, we are now at a stage where we have built a Kubernetes engine that delivers these self-managing clusters in a self-service manner, and we have also addressed the fleet management problem. Now let's quickly evaluate the pros and cons of having the management-to-child workload cluster hierarchy versus just self-managing clusters. Reduced infrastructure costs: first, we need not worry about infrastructure resource management for the management clusters at all. Less cost to the tenant organizations: we don't have to hear from tenant organizations asking, hey, why is there some cluster that is non-functional for me, where I cannot host my modern applications, and why do I have to pay for those management clusters? That question will not even arise. Another thing is that there is no need to worry about HA, backup, and restore of these multiple management clusters. The moment we have production-grade workload clusters, we definitely need to guard the management clusters, so we would have to worry about high availability and backup/restore; that is all extra infrastructure cost, and operational cost as well. Also, the moment we have these management clusters, there is a need for a Cluster API-savvy persona to manage them. And there is no need for us to worry about user-level multi-tenancy within a given management cluster, because that problem doesn't exist anymore: every user gets their own workload cluster. We also need not worry about Kubernetes version skew between management and workload clusters. Another point is that the user has a choice of not being part of the fleet, and they have more granularity to work on their clusters.

Multi-tenant logging is one of the sweet hidden benefits we got from this approach of self-managing clusters. Let's say we have a management cluster and there are 100 workload clusters under it. If there is a problem with a single workload cluster, the owner of that workload cluster has to go to the management cluster owner and say, something is wrong with my cluster, please analyze the logs and fix it. So in addition to the hierarchical structure between management and workload clusters, we also have a hierarchical structure between the personas, and that all adds up to more operational cost. Now, with each user able to access their own logs, debuggability becomes much easier, and this actually helped us tremendously in troubleshooting and debugging our environments.

And we have already addressed the fleet management problem by having the custom operator subscribe to various forms of user intent, pull them, and apply them. This user intent can be captured as an API resource, an S3 endpoint, or a simple configuration file sitting in an NFS share; there are a lot of options we can choose from, and the operator can be customized to suit our cloud environments. Personas: there is no need for multiple personas to keep themselves up to date on the internals of Cluster API, as the CAPI knowledge is built into the Kubernetes engine.
And this is important for us because we have many personas like that, like the cloud provider and the tenant org administrator; the moment we introduce this learning curve, it means some reluctance and resistance to product adoption, and a longer time to market.

Okay, so this is what we have accomplished, and this is how we satisfied all of our requirements. There is no management layer and all the baggage it brings. Every cluster here is a standalone, independent, self-managing cluster. There is no need for Cluster API-savvy admin personas, the scale and security issues are solved, and fleet management is figured out. So we built a Kubernetes engine that delivers self-managing clusters in a self-service manner on multi-tenant cloud environments, with Cluster API at its core. The cost we had to pay to do this is writing a program that automates the manufacturing process of self-managing clusters, and another operator to act upon fleet management operations. If we compare this cost with the long-term maintenance of numerous management clusters in these gigantic cloud environments and all the baggage that comes with them, the one-time cost of developing this design is almost nothing. We have productized this, and our customers are using this solution to spin up production-grade clusters. Yeah, that's the power of self-managing clusters. Thank you. So yeah, please feel free to ask.