So Anand is the team leader of the OpenShift GitOps service team at Red Hat. He has around 17 years of experience in building and operating interface applications. I think we're all excited to hear from you, Anand. Over to you.

Hello everyone, my name is Anand Francis and I'm from the Red Hat developer tools team. I'm here to present a talk on multi-cluster GitOps, dealing mainly with two open source projects: Argo CD and Open Cluster Management. Many of you might be familiar with Argo CD already; a quick show of hands, how many have heard of Argo CD? Not surprising, it's quite a popular CD tool. And how about Open Cluster Management, have any of you tried out or used OCM? I see one hand, nice. This was probably mentioned already: I'm working as a team lead for one of the managed services we are building at Red Hat, a CI/CD service built on Tekton and Argo CD.

Here is a quick agenda for what we'll discuss in this session. We'll see what a multi-cluster setup is, the need for it, and the challenges that come with it. Next we'll look at what OCM is and how OCM can help. Then we'll go over the GitOps principles and what Argo CD is. Finally we'll see how we can leverage both OCM and Argo CD to build an application platform; that will be a demo based on both of these open source tools. At the end we'll take questions.

Let's start with a basic, single Kubernetes cluster. We have a control plane and a data plane. In the control plane we have storage in etcd, the scheduler, the controller manager, and the API server; in the data plane we have multiple nodes, with pods allotted to each node. This is a single-cluster setup, and ultimately a cloud provider supplies the load balancer to interface with all these workloads.

Now let's add one more cluster. This is a two-cluster setup, but you can see that the control planes are segregated; they work in isolation. This is one limitation of Kubernetes: a cluster is an independent resource, its single control plane controls every aspect of it, including the life cycle of the cluster, and there is no interaction between clusters. So what we mean by a multi-cluster setup is some kind of federation between two or more Kubernetes clusters, with a control plane that can control all of them.

These are some of the common reasons why we need multi-cluster, with the broad areas marked: one is location, the next is isolation, then reliability, and there are some specific scenarios where multi-cluster is the only choice. Let's start looking into each of them. For location, there are jurisdiction constraints where data has to reside in a particular region or country. There is latency: customers may be accessing your services from multiple regions of the world, and latency can be a factor. And then we have data and service gravity: your data resides in a particular region and you can't move it out, so you need to operate your Kubernetes cluster in the same region; or a service you depend on is already running in a particular region and you want to run your Kubernetes cluster along with it. We'll look into each of these in detail; I'm just giving a quick overview here.
Isolation: though Kubernetes provides isolation through namespaces, it's a weak isolation. If you need strong isolation, for scenarios where you have multiple environments like dev, test, staging, and QA, you can get hard isolation by having multiple clusters. Cost can be a reason too: you can segregate costs by department, with different parts of your company having different cost controls. The next is performance: you can have multiple clusters offering different service level agreements, thereby giving different performance and cost trade-offs. Then there is security: you can run your audit and security-related services in a secure, isolated cluster that is not exposed to the rest of the workloads. And the last reason is the organization: you can have organization-specific clusters so that each part of the organization does not interfere with the other parts.

Next, related to reliability, we have infrastructure diversity: you can span multiple clouds and multiple regions to increase the reliability of the service. For example, if one AZ goes down, you can have clusters running in multiple AZs and regions to absorb the infrastructure failure. That also reduces the blast radius: if one of the workloads goes down, you limit the impact by operating across multiple regions and clusters. The next is handling upgrades: whenever you are upgrading your Kubernetes version, you can have multiple clusters and slowly move your workloads to the new version. The last is scaling: if your cluster has reached the point where the number of nodes becomes a limiting factor for a single cluster, you can use multiple clusters to scale your workloads across them.

Then there are specific scenarios, like complex migrations where you have to move to a different CNI; you need a different cluster altogether to move between CNIs. And there are edge and IoT use cases. With MicroShift and K3s, Kubernetes is being run on devices as small as a Raspberry Pi, so each device can be treated as a Kubernetes cluster in itself, and the multi-cluster approach controls the overall fleet.

Now a bit more detail on the same points. One is jurisdiction restrictions, where customers are forced to run all their data centers in a particular region or country. The next is controlling latency: if customers access your service from different regions, you can run multiple load balancers and direct each customer to the right one based on their region. Then there is data gravity and service gravity. For example, some of the data your workload requires is already available in one region, and the egress cost of moving that data would be expensive, so you are forced to run your Kubernetes cluster in that region. The service case is similar: if you depend on a particular regional service that you cannot move, you can use the multi-cluster approach to handle that. And there are performance-related reasons: you can have different customer tiers with different SLAs.
You can offer different SLAs to different customers. This is the environments case we already spoke about: we can have the different dev and functional environments on separate clusters. For organizations, during mergers and acquisitions different organizations get integrated, and you may not want different parts of the organization to interact with each other; in that case you can give each organization its own cluster. There can also be cost-control reasons, where each organization wants a separate bill: they don't want to share the bill, and they want cost control over their own usage.

For infrastructure diversity, you can use different flavors of Kubernetes cluster to get a diverse infrastructure, which helps increase the reliability of your workloads. Reducing the blast radius means, for example, that if one cluster goes down, there is still a live cluster that can serve customer requests. For cluster upgrades, when you want to move a cluster across versions, you can use multi-cluster to shift your workloads to the newer clusters. Edge and IoT is an upcoming area where Kubernetes runs on smaller devices, and you can use the multi-cluster approach for fleet management of those clusters. And this is an example of a CNI migration, where you need a completely different cluster with a different CNI provider installed and you want to slowly move your workloads over to it.

Now let's look at the challenges. The first is infrastructure cost: as you add more clusters, the number of control planes increases, the cost of administering the clusters increases, and so the infrastructure cost goes up. The next is complex configuration: you have to plan well in advance, especially the network configuration, how each cluster's networking should look, and how you configure each cluster. RBAC and the other parameters that need to be configured for each cluster have to be handled in an efficient way, and that is a challenge right now. And there are security aspects: beyond RBAC, we have to ensure that all the security certificates are handled efficiently, along with the API calls between the clusters.

OCM, or Open Cluster Management, handles some of these challenges. Infrastructure cost is not something the tool can address, but the configuration and security aspects can be handled using OCM. So let's look at what OCM is. Open Cluster Management, OCM for short, is a community-driven open source project, currently at the CNCF sandbox maturity level. It's based on a hub-and-spoke architecture: we have a central admin server and several managed clusters attached to it. It supports multi-cluster and multi-cloud management and provides fleet management, so you can handle all the configuration from a single pane of glass; configuration is done in one place and gets replicated to all the managed clusters. It's highly scalable, essentially due to the pull-based model it uses: the admin server just acts as a storage layer, and the managed clusters do most of the work, pulling the configuration down to themselves. And we have a CLI, called clusteradm, that eases the management and creation of these resources.
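To make the CLI concrete, here is a minimal sketch of the registration flow described shortly, assuming a recent clusteradm release; the token and API server URL are placeholders taken from the init output, and exact flags may vary between versions.

```sh
# On the hub cluster: bootstrap the OCM hub components.
# This prints a ready-made "clusteradm join" command containing a bootstrap token.
clusteradm init --wait

# On the managed cluster: join the hub using that token.
# <token> and <hub-apiserver-url> are placeholders from the init output above.
clusteradm join \
  --hub-token <token> \
  --hub-apiserver <hub-apiserver-url> \
  --cluster-name cluster1 \
  --wait

# Back on the hub: approve the pending certificate signing request,
# which completes the registration of cluster1.
clusteradm accept --clusters cluster1

# Verify that the cluster is registered and available.
kubectl get managedclusters
```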
This is the architecture. We have a central hub cluster, which acts as the control plane for all the Kubernetes clusters connected to it, and then multiple managed clusters. Each managed cluster runs several agents, and the hub cluster runs a placement controller and a registration controller. The registration agent and the registration controller are the ones involved in the initial registration of a managed cluster with the hub, while the work agent picks up work from the hub cluster, executes it on the managed cluster, and sends the status back. The main advantage is that you can manage the fleet, and you have a single place, the hub cluster, where all resources are configured.

This is the typical cluster registration process. The hub cluster and the managed cluster are isolated, so there can be two different administrators for them. The hub cluster provides a token, and the managed cluster uses that token to start the initial registration. It then creates a CSR, a certificate signing request, which initiates the registration; the hub cluster administrator has to accept that managed cluster, which approves the signing request and completes the setup. Once the cluster is integrated with the hub, it keeps sending heartbeats to indicate that its services are running healthily, and whenever the certificates are about to expire, the hub cluster issues a new certificate for the managed cluster.

This is the hub-and-spoke model. You can see that the hub cluster does not run any workloads as such; it has a namespace for each managed cluster and assigns work to the managed clusters as workload units. The managed cluster, by contrast, is a typical cluster: it has nodes with pods assigned to them, plus the work agent that continuously polls the hub cluster for any new work created there, then takes it and executes it on the managed cluster.

These are the core concepts in OCM; reconstructed YAML sketches of the main resources follow this overview. A managed cluster is a cluster that hosts the workloads. The hub cluster is used purely for management. Managed clusters can be grouped into a cluster set on certain parameters, for example by environment or by the cloud provider in use; different combinations are possible, using label selectors and claim selectors. Similar to Kubernetes taints and tolerations, taints can be applied to a managed cluster as well. A Placement is another custom resource type that defines which managed clusters should be selected, and a PlacementDecision is its output. Whenever you schedule a workload, you create a Placement and associate a managed cluster set with it; the PlacementDecision then selects a subgroup of clusters from that set, and the workloads are spread across those clusters. ManifestWork and ManifestWorkReplicaSet are the work-related resources through which workloads get spread: a ManifestWork works on a per-cluster basis, while a ManifestWorkReplicaSet works on a cluster-set basis. That is the essential difference, but both are used to spread workloads across the managed clusters.
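Before going through the slide snapshots, here is a rough reconstruction of the grouping and selection resources just described. This is a sketch rather than the talk's exact manifests: the apiVersions assume current OCM releases, and the names app-platform and app-platform-clusters are taken from the demo.

```yaml
# Grouping: a ManagedClusterSet that selects clusters by label.
apiVersion: cluster.open-cluster-management.io/v1beta2
kind: ManagedClusterSet
metadata:
  name: app-platform
spec:
  clusterSelector:
    selectorType: LabelSelector      # alternative: ExclusiveClusterSetLabel
    labelSelector:
      matchLabels:
        environment: dev
---
# Binding: lets Placements in this namespace target the cluster set.
# The binding's name must match the cluster set it binds.
apiVersion: cluster.open-cluster-management.io/v1beta2
kind: ManagedClusterSetBinding
metadata:
  name: app-platform
  namespace: default
spec:
  clusterSet: app-platform
---
# Selection: a Placement that filters clusters by a well-known claim
# and then ranks the candidates, keeping the top two.
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: app-platform-clusters
  namespace: default
spec:
  clusterSets:
    - app-platform
  numberOfClusters: 2
  predicates:
    - requiredClusterSelector:
        claimSelector:
          matchExpressions:
            - key: platform.open-cluster-management.io   # provider claim
              operator: In
              values: ["AWS"]
  prioritizerPolicy:
    mode: Additive
    configurations:
      - scoreCoordinate:
          builtIn: ResourceAllocatableMemory   # rank by free memory
        weight: 1
```

Note that the PlacementDecision is generated by OCM from this Placement; you never author it by hand.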
Here are some snapshots of these resources. This is how the CRD for a managed cluster looks. Next is the managed cluster set: you can use label selectors to select all the managed clusters that match a particular label, and there is also an exclusive cluster set label, a label that is specific to cluster sets, so a given value can be set on only one managed cluster set.

Then we have the Placement, which holds the selection criteria. You can select based on a label selector or a claim selector; claims are properties assigned to a managed cluster. For example, for a cluster running on AWS you can match the platform.open-cluster-management.io claim, which says which provider a particular Kubernetes cluster is running on. Predicates are the filtering criteria: clusters get filtered based on them. You can also add a prioritizer policy for cases where you don't just want to filter but want, say, the top ten clusters matching the criteria; you can rank clusters by how much CPU or memory is available and specify how many clusters should be selected on that basis.

This is an example of a ManifestWork, which is how workloads get spread across the clusters; a reconstructed sketch of it and of the ManifestWorkReplicaSet follows at the end of this section. Each managed cluster has a corresponding namespace on the hub cluster, and a ManifestWork applies to only a single managed cluster, so you create it in that namespace. In the workload manifests you specify what you want to deploy on the managed cluster. Here I'm creating a Deployment, in a target namespace, and that particular cluster will pick up this workload and deploy it. Next we have the ManifestWorkReplicaSet, an extension of ManifestWork; it's still in an alpha state, a newly developed feature. The advantage is that it can reference a Placement, which can select any number of clusters, so the workload is spread across all the clusters that match the placement, here named app-platform-clusters. In this example the wrapped resource is an Argo CD Application deploying the guestbook sample application; we'll see an example of this in the demo as well.

That almost completes the OCM concepts, so now I'll get into the GitOps side of it. These are the standard GitOps principles. First, the system is described declaratively, and since Kubernetes is a declarative system, we can manage the artifacts and workloads that need to run on it this way. Second, the desired state is versioned and stored in Git; the storage should be versioned, and the desired state must be explicitly set in Git, which means all the workflows applicable to Git, like approvals and the PR process, can be applied. Third, approved changes are applied automatically through an agent. And fourth, there is continuous reconciliation: a controller running in the Kubernetes cluster constantly pulls the changes happening in the Git repository, checks for differences, and makes the required changes to keep the cluster in sync with what is declared in Git.
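As referenced earlier, here is a rough reconstruction of the ManifestWork and ManifestWorkReplicaSet examples from the slides, again a sketch rather than the speaker's exact manifests. The ManifestWorkReplicaSet API is alpha, so its fields may change; the Deployment image is a placeholder of my own choosing, and the guestbook Application points at the standard Argo CD example repository mentioned in the talk.

```yaml
# A ManifestWork targets exactly one managed cluster: it lives in that
# cluster's namespace on the hub, and the work agent applies its manifests.
apiVersion: work.open-cluster-management.io/v1
kind: ManifestWork
metadata:
  name: hello-deployment
  namespace: cluster1            # the managed cluster's namespace on the hub
spec:
  workload:
    manifests:
      - apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: hello
          namespace: default     # target namespace on the managed cluster
        spec:
          replicas: 1
          selector:
            matchLabels:
              app: hello
          template:
            metadata:
              labels:
                app: hello
            spec:
              containers:
                - name: hello
                  image: nginx:1.25   # placeholder image for illustration
---
# A ManifestWorkReplicaSet fans the same template out to every cluster
# selected by a Placement. It assumes Argo CD is already installed on
# each managed cluster, as in the demo's first step.
apiVersion: work.open-cluster-management.io/v1alpha1
kind: ManifestWorkReplicaSet
metadata:
  name: guestbook
  namespace: default             # a namespace the cluster set is bound to
spec:
  placementRefs:
    - name: app-platform-clusters     # Placement selecting the target clusters
  manifestWorkTemplate:
    workload:
      manifests:
        - apiVersion: argoproj.io/v1alpha1
          kind: Application
          metadata:
            name: guestbook
            namespace: argocd
          spec:
            project: default
            source:
              repoURL: https://github.com/argoproj/argocd-example-apps
              targetRevision: HEAD
              path: guestbook
            destination:
              server: https://kubernetes.default.svc
              namespace: guestbook
            syncPolicy:
              automated: {}
```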
So, a little bit about Argo CD. Argo CD is a CNCF-graduated project that follows all the GitOps principles. Apart from Argo CD we have Flux CD as well; both projects achieve the same GitOps principles. Argo CD was created by engineers at Intuit, who open sourced the project, and Red Hat now contributes to many of the features and bug fixes in Argo CD.

In this demo I'm using Argo CD to deploy the workloads that need to be distributed across the managed clusters. On the left side you can see the hub cluster, where the managed clusters are grouped into a single managed cluster set, and where I've created a Placement and a ManifestWorkReplicaSet to spread the workloads to the clusters selected by that Placement. On the right are the managed clusters: I have three of them, with an Argo CD instance running on each. I have an ApplicationSet that builds an app platform as part of this demo; an ApplicationSet is a group of applications, and in this example it is a group of Helm charts that installs a bunch of CNCF projects that can be used as an app platform (a sketch of such an ApplicationSet appears at the end of this section). The ApplicationSet controller in turn creates the individual applications, each one a separate Helm chart in itself, and the application controller, again from Argo CD, is what does the constant reconciliation, so any manual change you make on the managed cluster is automatically reverted by the application controller.

Let's see the recording of the demo. Here I'm using the clusteradm CLI to create a cluster set called app-platform and add clusters 1 and 2 to it. I need to bind that cluster set to a particular namespace; this controls who can associate a workload with that managed cluster set, and we use RBAC for that, which is why it has to be bound to a namespace. There are also commands to list the cluster sets and the clusters selected for each one.

The first task is to install the Argo CD instances on the managed clusters, so the first step is to create a ManifestWorkReplicaSet to install Argo CD; with this, Argo CD gets installed on all the managed clusters. In the subsequent steps I create the ApplicationSet and spread it across all the clusters. There were some issues I was debugging as part of the video, so I'm skipping them: the placement reference was wrong, which is why it initially wasn't spreading the workloads. Once I corrected that, you can see the Argo CD pods starting to initialize. The next step is to apply the app platform, the ApplicationSet I briefed about earlier; that creates the workloads for the ApplicationSet, and in turn the applications get created by Argo CD.

Here you can see all the applications the Argo CD instance has created. They are initially in an out-of-sync state, and the health status is shown. Argo CD has two concepts: one is application sync, and the other is the health check. Out-of-sync means the required resources have not been created yet; Argo CD keeps retrying in a loop, creating the required Kubernetes resources to bring the application into sync. The health check denotes the health of the deployed resources.
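As mentioned above, here is a hedged sketch of what the demo's app-platform ApplicationSet could look like. The chart list is hypothetical, chosen only to show the shape; the talk's actual chart set lives in the speaker's public repository. A list generator stamps out one Argo CD Application per Helm chart, and selfHeal provides the automatic reversion of manual changes described earlier.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: app-platform
  namespace: argocd
spec:
  generators:
    # A list generator: one Application is templated per element.
    - list:
        elements:
          # Hypothetical charts, for illustration only.
          - name: cert-manager
            repoURL: https://charts.jetstack.io
            chart: cert-manager
            revision: v1.14.4        # hypothetical pinned version
          - name: prometheus
            repoURL: https://prometheus-community.github.io/helm-charts
            chart: prometheus
            revision: 25.21.0        # hypothetical pinned version
  template:
    metadata:
      name: '{{name}}'
    spec:
      project: default
      source:
        repoURL: '{{repoURL}}'
        chart: '{{chart}}'
        targetRevision: '{{revision}}'
      destination:
        server: https://kubernetes.default.svc   # the local cluster
        namespace: '{{name}}'
      syncPolicy:
        automated:
          selfHeal: true             # revert manual drift automatically
        syncOptions:
          - CreateNamespace=true
```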
Sync and health are both visible in a single pane of glass: you can see all the components getting installed, whether each is in sync, and whether it's in a healthy state. That's all I have for the demo. I have one more video, which covers how to add managed clusters to the hub cluster, but I'll skip it to save time. I'm open to questions if there are any. All the code I've demoed here is available in my GitHub account, in a public repository, so you can take a note of it if you want to have a look.

Hi. So my doubt is: let's say I have managed clusters in three different cloud providers, GKE, EKS, and so on, and I am controlling them with OCM. Can I create a deployment file such that each pod is placed on a different cloud provider?

Right, so if I'm understanding your question, you have one deployment and you want to spread its pods across the different providers. That's not possible. You can have a deployment on each cluster, and you can control the replicas of each; if you want one replica in each environment, you can control that part of it. Because the deployment controller is running inside the specific cluster, right? So OCM doesn't understand what's running within the cluster.

Hi, thanks for your presentation. On the initial slide I could see you had connected two different clusters with one load balancer. Can you please explain how the connectivity works? Because as far as I understand, how will the one load balancer connect to another load balancer?

Yeah, basically this is a global load balancer concept, wherein you take a client request and, based on the geographical area, redirect it to another IP address. That slide is based on that concept.

OK, and if I have multiple deployments in cluster one and multiple deployments in cluster two, how does the load balancer route to the correct application? Since I have multiple deployments, imagine each application is pointed, through an Ingress, to an ALB. How will my packets be routed with this multi-load-balancer setup?

Is it about this particular example? In this one, all the workloads are replicated, and the idea is to reduce latency. For example, a customer accessing your application from the US region should connect to the service running in the US, right? He doesn't want to connect to a cluster running in India. So based on the customer's region, we route the request to the cluster running in the US, and the latency is reduced. But the workload is replicated in both clusters; it's there to serve different customers.

OK, thank you.

Yeah, I think in the interest of time, we'll take the rest of the Q&A offline; you can reach out to Anand in the meantime. Thank you. Thank you, everyone. Thanks a lot. I would like to thank Anand once again.