Hello, KubeCon and CloudNativeCon. Welcome to my talk, mck8s: A Container Orchestration Platform for Geo-Distributed Multi-Cluster Environments. My name is Mulugeta, and I'm a PhD candidate in the FogGuru project at the University of Rennes 1 in France and a cloud native engineer at Elastisys AB.

Today, I would like to talk about the evolution of cloud deployments in the past few years and some of the challenges of multi-cluster management in terms of resource management and application deployment. Then I'll briefly discuss Kubernetes Federation, which is a foundation of our work, present the architecture and controllers of our platform, mck8s, and lastly show you a demonstration of our platform in action.

In the last few years, we have seen increasingly geographically distributed cloud deployments. The major cloud providers now have data centers in many regions of the world. We can identify three main geo-distributed deployment models: hybrid cloud, multi-cloud, and fog computing. In hybrid cloud, we deploy applications on data centers from both private and public cloud providers. In the multi-cloud case, we deploy applications on multiple public clouds, or on data centers in multiple regions of the same public cloud provider. The last one, fog computing, is an emerging geo-distributed computing paradigm where we have resources from private, public, as well as micro data centers distributed over vast geographical areas, with the aim of being closer to end users.

Much of this evolution is driven by the increasing demands of modern applications. Some of the non-functional requirements are low latency, that is, the desire to provide fast service to end users by placing our applications in the regions where most of our users are located, and high bandwidth and reliable connectivity: in the case of IoT and video analytics, for example, we would like high bandwidth and reliable connectivity so that we can upload vast amounts of data to the analytics framework. We have other requirements as well, such as high availability and disaster recovery, scalability, security, and compliance.

Managing these geo-distributed environments is not easy. There are a lot of challenges. We are talking about hundreds or thousands of clusters, which become almost impossible to manage manually, so we need automated ways of deploying applications and managing resources. This means solving many challenges: resilience against hardware and network failures; providing various automated placement policies; solving auto-scaling problems at various levels and to various degrees in order to provide consistent performance even as workloads and user traffic change all the time; and solving user traffic routing and load balancing, so that we can route user traffic from one cluster to another.

We try to address these problems in mck8s, and we believe that state-of-the-art container orchestrators such as Kubernetes serve as a basic foundation and building block because of their portability, interoperability, and extensible nature. We are not the first ones to address these problems. For example, the Kubernetes SIG Multicluster group has been working on the Kubernetes Federation (KubeFed) project for the last few years. This project provides the control plane, concepts, and abstractions necessary to manage multi-cluster Kubernetes.
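To give a flavor of these abstractions, here is a minimal sketch of a KubeFed FederatedDeployment; the resource name, labels, and image are illustrative, not taken from the talk:

```yaml
# A FederatedDeployment wraps an ordinary Deployment in a template and adds
# a placement section naming the member clusters to deploy to.
apiVersion: types.kubefed.io/v1beta1
kind: FederatedDeployment
metadata:
  name: hello
  namespace: default
spec:
  template:                # the usual Deployment metadata + spec
    metadata:
      labels:
        app: hello
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: hello
      template:
        metadata:
          labels:
            app: hello
        spec:
          containers:
          - name: hello
            image: nginx:1.21
  placement:
    clusters:              # manual placement: deploy to exactly these members
    - name: cluster1
    - name: cluster2
```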
In the same spirit as Kubernetes itself, the KubeFed architecture has a host cluster, where the controllers are deployed and where much of the decision making takes place, and member clusters, where applications are deployed. Kubernetes Federation provides a manual placement policy where you specify your desired clusters, as in the placement section of the sketch above, and the KubeFed controllers deploy your applications on those clusters. It also provides an automated mechanism called ReplicaSchedulingPreference, which provides fully load-balanced or weight-based placement across your clusters.

But we believe that many geo-distributed environments, such as fog computing, need more automated policies: resource-based policies in order to utilize the clusters fully, or locality-aware and proximity-aware placement policies so that we can deploy our applications on the clusters where most of our users are located. We would also need auto-scaling, resource provisioning, and network routing policies. That's where our project mck8s, which stands for multi-cluster Kubernetes, comes in.

Our aim is to address some of the challenges mentioned earlier, with the goal of providing automated placement, offloading, and bursting policies, as well as auto-scaling at three levels: first, at the multi-cluster or federation level, where our platform can add or remove Kubernetes clusters from the federation based on the amount of workload or user traffic; second, at the cluster level, where worker nodes are dynamically added or removed in response to changing workloads; and third, at the level of pods or applications, where the replicas of deployments are adjusted in accordance with resource utilization. We also aim to provide inter-cluster network routing and load balancing.

The architecture is quite similar to that of Kubernetes Federation. In our case, we have a management cluster that hosts our controllers, as well as workload clusters where our applications are deployed. In mck8s, we rely on other open source software: Kubernetes Federation, which we use for managing membership in the federation, that is, adding and removing clusters; Cluster API, which we use for transparent provisioning and removal of Kubernetes clusters on supported cloud providers; and Prometheus, which we use for monitoring resource and network usage on our workload clusters. We also have Serf and Cilium on our workload clusters. Serf is used for measuring inter-cluster network latency, which we rely on when making offloading and bursting decisions. Cilium is used for inter-cluster network routing and load balancing.

On top of this, we introduce four new controllers: the multi-cluster scheduler, the multi-cluster horizontal pod autoscaler, the rescheduler, and the cloud cluster provisioner and autoscaler. I will discuss the details of these controllers in the coming slides.

In mck8s, we have introduced six custom resources. The first three are similar to those of Kubernetes Federation: MultiClusterDeployment is similar to FederatedDeployment, MultiClusterJob is similar to FederatedJob, and MultiClusterService is similar to FederatedService. We have also introduced MultiClusterHorizontalPodAutoscaler, CloudClusterProvisioner, and MultiClusterScheduler. These are the resources that our controllers deploy, update, and remove. On the right, you can see the custom resource definition for one of our resources, MultiClusterDeployment. It's a simple definition; a hedged reconstruction is shown below.
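Since the slide itself isn't reproduced here, this sketch is a reconstruction: the API group name and the schema details are assumptions, not taken from the mck8s sources.

```yaml
# Sketch of the MultiClusterDeployment CRD; group and schema are assumed.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: multiclusterdeployments.fogguru.eu
spec:
  group: fogguru.eu
  scope: Namespaced
  names:
    kind: MultiClusterDeployment
    plural: multiclusterdeployments
    singular: multiclusterdeployment
    shortNames:
    - mcd
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        # keep the schema permissive so the spec can mirror a Deployment
        x-kubernetes-preserve-unknown-fields: true
```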
The first controller is the multi-cluster scheduler. It is responsible for the creation, deletion, and updating of three of our custom resources: MultiClusterDeployment, MultiClusterService, and MultiClusterJob. We provide manual and automated policy-based placement capabilities. The first is a manual cluster affinity capability where you can deploy your applications on selected clusters, similar to Kubernetes Federation: you can specify, I want to deploy on this, this, and this cluster, and our scheduler takes care of that.

We also have automated resource-based and network-traffic-aware policies. For example, we have a worst-fit policy that deploys applications on the clusters that have the most resources available. We also have a best-fit policy that uses a bin-packing algorithm to deploy applications on the clusters that have been utilized the most. And we have a traffic-aware policy, which comes in handy in the case of fog computing, that deploys applications on the clusters that are receiving the most traffic; network traffic is used here as an indicator of where most users are located.

Our scheduler also provides a horizontal offloading capability to neighboring clusters. What this means is that if your selected clusters do not have sufficient resources for deploying your applications, our scheduler will offload the application to another cluster that has sufficient resources and is closest to the selected clusters in terms of network latency. We also provide a bursting capability: if a cluster cannot place all the replicas of a deployment, the extra replicas are deployed on neighboring clusters that have sufficient resources.

On the right side, you can see the manifest file for a sample multi-cluster deployment. As you can see, this manifest is very similar to a vanilla Kubernetes Deployment. The only differences are the apiVersion, the kind, and certain fields under the spec section. For example, here we have specified the number of clusters, or the locations, where we want our application deployed, and the placement policy.

Our multi-cluster service works in collaboration with Cilium to provide inter-cluster routing and load balancing. The service manifest is very similar to a Kubernetes Service, and we add the annotation for Cilium so that Cilium can do the inter-cluster network routing and load balancing. For this, we need to have Cilium and Cilium Cluster Mesh on the workload clusters.

The multi-cluster horizontal pod autoscaler, similar to the Kubernetes horizontal pod autoscaler, adjusts the number of replicas of our deployments across all clusters. This decision is passed to the scheduler, which adjusts the placement of our replicas. For example, if initially we had a few replicas and then our autoscaler decides that more replicas are required but our cluster cannot place all of them, our scheduler can burst the extra replicas to other clusters.

The cloud cluster provisioner and autoscaler periodically checks the status of our multi-cluster deployments, transparently provisions a Kubernetes cluster on a supported cloud provider via Cluster API, and joins this cluster to the federation. It scales the number of worker nodes as necessary and finally removes the cloud cluster if it is not needed anymore after a certain amount of time. The manifest file is shown on the right. Here, we have to specify the credentials for our cloud provider and additional information such as the IP of the load balancer and so on.
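As a hedged sketch of such a provisioner manifest: the kind, group, and every field name here are assumptions based only on what the talk describes (provider credentials, a load balancer IP, and the scaling and teardown behavior).

```yaml
# Hypothetical cloud cluster provisioner resource; all field names assumed.
apiVersion: fogguru.eu/v1
kind: CloudClusterProvisioner
metadata:
  name: openstack-provisioner
spec:
  provider: openstack                       # Cluster API infrastructure provider
  credentialsSecret: openstack-credentials  # secret holding the cloud credentials
  controlPlaneEndpointIP: 10.0.0.100        # IP for the API server load balancer
  workerFlavor: m1.medium                   # flavor for provisioned worker nodes
  minWorkers: 1
  maxWorkers: 5
  idleTimeoutMinutes: 30                    # remove the cloud cluster after idling
```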
The implementation of mck8s was done using Kopf, the Kubernetes Operator Framework from Zalando, which is based on Python.

Now let's go to a short demonstration of mck8s in action. In this demo, we have one management cluster and five workload clusters on the Grid'5000 experimentation testbed: a management cluster in Rennes, and five workload clusters in Rennes, Nantes, Lille, Luxembourg, and Grenoble. We also have an OpenStack cluster in Nancy, which acts as the cloud cluster on which we will transparently provision a Kubernetes cluster when necessary. Each of our five clusters has a master node and five worker nodes. For the sake of heterogeneity, the nodes of clusters one and five have four CPU cores and 16 GB of RAM each, and the nodes of clusters two, three, and four have two CPU cores and 4 GB of RAM each. The table shows the inter-cluster network latency between these clusters.

The prerequisites for this demo are a management cluster and a few workload clusters, as shown here. We also need Kubernetes Federation, Prometheus, and Cluster API on our management cluster, and our workload clusters need to have Cilium, Cilium Cluster Mesh, and Serf, as well as Prometheus. We also need the credentials for a cloud provider. In our case, we have used OpenStack, but we could just as well use other cloud providers such as AWS, Google Cloud, and so on. And if we would like to have inter-cluster network routing and load balancing, we need physical or virtual networks, such as a VPN, across our clusters.

So now let's go to the demo. In this demo, I'll show how to deploy the custom resources and controllers of mck8s and how to use it to deploy some sample applications across multiple clusters.

First, let's check the prerequisites. The first one is Kubernetes Federation. To check that, I'll run `kubectl get kubefedclusters -n kube-federation-system` on the management cluster, and we see that we have five clusters that form the federation. Next, let's check the status of Cilium Cluster Mesh. To do this, I point my kubectl at one of the clusters, cluster one in this case, and first check for the presence of the Cilium pods. As we can see here, the Cilium pods are running. Now I'll exec into one of these pods and run `cilium node list`, and we see that the Cilium cluster mesh has been formed.

Next, I'll check for the presence of the Cluster API resources. To check this, I simply run `kubectl get namespaces` on the management cluster, and we see a few namespaces that contain the resources for Cluster API. We will use Cluster API to provision a cloud cluster, from OpenStack in this case, but we could use different providers. So let's check the presence of the OpenStack cluster. I'll run `openstack catalog list`, and I see the details of my OpenStack cloud, but we can also see that we don't have any servers at the moment.

Next, I'll check Serf. Serf is used for estimating the inter-cluster latency between the clusters, which is important when offloading deployments from one cluster to another. I run `serf members` against one of the agents over its RPC address, and we see that the Serf cluster has been formed as well.

So now it's time to deploy our custom resources.
In this demo, we will deploy four custom resources: MultiClusterDeployment, MultiClusterService, MultiClusterJob, and the cloud cluster provisioner. These are the custom resources that our multi-cluster scheduler and cloud provisioner use later on. First, I deploy the first three custom resource definitions: MultiClusterDeployment, MultiClusterService, and MultiClusterJob. Great, they are created.

Next, I'll deploy our multi-cluster scheduler controller. This is deployed as a normal Kubernetes Deployment, as we can see in the specification file here. We would like to run it on the master node of the management cluster, so we specify a node selector and the necessary tolerations. We also need a service account and the necessary role-based access control so that this controller has the privileges to do its job. So we deploy the RBAC and the Deployment. Let's check whether the pod for this deployment is created; it's being created right now.

Now let's deploy the custom resource definition for the cloud provisioner. The definition is a very simple one, as we can see here. Similarly, we also have a Deployment for the cloud provisioner controller, much like the previous one, so we deploy these two as well. Great. Let's check whether the pods are running: the multi-cluster scheduler is running, and the pod for the cloud provisioner is being created as we speak. It all looks good now.

What we can do now is deploy multi-cluster applications. I'll show three scenarios. In the first scenario, I will use our multi-cluster scheduler to deploy an application on a specific cluster. Then I'll show the horizontal offloading capability: if we don't have sufficient resources in one cluster, our scheduler is able to offload the application to a nearby cluster by estimating the network latency. Third, I'll show the bursting capability: if a cluster cannot place all the replicas of a deployment, our scheduler will deploy the extra replicas to a nearby cluster. I'll also show bursting into the cloud: if our fixed clusters do not have sufficient resources, our cloud provisioner will create a cloud cluster on OpenStack, and our deployments will be burst to that cluster.

One thing first: we have to create a cloud provisioner. We have the manifest file here, which contains the necessary information about the cloud cluster, in this case OpenStack: the credentials and other important information such as the IP address for the load balancer, and so on. So we create this cloud provisioner. All right, it is created. Great.

Now let's deploy our first application. This is a simple application that just prints hello. We would like to deploy it on one of our clusters, in this case cluster two. If you look at the manifest file for a multi-cluster deployment, it is very similar to that of a normal Kubernetes Deployment; the only differences are the apiVersion and kind, and a few additional fields under the spec section. If we want to deploy a multi-cluster deployment on specific clusters, we can specify the names of the clusters, comma-separated, in the locations field, as in the sketch below. So now let's deploy this. This will create five replicas on cluster two. We have created a multi-cluster deployment named hello.
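Here is a hedged reconstruction of that manifest; only the locations field, the replica count, and the overall Deployment-like shape come from the talk, while the apiVersion, image, command, and resource values are assumptions for illustration.

```yaml
# Sketch of the 'hello' MultiClusterDeployment used in the demo.
apiVersion: fogguru.eu/v1
kind: MultiClusterDeployment
metadata:
  name: hello
spec:
  locations: cluster-2      # comma-separated list of target clusters
  replicas: 5
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
      - name: hello
        image: busybox:1.35
        command: ["sh", "-c", "echo hello && sleep 3600"]
        resources:
          requests:
            cpu: "1"        # raised to "3" later in the demo to trigger offloading
```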
Let's check whether this resource has been created. We see that it's created, so let's check its status. As we can see in the status, the multi-cluster deployment has been created on cluster two with five replicas. Let's go through all the clusters and check the pods running on them; I'll write a for loop for that. As you can see here, we have five replicas running on cluster two, so our scheduler is able to deploy on a specific cluster.

Now let's look at the offloading capability. Let's edit the manifest file for this deployment. Suppose that for some reason we want to increase the resource request of our pods; in this case, let's increase it to three cores. We know that cluster two does not have nodes with three cores, so our scheduler will try to find another cluster that has three or more cores per node and deploy this deployment there. Let's verify that. I apply the changes, and now let's check the status of our multi-cluster deployment resource. As you can see in the updated status, the deployment is now running on cluster one, not cluster two, and we have five replicas running. Let's verify this: as you can see, the pods are now running on cluster one. The reason is that cluster two does not have sufficient resources, so our multi-cluster scheduler has deployed this deployment on the cluster nearest to the original cluster two, which is cluster one.

Next, I'll show the bursting capability. Again, let's edit the manifest file. This time, let's increase the number of replicas from five to ten. Since each replica now requests three cores and the nodes of cluster one have four cores each, only one replica fits per node, and a cluster with five worker nodes cannot place all ten replicas of the application. So our scheduler will try to deploy the extra five replicas on another cluster that has sufficient resources and is also close in terms of network latency. Let's apply the changes and check the status of our multi-cluster deployment. In the updated status, we should see, yes, we see that the deployment is now running on two clusters, cluster one and cluster five, and the extra five replicas have been deployed on cluster five. Let's verify this. As you can see, we have ten replicas running on two clusters, five on each. This is great.

What if we want to increase the number of replicas once more? Let's make it fifteen and see what happens. What happens now is that we don't have sufficient resources on our five fixed clusters: clusters one and five are full, and the two-core nodes of the remaining clusters cannot fit a three-core pod. When our multi-cluster scheduler cannot deploy all the replicas, it updates the status of the multi-cluster deployment with a message saying that the deployment could not be fully placed and that a cloud cluster needs to be provisioned. That's where our cloud provisioner controller comes in: it checks the status of the multi-cluster deployment, creates a new Kubernetes cluster on OpenStack, and joins it to the Kubernetes Federation. Once the new cluster is ready, our multi-cluster scheduler deploys the extra replicas on that new cluster. So let's apply this change and check the status of our multi-cluster deployment. As you can see here, there is a status update: a message saying that the application could not be deployed on the fixed clusters, that a cloud cluster needs to be provisioned, and that the remaining replicas are going to the cloud.
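Purely as an illustration of what that status might look like, here is a hypothetical status section; the field names are assumptions, not mck8s's actual schema.

```yaml
# Hypothetical status written back by the scheduler when bursting to cloud.
status:
  placement:
    cluster-1: 5            # replicas placed per cluster
    cluster-5: 5
  unplacedReplicas: 5       # replicas the fixed clusters could not absorb
  message: "insufficient resources on fixed clusters; provisioning cloud cluster"
```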
Now our cloud provisioner will create the Kubernetes cluster on OpenStack. Let's go to OpenStack and check whether the machines have been created. First, the master node for our new cluster is created; it will take a couple of minutes until the cluster is fully up and running. Let's check once again whether the cloud cluster has been created on OpenStack. We see that one master node and three worker nodes have been created. Let's check the status of the federation: we now see a new addition to our Kubernetes Federation named cloud one. This is a cloud Kubernetes cluster that has just been created by our cloud provisioner.

Now let's check the status of our multi-cluster deployment hello. As you can see in the status, the deployment is now placed on three clusters: cluster one, cluster five, and cloud one. So our cloud provisioner has indeed created a cloud cluster and joined it to the federation when our scheduler realized that the fixed clusters did not have sufficient resources to place our deployment. Our cloud provisioner can also autoscale the worker nodes of the cloud cluster and even remove the cloud cluster when the workload has decreased and the cloud cluster is not needed anymore after a certain amount of time.

Next, I'll show the automated placement capability of our multi-cluster scheduler. For this, I have another deployment. This time, we'll use the best-fit placement policy. The best-fit placement policy tries to deploy applications on the clusters that have been utilized the most; it is a bin-packing algorithm, so it tries to utilize resources as much as possible. In this case, we are trying to deploy on the two clusters that have been used the most. Let's deploy this; you can see the best-fit policy in the manifest. Now let's check the status of our multi-cluster deployment. As you can see, the deployment has been placed on clusters one and five; among all our clusters, these are the ones that have been utilized the most. Let's verify: we can see that the pods have been created on clusters one and five.

Another placement policy is the worst-fit policy. What this does is deploy applications on the clusters that have the most free resources available. Similarly, in this case, we would like to deploy on two clusters using the worst-fit policy. Let's try this and check the status. This time, the deployments have been created on clusters two and three, because these are the clusters that have the most resources available. Let's verify once more: as you can see, the deployments using the worst-fit policy have been created on clusters two and three.

The last thing I want to show you is how to deploy the multi-cluster service controller and how we can access the hello backend application using a frontend. For this, we first need to create a multi-cluster service corresponding to our multi-cluster deployment named hello. In this case, we don't have to specify the clusters, because our scheduler will find the corresponding multi-cluster deployment named hello and create the corresponding services on all the clusters that have the deployment; a hedged sketch of the service manifest follows below. Let's apply this and verify whether the services have been created. As you can see, the hello service has been created on clusters one and five. We can also check whether the service has been created on the cloud cluster, and it has.
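In this sketch, the io.cilium/global-service annotation is Cilium Cluster Mesh's real mechanism for load balancing a service across clusters; the mck8s-specific apiVersion, kind, and port numbers are assumptions.

```yaml
# Sketch of the 'hello' MultiClusterService; ports are illustrative.
apiVersion: fogguru.eu/v1
kind: MultiClusterService
metadata:
  name: hello
  annotations:
    io.cilium/global-service: "true"   # let Cilium route across the cluster mesh
spec:
  selector:
    app: hello
  ports:
  - port: 80
    targetPort: 8080
```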
Now we need the frontend deployment and service to access the application. We have a frontend multi-cluster deployment, and we need to deploy this frontend on at least one of the clusters that contain our backend multi-cluster deployment. In this case, we'll deploy it on cluster one, and there is also a corresponding frontend service that we will deploy on cluster one as well. So let's deploy these two and check the status. The frontend service has been created on cluster one. Let's check the pods: the frontend deployment has been created on clusters one and two. This is because we specified five replicas, and since our scheduler could not place all five replicas on cluster one, it created only three of the replicas there and burst the extra two replicas to cluster two. This is okay. Now let's try to access the application's frontend using the IP address of the master node of cluster one and the node port. As you can see, our application has responded with the message hello.

In this demo, I have shown you how to deploy the custom resources and controllers of mck8s, and how, using these controllers, you can deploy multi-cluster applications and services. I hope you have enjoyed this demo; if you'd like to see more demos, you can take a look at our GitHub repository. Thank you.

To conclude, if you are interested in learning more about this project and contributing, please visit our project's website, fogguru.eu. You can also read our paper, which was accepted at the International Conference on Computer Communications and Networks (ICCCN); the link is shown here. You may also take a look at our source code on GitHub, where you will find more examples as well. Thank you very much.