Hello, Open Source Summit. I'm really excited to present our work, MCK8S, a container orchestration platform for geo-distributed multi-cluster environments. My name is Mulugeta, and I am a PhD candidate in the FogGuru project at the University of Rennes 1 in France and a cloud-native engineer at Elastisys AB in Sweden. The FogGuru project is a PhD program funded by the European Union; it involves six European organizations and employs eight PhD students. In the last three years, we have published more than 20 scientific publications. You can reach out to us or check our work at these addresses.

In this talk, I will briefly discuss the evolution of cloud deployments and the challenges of resource management and application deployment in geo-distributed multi-cluster environments. Next, I will discuss Kubernetes and Kubernetes Federation, which are the foundations of our work. Then I will briefly discuss the architecture and controllers of our platform, MCK8S. Finally, I will show a demonstration of MCK8S in action.

In the last few years, cloud environments have become increasingly geographically distributed. As you know, the major cloud service providers now have data centers in many regions across the world, and we can identify three major types of geo-distributed deployments. The first is the hybrid cloud deployment, where enterprises deploy their applications across their own private data centers and public cloud providers. The second is the multi-cloud deployment, where applications are deployed on data centers from multiple public cloud providers, or on multiple regions or zones of the same provider. The third and most recent is the emerging fog computing paradigm, where applications are deployed on resources from private data centers and public clouds as well as micro data centers spread across many geographical regions. The idea here is to deploy applications on resources that are close to end users and user traffic.

This evolution is driven mainly by the growing requirements of many modern applications. For example, many applications need to be very responsive, with low latency, so they must be deployed on data centers close to where most of their end users are located. Some applications, such as IoT and video analytics, require high bandwidth and reliable connectivity so that they can upload vast amounts of data to the computing environment. Other non-functional requirements, such as high availability, disaster recovery, scalability, security, and compliance, are also driving this evolution.

Even though applications are nowadays deployed on various geo-distributed environments, there are many resource management and application deployment challenges. We are talking about potentially hundreds or thousands of clusters, so managing them manually becomes almost impossible; we need automated application deployment and resource management. In doing so, we have to address several challenges: resilience to network failures, various automated placement policies, scaling the platform dynamically according to workloads, and providing user traffic routing and load balancing mechanisms so that user traffic can be routed from one cluster to another.
We think that to address these challenges, state-of-the-art container orchestrators such as Kubernetes can be used as building blocks, because they are portable, interoperable, and highly extensible. Kubernetes has emerged as the de facto container orchestration platform in the last few years. One of the reasons is that it hides the overwhelming complexity of resource management and application deployment behind a simple abstraction. The Kubernetes architecture is pretty simple: we have a control plane that acts as the brain of the cluster and several worker nodes where our applications are actually executed. We deploy applications on Kubernetes declaratively, meaning we tell it what we want done rather than how we want it done, and Kubernetes controllers continuously monitor the cluster and adjust its actual state toward the desired state that we have specified in our declarative manifest or configuration file.

But Kubernetes was designed with a single cluster in a single data center or local area network in mind, and it lacks mechanisms such as proximity-aware placement and proximity-aware network routing that are required in geo-distributed environments. So it is not easy to use Kubernetes to manage resources across multiple regions. To address this, the Kubernetes SIG Multicluster subproject has come up with Kubernetes Federation (KubeFed). Its architecture is similar to that of Kubernetes: we have a host cluster where the controllers are deployed, and it manages multiple member Kubernetes clusters. Kubernetes Federation provides the concepts and abstractions necessary for managing multiple clusters. We can have manual placement, where we specify the clusters on which our applications should be deployed, and Kubernetes Federation takes care of it for us. It also provides an automated placement policy called the replica scheduling preference, which either distributes our replicas evenly across all member clusters or, given weights, places more replicas on some clusters than others. (I'll show a small sketch of these KubeFed resources at the end of this overview.) But we lack other automated policies, for example proximity-aware placement, and we lack the offloading and bursting capabilities that are required in geo-distributed multi-cluster environments.

That's where MCK8S, or multi-cluster Kubernetes, comes in. Our aim is to address some of the challenges mentioned earlier, with the goal of providing automated placement, offloading, and bursting mechanisms. We would also like to provide auto-scaling at three levels: the multi-cluster or federation level, where Kubernetes clusters are added to or removed from the federation depending on the workload; the cluster level, where worker nodes are added to and removed from the Kubernetes clusters dynamically; and the application level, where pods are added or removed depending on the workload or resource utilization. The last goal is to provide inter-cluster network routing and load balancing mechanisms.

The architecture of MCK8S is very similar to that of Kubernetes Federation: we have a management cluster as well as several workload clusters where applications are actually executed. MCK8S is built on other open-source projects, namely Kubernetes Federation, Cluster API, Prometheus, Serf, and Cilium.
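To make this concrete, here is a rough sketch of what manual placement and weighted replica scheduling look like in KubeFed; the resource names, image, and weights are just examples:

```yaml
# Manual placement: a FederatedDeployment pinned to two named member clusters.
apiVersion: types.kubefed.io/v1beta1
kind: FederatedDeployment
metadata:
  name: my-app
  namespace: default
spec:
  template:                    # an ordinary Deployment template
    metadata:
      labels:
        app: my-app
    spec:
      replicas: 6
      selector:
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
            - name: my-app
              image: nginx     # placeholder image
  placement:
    clusters:
      - name: cluster1
      - name: cluster2
---
# Weighted automated placement: roughly two thirds of the replicas go to
# cluster1 and one third to cluster2.
apiVersion: scheduling.kubefed.io/v1alpha1
kind: ReplicaSchedulingPreference
metadata:
  name: my-app                 # must match the FederatedDeployment's name
  namespace: default
spec:
  targetKind: FederatedDeployment
  totalReplicas: 6
  clusters:
    cluster1:
      weight: 2
    cluster2:
      weight: 1
```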
We use Kubernetes Federation for managing cluster membership, that is, adding and removing clusters from the federation. We use Cluster API for transparently provisioning and removing clusters on supported cloud providers. We use Prometheus for monitoring our clusters. On the workload clusters, we also have Serf, which is used for estimating inter-cluster network latency, and Cilium, which is required for inter-cluster network routing and load balancing.

In MCK8S, we have introduced four controllers: a multi-cluster scheduler, a multi-cluster horizontal pod autoscaler, a multi-cluster rescheduler, and a cloud cluster provisioner and autoscaler. I will discuss the details of these controllers in the coming slides. We have also introduced new custom resources. Some of them are similar to those of Kubernetes Federation: the multi-cluster deployment is similar to KubeFed's federated deployment, the multi-cluster job is similar to the federated job, and the multi-cluster service is similar to the federated service. But we have also introduced new ones, such as the multi-cluster horizontal pod autoscaler, the cloud cluster provisioner, and the multi-cluster rescheduler. These custom resources are the ones our controllers manage. On the right side, you see the definition of one of our custom resources, the multi-cluster deployment; it is a simple custom resource definition (a sketch follows at the end of this part).

Our first controller is the multi-cluster scheduler. This controller is responsible for the lifecycle of the multi-cluster deployment, multi-cluster service, and multi-cluster job resources, meaning it creates, updates, and deletes these resources as required. Our scheduler supports manual placement of deployments through a cluster-affinity placement mechanism, where we can deploy our applications on selected clusters: we specify "I would like to deploy my application on this, this, and this cluster", and the multi-cluster scheduler does that for us. We also support automated resource-based and network-based policies. For example, we support a worst-fit policy, where applications are deployed on the clusters that have the most available resources. We also have a best-fit policy, a bin-packing algorithm that tries to utilize already-loaded clusters as fully as possible. And we have a network-based, traffic-aware policy that deploys applications on the clusters that receive the most traffic; traffic is used here as an indicator of the presence of end users. This is interesting for fog computing environments, where we would like to deploy our applications close to where most of our users are located.

The multi-cluster scheduler also supports offloading to neighboring clusters. The idea is that if the cluster selected in the cluster-affinity case does not have sufficient resources to place our application, the scheduler offloads the application to another cluster that does have sufficient resources and is closest to the selected cluster in terms of network latency. This is where Serf is used for estimating inter-cluster latency. The scheduler can also burst the replicas of an application to neighboring clusters: if the selected cluster cannot place all of the replicas of our application, the scheduler deploys the extra replicas on a neighboring cluster that is closest to the selected cluster in terms of network latency.
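As an illustration, a minimal custom resource definition for the multi-cluster deployment could look like the following; the fogguru.eu API group and the schemaless definition are assumptions for this sketch, and the actual definitions are in our GitHub repository:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: multiclusterdeployments.fogguru.eu
spec:
  group: fogguru.eu               # illustrative API group
  scope: Namespaced
  names:
    kind: MultiClusterDeployment
    singular: multiclusterdeployment
    plural: multiclusterdeployments
    shortNames: [mcd]
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          # Kept schemaless for brevity; a real CRD would spell out
          # spec.replicas, spec.locations, spec.placementPolicy, etc.
          x-kubernetes-preserve-unknown-fields: true
```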
Here on the right, you see a manifest file for a sample multi-cluster deployment. As you can see, it is very similar to a Kubernetes deployment; the only differences are the API version and kind, as well as additional fields under the spec section. There we specify the number of clusters or locations on which we want our application to be deployed and the placement policy we would like to use, in this case traffic-aware, but we could use other policies such as worst-fit or best-fit.

To allow inter-cluster network routing and load balancing, we have an integration with Cilium. In this case, Cilium needs to be deployed on our workload clusters and Cilium Cluster Mesh has to be enabled. The multi-cluster service manifest is also similar to a vanilla Kubernetes service; the only differences are the API version and kind. Our scheduler automatically finds the corresponding multi-cluster deployment with the same name and creates a Kubernetes service on those clusters, and by adding a Cilium annotation we get inter-cluster network routing and load balancing. (A sketch of a multi-cluster service and autoscaler follows below.)

The next controller is the multi-cluster horizontal pod autoscaler. Its manifest, as you can see on the right, is similar to that of the vanilla Kubernetes horizontal pod autoscaler, except for the API version and kind, and the target reference kind, which should be MultiClusterDeployment, together with its name. Based on the resource utilization threshold we specify, the autoscaler adjusts the number of replicas and passes the decision to the multi-cluster scheduler, which adjusts the placement; for example, if bursting is required, the scheduler bursts the application.

The cloud cluster provisioner and autoscaler transparently provisions a cloud cluster if our fixed clusters run out of resources. This controller periodically checks the status of the multi-cluster deployments, and when it finds that they cannot be deployed because of a lack of resources, it transparently provisions a cloud cluster and joins the new cluster to the federation. It can also scale the worker nodes of the cloud cluster in and out. Finally, if a cloud cluster has been underutilized for a certain amount of time, the provisioner transparently removes it from the federation. The idea here is to reduce over-provisioning and save costs on cloud spending. The implementation is based on Kopf, the Kubernetes Operator Pythonic Framework from Zalando, and we use Python; you can see the details in our GitHub repository.

So now it's time for a demo. For this demo, we have one management cluster and five member clusters on the Grid'5000 experimental testbed in France. The management cluster is at the Rennes site, and we have five member or workload clusters: one in Rennes, and one each in Nantes, Lille, Luxembourg, and Grenoble. We also have an OpenStack cluster in Nancy that acts as a cloud cluster, to which we will burst our applications when necessary. Each of our clusters has a master node and five worker nodes. For the sake of heterogeneity, the nodes in clusters one and five each have four CPU cores and 16 GB of RAM, whereas the nodes in clusters two, three, and four have two CPU cores and four GB of RAM. This table shows the inter-cluster network latency between the different sites of the testbed.
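Here is the promised sketch of a multi-cluster service and a multi-cluster horizontal pod autoscaler; the fogguru.eu API group, port numbers, and thresholds are illustrative, not the exact MCK8S schema:

```yaml
# Multi-cluster service: the scheduler finds the MultiClusterDeployment with
# the same name and creates a Service on every cluster that runs it.
apiVersion: fogguru.eu/v1                  # illustrative group/version
kind: MultiClusterService
metadata:
  name: hello
  annotations:
    io.cilium/global-service: "true"       # Cilium Cluster Mesh: load-balance across clusters
spec:
  selector:
    app: hello
  ports:
    - port: 80
      targetPort: 8080
---
# Multi-cluster horizontal pod autoscaler targeting the same deployment.
apiVersion: fogguru.eu/v1
kind: MultiClusterHorizontalPodAutoscaler
metadata:
  name: hello
spec:
  scaleTargetRef:
    kind: MultiClusterDeployment           # note the target reference kind
    name: hello
  minReplicas: 5
  maxReplicas: 20
  targetCPUUtilizationPercentage: 70       # resource-utilization threshold
```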
For this demo, we have some prerequisites. The first is to have a management cluster and a few workload clusters, which we have here. We also need Kubernetes Federation, Prometheus, and Cluster API on the management cluster. Our workload clusters need Cilium with Cilium Cluster Mesh enabled, Serf, and Prometheus. We need credentials for a cloud provider; here we have OpenStack, but we could use other supported cloud providers such as AWS and Google Cloud as well. And if we would like inter-cluster network routing and load balancing, we need physical or virtual network connectivity between the different sites.

So now let's go to the demo. In this demo, I'll show how to deploy the custom resources and controllers of MCK8S and how to use them to deploy some sample applications across multiple clusters. First, let's check the prerequisites. The first one is Kubernetes Federation. To check that, I'll run the command kubectl get kubefedclusters in the kube-federation-system namespace on the management cluster, and we see that we have five clusters that form the federation. Next, let's check the status of Cilium Cluster Mesh. To do this, I'll point my kubectl context at one of the clusters, cluster one in this case, and first check the presence of the Cilium pods. As we can see here, the Cilium pods are running. Now I'll exec into one of these pods and run the command cilium node list, and we see that the Cilium cluster mesh has been formed. Next, I'll check the presence of the Cluster API resources. To check this, I simply run the command kubectl get namespaces on the management cluster, and we see a few namespaces that contain the resources for Cluster API. We will use Cluster API to provision a cloud cluster, from OpenStack in this case, but we could use different providers. So let's check the OpenStack cloud: I'll run the command openstack catalog list, and I see the details of my OpenStack cloud, but we can see that we don't have any servers at the moment. Next, I'll check Serf. Serf is used for estimating the inter-cluster latency between the clusters, which is important for offloading deployments from one cluster to another. I'll run serf members against the RPC address of one of the agents, and we see that the Serf cluster has been formed as well.

So now it's time to deploy our custom resources. In this demo, we will deploy four custom resources: the multi-cluster deployment, multi-cluster service, multi-cluster job, and cloud provisioner. These are the custom resources that our multi-cluster scheduler and cloud provisioner will use later on. First, I'll deploy the first three custom resource definitions: multi-cluster deployment, multi-cluster service, and multi-cluster job. Great, they are created. Next, I'll deploy our multi-cluster scheduler controller. This is deployed as a normal Kubernetes deployment, as we can see in the specification file here. We would like to run it on the master node of the management cluster, so we specify a node selector and the necessary tolerations. We also need a service account and the necessary role-based access control so that this controller has the privileges to do its job. So we deploy the RBAC and the deployment, and let's check whether the pod is created: it's being created right now.
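In case it helps, such a controller deployment might look roughly like the following; the namespace, image name, labels, and RBAC scope are placeholders, not the exact MCK8S manifests:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: mck8s-scheduler
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: mck8s-scheduler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin                       # broad for brevity; a scoped role is preferable
subjects:
  - kind: ServiceAccount
    name: mck8s-scheduler
    namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mck8s-scheduler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mck8s-scheduler
  template:
    metadata:
      labels:
        app: mck8s-scheduler
    spec:
      serviceAccountName: mck8s-scheduler
      nodeSelector:
        node-role.kubernetes.io/master: ""  # pin to the control-plane node
      tolerations:
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule                # tolerate the master taint
      containers:
        - name: scheduler
          image: mck8s/multi-cluster-scheduler:latest   # illustrative image name
```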
Now let's deploy the custom resource definition for the cloud provisioner. The definition is a very simple one, as we can see here, and similarly we have a deployment for the cloud provisioner controller, similar to the previous one. We deploy these two as well. Great; let's check whether the pods are running. The multi-cluster scheduler is running, and the pod for the cloud provisioner is being created as we speak. It looks great now.

What we can do now is deploy multi-cluster applications. I'll show three scenarios. In the first scenario, I'll use our multi-cluster scheduler to deploy an application on a specific cluster. Then I'll show the horizontal offloading capability: if we don't have sufficient resources in one cluster, our scheduler is able to offload the application to a nearby cluster by estimating the network latency. Third, I'll show the bursting capability: if a cluster cannot place all the replicas of a deployment, our scheduler deploys the extra replicas to a nearby cluster. I'll also show bursting into the cloud: if our fixed clusters do not have sufficient resources, our cloud provisioner creates a cloud cluster on OpenStack and our deployments are burst to that cluster.

First, we have to create a cloud provisioner. We have a manifest file here that contains the necessary information about the cloud cluster, in this case OpenStack: the credentials and other important information, such as the IP address for the load balancer, and so on. We create this cloud provisioner. All right, it is created; great.

Now let's deploy our first application, a simple application that just prints "hello". We would like to deploy this application on one of our clusters, in this case cluster two. If you look at the manifest file for a multi-cluster deployment, it is very similar to that of a normal Kubernetes deployment; the only differences are the API version and kind, and a few additional fields under the spec section. If we want to deploy a multi-cluster deployment on specific clusters, we can specify the cluster names as a comma-separated list in the locations field. So let's deploy this; it will create five replicas on cluster two. We have created a multi-cluster deployment named hello, so let's check whether the resource has been created. We see that it's created; let's check its status. As we can see in the status, the multi-cluster deployment is created on cluster two with five replicas. Let's go through all the clusters and check the pods running on them; I'll run a for loop over the clusters. As you can see, we have five replicas running on cluster two, so our scheduler is able to deploy on a specific cluster.

Now let's look at the offloading capability. Let's edit the manifest file for this deployment on cluster two. Suppose that, for some reason, we want to increase the resource request of our pods; in this case, let's increase it to three CPU cores. We know that cluster two does not have nodes with three cores, so our scheduler will try to find another cluster that does and deploy this deployment there. Let's verify that: I'll apply the changes, and then let's check the status of our multi-cluster deployment resource.
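The hello manifest, with the edit we just applied, might look roughly like this; the API group, the spelling of the locations field, and the image are illustrative:

```yaml
apiVersion: fogguru.eu/v1                # illustrative group/version
kind: MultiClusterDeployment
metadata:
  name: hello
spec:
  replicas: 5
  locations: cluster2                    # comma-separated list of target clusters
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
        - name: hello
          image: busybox                 # placeholder for the hello image
          command: ["sh", "-c", "echo hello; sleep 3600"]
          resources:
            requests:
              cpu: "3"                   # was 1; exceeds cluster2's 2-core nodes,
                                         # so the scheduler must offload elsewhere
```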
And as you can see in the update section of the status, the deployment is now running on cluster one, not cluster two, with five replicas. Let's verify this. As you can see, the pods are now running on cluster one. The reason is that cluster two does not have sufficient resources, so our multi-cluster scheduler has deployed this deployment on the cluster nearest to the original cluster two in terms of network latency, which is cluster one.

Next, I'll show the bursting capability. Again, let's edit the manifest file; this time, let's increase the number of replicas from five to ten. Since cluster two only has five nodes, it cannot place all ten replicas of the application, so our scheduler will try to deploy the extra five replicas on another cluster that has sufficient resources and is also close to cluster two in terms of network latency. Let's apply the changes and check the status of our multi-cluster deployment. In the update section, we should see that the deployment is now running on two clusters, cluster one and cluster five; the extra five replicas have been deployed on cluster five. Let's verify this. As you can see, we have ten replicas running on two clusters, five on each. So this is great.

What if we want to increase the number of replicas once more? Let's make it 15 and see what happens. Now we don't have sufficient resources on our five fixed clusters. When our multi-cluster scheduler cannot deploy all these replicas, it updates the status of the multi-cluster deployment with a message saying that it cannot deploy them and that a cloud cluster needs to be provisioned. That's where our cloud provisioner controller comes in: it will create a new Kubernetes cluster on OpenStack and join it to the Kubernetes Federation, and once the new cluster is ready, our multi-cluster scheduler will deploy the extra replicas on that new cluster. So let's apply this change and check the status of our multi-cluster deployment. As you can see, there is a status update with a message saying that the application could not be deployed on the fixed clusters and a cloud cluster needs to be provisioned, along with the decision to burst to the cloud.

So now our cloud provisioner will create the Kubernetes cluster on OpenStack. Let's go to OpenStack and check whether the machines have been created. First, the master node of our new cluster is created; it will take a couple of minutes until the cluster is fully up and running. Let's check once again whether the cloud cluster has been created on OpenStack: we see that one master node and three worker nodes have been created. Let's check the status of the federation: there is now a new addition to our Kubernetes Federation named cloud one. This is the cloud Kubernetes cluster that has just been created by our cloud provisioner. Now let's check the status of our multi-cluster deployment, hello. As you can see in the status, the deployment is now on three clusters: cluster one, cluster five, and cloud one. So our cloud provisioner has indeed created a cloud cluster and joined it to the federation when the scheduler realized that there were not sufficient resources on the fixed clusters to place our deployment. (A sketch of the cloud provisioner resource follows below.)
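For illustration, the cloud provisioner resource that drives this could look something like the sketch below; every field name here is an assumption based on the description above, not the actual MCK8S schema:

```yaml
apiVersion: fogguru.eu/v1                # illustrative group/version
kind: CloudProvisioner
metadata:
  name: openstack-provisioner
spec:
  provider: openstack                    # other supported providers are conceivable (aws, gcp)
  clusterNamePrefix: cloud               # new clusters join the federation as cloud1, cloud2, ...
  masterCount: 1                         # matches the demo: one master node
  workerCount: 3                         # and three worker nodes
  loadBalancerIP: 10.0.0.100             # placeholder load-balancer address
  credentialsSecretRef:
    name: openstack-credentials          # assumed Secret holding the cloud credentials
```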
Our cloud provisioner can also autoscale the worker nodes of the cloud cluster, and even remove the cloud cluster when the workload has decreased and the cluster has not been needed for a certain amount of time.

Next, I'll show the automated placement capability of our multi-cluster scheduler. For this, I have another deployment; this time we'll use the best-fit placement policy. The best-fit placement policy tries to deploy applications on the clusters that have been used the most; it is a bin-packing algorithm, so it tries to utilize resources as fully as possible. In this case, we are trying to deploy on the two clusters that have been used the most. So let's apply this best-fit deployment and check the status of our multi-cluster deployment. As you can see, the deployment has been placed on clusters one and five; among all our clusters, these are the ones that have been used the most. Let's verify: we can see that the pods have been created on clusters one and five. Another placement policy is the worst-fit policy, which tries to deploy applications on the clusters that have the most free resources available. Similarly, in this case, we would like to deploy on two clusters using the worst-fit policy. Let's try this and check the status. This time, the deployments have been created on clusters two and three, because these are the clusters with the most available resources. And let's verify once more: the deployments using the worst-fit policy have indeed been created on clusters two and three. (A sketch of such a policy-based manifest follows below.)

The last thing I want to show you is how to deploy the multi-cluster service and how we can access the hello backend application using a frontend. For this, we first need to create a multi-cluster service corresponding to our multi-cluster deployment named hello. In this case, we don't have to specify the clusters, because our scheduler finds the corresponding multi-cluster deployment named hello and creates the corresponding Kubernetes services on all the clusters that have the deployment. So let's apply this and verify whether the services have been created. As you can see, the hello service has been created on clusters one and five. We can also check whether the service has been created on the cloud cluster: it has. Now we need a frontend deployment and service to access the application. We have a frontend multi-cluster deployment that needs to run on at least one of the clusters that contain our backend multi-cluster deployment; in this case, we'll deploy it on cluster one, together with a corresponding frontend service, also on cluster one. So let's deploy these two and check the status. The frontend service has been created on cluster one. The pods of the frontend deployment have been created on clusters one and two: we specified five replicas, and since our scheduler could not place all five on cluster one, it created three of the replicas there and burst the extra two to cluster two. So this is okay. Now let's access the application's frontend using the IP address of the master node of cluster one and the NodePort. As you can see, our application has responded with the message "hello".
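The manifest for such a policy-based deployment might look roughly like this, again with illustrative field names; instead of naming clusters in a locations field, we give a cluster count and a policy and let the scheduler choose:

```yaml
apiVersion: fogguru.eu/v1
kind: MultiClusterDeployment
metadata:
  name: bestfit-demo
spec:
  replicas: 5
  numberOfLocations: 2            # let the scheduler pick two clusters
  placementPolicy: best-fit       # or: worst-fit, traffic-aware
  selector:
    matchLabels:
      app: bestfit-demo
  template:
    metadata:
      labels:
        app: bestfit-demo
    spec:
      containers:
        - name: demo
          image: busybox          # placeholder image
          command: ["sh", "-c", "sleep 3600"]
```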
In this demo, I have shown you how to deploy the custom resources and controllers of MCK8S and how, using these controllers, you can deploy multi-cluster applications and services. I hope you have enjoyed this demo, and if you'd like to see more demos, you can take a look at our GitHub repository. Thank you.

To conclude: if you have any questions, would like to contribute to this project, or would like to collaborate with us, please reach out to us and check our website at fogguru.eu. If you are interested, you can also read our paper, which was accepted at the International Conference on Computer Communications and Networks (ICCCN), and check out our GitHub repository. Thank you for your attention.