Hi everyone, welcome to the session. I'm very glad to be here to talk about how we manage millions of clusters running at the edge. This is Dixie from China. I am an open source advocate and participate in quite a few open source projects. I am also the founder of an open source project, Clusternet, which is exactly what I'm going to introduce today.

Before getting started, I would like to briefly go over today's agenda. First, I will talk about why Kubernetes is used at the edge. Most of the time, a single Kubernetes cluster cannot meet all our needs, especially in edge scenarios, so we have to adopt a Kubernetes multi-cluster architecture. The difficulties and challenges of running Kubernetes multiply as we scale. In this session, I will share our solution for multi-cluster management, along with a benchmark on managing one million clusters running at the edge. There are also some improvements we can make in the future.

Now let's get to the first topic: Kubernetes at the edge. Why does edge computing matter so much? According to Gartner, only 10% of data today is created and processed outside traditional data centers. By 2025, that number is projected to increase to 75%, due to the rapid expansion of IoT devices. More processing power will be available on embedded and mobile devices, and more use cases and projects for edge computing will be created over the next few years. The primary benefit of edge computing is that users get a better experience in terms of reliability, reduced latency, and potentially better privacy, by keeping more of the data on the device or on the local network. The next stage of cloud computing brings computing power even closer to the users: we can push workloads that were previously running in data centers or clouds directly onto user devices. This will make deploying software to remote edge locations as seamless as deploying to the cloud.
With the help of Kubernetes, this is going to come true. There are several advantages to using Kubernetes for edge computing. Kubernetes is ubiquitous and fundamental; as you all know, it is going under the hood everywhere, similar to Linux. Kubernetes is already designed to work across data centers and to deal with issues similar to those of edge computing. It is a well-established platform for hosting microservices, and it facilitates a cloud native approach to application deployment, so going from multi-region data centers to multi-edge locations is not a big problem. Also, with Kubernetes we can benefit a lot from the cloud native community. It is a major ecosystem: we can find many awesome cloud native products to leverage, and Kubernetes can manage anything, not only containers.

Let's look at some of the most popular options for deploying Kubernetes for edge computing. The first option is to deploy a full-fledged cluster at each edge location. That means both the Kubernetes control plane and the worker nodes are deployed on edge nodes; this way, Kubernetes is fully autonomous at the edge. The second option is to have only a single Kubernetes cluster in our data center or cloud, while all the worker nodes run at the edge.
This helps eliminate the overhead of having a dedicated control plane at each location, but it may not be feasible: if there is significant latency, unreliable connectivity, or a lack of sufficient bandwidth between the Kubernetes control plane and the worker node locations, in-cluster services and operations may not work correctly.

When we use option 1 to run full-fledged Kubernetes at the edge, we need to manage multiple clusters running at the edge. And even when we use option 2, we still need to manage multiple clusters, because we cannot fit all the edge nodes into one single cluster. In this session, we are focusing on option 1. There are quite a few lightweight Kubernetes distros out there for us to choose from, like K3s, which help us address running Kubernetes at the edge.

Now, let's talk about why multi-cluster, and the challenges that come with it. Of course, multiple Kubernetes clusters come with increased complexity, so why would we still choose multi-cluster? Put simply, we do have business needs, such as isolation. Isolation not only means fault isolation, but also isolation of workloads, developers, teams, orgs, etc. With multi-cluster, you can replicate your applications across different data centers in different regions, increasing availability: don't put all your eggs in one basket. Multi-cluster also helps with scalability. A load balancer can send traffic to a particular cluster based on the URL or the type of request, and as a cluster gets too many hits, it can be scaled out to handle the load. That helps us meet diverse performance needs through intelligent utilization of our resources. Monitoring, security, and enforcing policies in a large single-cluster environment are extremely difficult, as you all know. With multi-cluster, we can have tighter security checks on a per-cluster basis. Multi-cluster also allows us to meet regulatory and compliance needs.
One example is GDPR, where the data of European customers must stay strictly inside the EU region. You can have one cluster for users inside the EU region, while for other global customers you can have another cluster elsewhere. With a multi-cluster solution, you can easily target specific clusters to meet different compliance needs. Multi-cluster is also a strategy for avoiding vendor lock-in, which makes adopting a multi-cluster architecture an even better choice. And Kubernetes itself does have some limitations. We cannot run all our workloads in a single cluster; this is impractical. Maintaining a very large cluster is painful, especially when you want to upgrade the cluster or back up its data. And as the number of pods and services in the cluster grows, the performance and latency of the whole cluster are affected as well.

Setup and management of multi-cluster Kubernetes are not easy. With multi-cluster, this complexity increases. You have to set up the proper configuration of all the clusters separately. Contrary to a single-cluster setup, where there is only one API server, multiple API servers exist in the multi-cluster case. You must set up and manage access to these API servers for all the clusters. It is extremely complex to maintain multiple clusters and make them work together as one unit, such as for cluster management and application deployments. Inter-cluster communication is another area that needs to be handled in multi-cluster Kubernetes: you have to manage your clusters' IPs, routes, and DNS settings very carefully. Networking becomes an even bigger challenge because you need to handle connectivity downtime and deal with syncing data between your control plane and edge locations. So the edge Kubernetes challenge is not about distribution, but about managing at scale. That is why we built an open-source project, Clusternet, to help us better manage Kubernetes multi-clusters.
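To make the access-management overhead concrete: a multi-cluster setup usually ends up merging per-cluster kubeconfig fragments into one file with named contexts, so a single `kubectl --context <name>` can target each cluster. Here is a minimal sketch in Python; the cluster names, server addresses, and tokens are made up for illustration:

```python
# Sketch: merge single-cluster kubeconfig fragments into one config with
# named contexts. The names and server endpoints below are illustrative.

def merge_kubeconfigs(configs):
    """Merge a list of single-cluster kubeconfig dicts into one dict."""
    merged = {"apiVersion": "v1", "kind": "Config",
              "clusters": [], "users": [], "contexts": []}
    for cfg in configs:
        merged["clusters"].extend(cfg.get("clusters", []))
        merged["users"].extend(cfg.get("users", []))
        merged["contexts"].extend(cfg.get("contexts", []))
    return merged

edge_eu = {
    "clusters": [{"name": "edge-eu", "cluster": {"server": "https://10.0.0.1:6443"}}],
    "users": [{"name": "edge-eu-admin", "user": {"token": "<redacted>"}}],
    "contexts": [{"name": "edge-eu",
                  "context": {"cluster": "edge-eu", "user": "edge-eu-admin"}}],
}
edge_us = {
    "clusters": [{"name": "edge-us", "cluster": {"server": "https://10.0.1.1:6443"}}],
    "users": [{"name": "edge-us-admin", "user": {"token": "<redacted>"}}],
    "contexts": [{"name": "edge-us",
                  "context": {"cluster": "edge-us", "user": "edge-us-admin"}}],
}
merged = merge_kubeconfigs([edge_eu, edge_us])
```

This works for a handful of clusters, but keeping thousands of such entries, credentials, and endpoints in sync is exactly the kind of toil a multi-cluster manager has to take over.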
The goal of Clusternet is to manage millions of Kubernetes clusters. The name Clusternet is a combination of "cluster" and "internet": just as the name says, the goal of Clusternet is to let you manage your clusters as easily as visiting the internet. No matter whether the clusters are running on public cloud, private cloud, hybrid cloud, or at the edge, Clusternet lets you manage and visit them as if they were running locally. This also helps eliminate the need to juggle different environments. It can also help deploy and coordinate applications to multiple clusters from a single set of APIs in the hosting cluster. We started the project in March 2021 and have released 11 versions so far; you can check it out at the GitHub URL. Clusternet supports multiple platforms as well. Actually, before building this project, we searched the cloud native community and tried to find a project that could meet our needs. Unfortunately, we did not find one, so I founded this project, Clusternet.

Clusternet consists of three components: clusternet-agent, clusternet-scheduler, and clusternet-hub. Clusternet-agent runs in each child Kubernetes cluster. It automatically registers the current cluster to a parent cluster as a child cluster, and reports heartbeats of the current cluster, including the Kubernetes version, running platform, and healthz/readyz/livez status, etc. It can also set up a WebSocket connection with clusternet-hub in the parent cluster, providing a full-duplex communication channel over a single TCP connection. This is quite useful, I think: when we want to manage an edge cluster without any public IP address, it helps improve security by not exposing the kube-apiserver address publicly. This WebSocket connection is optional; we can choose whether to use it. Clusternet-hub runs in the parent cluster as an aggregated API server that maintains multiple active WebSocket connections with child clusters.
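The per-cluster heartbeat that flows over these connections can be pictured as a small structured payload. The following Python sketch is purely illustrative: the field names are my own and are not Clusternet's actual registration API.

```python
import json
import time

def build_heartbeat(cluster_id, k8s_version, platform, healthz, readyz, livez):
    """Hypothetical heartbeat payload a child-cluster agent could report
    to its parent cluster. Field names are illustrative only, not taken
    from Clusternet's real API types."""
    return {
        "clusterID": cluster_id,
        "kubernetesVersion": k8s_version,
        "platform": platform,
        "healthz": healthz,   # status of the /healthz endpoint
        "readyz": readyz,     # status of the /readyz endpoint
        "livez": livez,       # status of the /livez endpoint
        "timestamp": int(time.time()),
    }

hb = build_heartbeat("edge-0001", "v1.24.3", "linux/arm64", True, True, True)
payload = json.dumps(hb)  # what would be sent over the WebSocket channel
```

The point is that each heartbeat is tiny, which is why a single hub can plausibly hold a very large number of such connections at once.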
Here, we don't need to use any storage like etcd, since we don't need to store anything. Clusternet-hub approves cluster registration requests from child clusters and creates dedicated resources for each child cluster, such as namespaces, service accounts, and RBAC rules. It proxies Kubernetes-style API requests to the target child cluster and lets you manage all the child clusters with a KubeConfig. Clusternet-scheduler is pretty simple: it is responsible for coordinating applications to matching clusters based on scheduling strategies.

From the architecture, we can see that Clusternet is designed as an add-on. It can be deployed directly into your existing cluster and turn it into a multi-cluster control plane, without affecting any existing pods, workloads, or services already running there. It provides full-fledged cluster management of child clusters: you can visit any of them in the conventional kubectl way using a KubeConfig, or with the client-go library. We also provide a kubectl plugin for easy use.

Let's see some highlight features of Clusternet. Clusternet provides universal management of heterogeneous clusters: no matter where a cluster is running (AWS, Google Cloud, the edge, etc.), it can manage them all, regardless of which certified Kubernetes distro the cluster uses. It also helps deploy and coordinate applications to multiple clusters from a single set of APIs in the hosting cluster. All resources are supported, including Kubernetes native objects like Deployments, DaemonSets, ConfigMaps, and Secrets; CRDs are supported as well, and so are Helm charts. With various scheduling strategies, we can have our applications run independently in multiple clusters, or spread them across several clusters. Clusternet supports both push and pull modes for clusters. Pull means there is a controller in the agent running in the child cluster that reconciles objects from the parent cluster into the current cluster.
Push means a controller or agent running in the parent cluster pushes all the changes down to the child clusters. Both modes are fine; we can choose the right working mode for our edge scenarios.

Let's run a benchmark on Clusternet to see the scale of managing edge clusters. Before that, let's talk about how to construct 1 million clusters. If we only want a couple of clusters, it is not a big deal to use real clusters. But if the number goes to 1 million, things change: 1 million real clusters would be far too expensive, not only in cost but also in the time it takes to set them up. Running Kubernetes in Docker is an option that is lightweight and easy to set up, but when the number goes to 1 million, it is not a good choice either. So we run Clusternet with goroutines: each clusternet-agent goroutine represents one cluster, so we just need to simulate millions of goroutines.

Here are the settings of our benchmark. We have four nodes with large quotas of CPU cores and memory; this helps reduce the maintenance effort and also increases the power of Clusternet. One node runs the Kubernetes control plane, and one node is dedicated to Clusternet, running clusternet-hub with a replica count of 3. The other two nodes run as performance workers, where we simulate millions of cluster connections. An edge cluster can work in either push or pull mode. Pull mode is quite simple compared to push mode, because with push mode we need to set up extra WebSocket connections if no public IP addresses are exposed. So we chose push mode to see how far Clusternet can go.

This is the benchmark of 1 million clusters working in push mode. We can see that when the cluster number goes to 10,000, clusternet-hub only needs about 500 megabytes of memory. When the total number goes to 1 million, the memory reaches nearly 3 gigabytes to maintain all the socket connections. And etcd is under a lot of pressure as well, growing to nearly 120 gigabytes.
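The memory figures above look roughly linear in the number of clusters, so we can fit a simple capacity model from the two data points quoted (500 MB at 10,000 clusters, ~3 GB at 1 million). This is my own back-of-envelope estimate, not part of the original benchmark:

```python
def fit_linear_memory(n1, m1, n2, m2):
    """Fit memory(n) = base + n * per_cluster from two benchmark points.
    Inputs are cluster counts and memory in MB; returns the fixed base
    overhead in MB and the marginal cost per cluster in KB."""
    per_cluster_mb = (m2 - m1) / (n2 - n1)
    base_mb = m1 - n1 * per_cluster_mb
    return base_mb, per_cluster_mb * 1024

# Fit to the (approximate) figures quoted in the talk.
base_mb, per_cluster_kb = fit_linear_memory(10_000, 500, 1_000_000, 3_000)
# => roughly 475 MB of fixed overhead and ~2.5 KB per cluster connection

def predict_mb(n, base=base_mb, per_kb=per_cluster_kb):
    """Predicted clusternet-hub memory in MB for n connected clusters."""
    return base + n * per_kb / 1024
```

A marginal cost of a few kilobytes per connection is consistent with one lightweight goroutine per cluster, which is why the hub itself scales so far before the kube-apiserver and etcd become the bottleneck.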
From the benchmark, we can see the kube-apiserver is quite exhausted. In the benchmark, the kube-apiserver runs with only one replica; running multiple replicas would mitigate this a little. A better solution is a hierarchical architecture. That is, we can have nested parent clusters: each parent cluster can itself be a child cluster of another parent cluster. This helps improve latency, scalability, load balancing, performance, capacity, etc. So this is going to be our future work. Clusternet naturally supports the hierarchical architecture, but we need to do the following things. First, we need to report child-cluster metadata to the parent cluster, so we can have a good overview from the top. Second, we need to enable cluster auto-discovery: when new clusters join, or old clusters get destroyed, the parent cluster gets notifications.

Multi-cluster topology introduces primarily two classes of challenges. First, it requires some form of synchronization between cluster control planes; this has already been solved in Clusternet. Second, it requires some form of interconnection that makes services accessible across different clusters; this is another challenge we need to handle in the future.

If you want to know more about Clusternet, please visit our website, clusternet.io, where you can find tutorials, documents, and proposals. You can also check it out on GitHub. Thanks for coming; this is today's session. If you have any questions, feel free to open an issue on the Clusternet GitHub repo, ping me on Slack, or send me an email. Thank you.