Hello, everyone. I am Zhang Yongxi from China Mobile. My colleague Zhang Haowen and I will share this session together. The topic of our presentation is "Another Choice for Istio Multi-Cluster and Multi-Network Deployment Models." Our presentation is divided into three parts: issues, another choice, and validation. Since Haowen was unable to attend in person, he will present the first part.

First, what we do. We are from China Mobile, mainly working on China Mobile Cloud, the cloud computing platform of China Mobile, similar to AWS. More than 50 of its products use Istio for canary releases, traffic management, and observability. In addition, we have integrated Istio into our operations platform, which we call OPS, and our cloud-native platform, which we call CMP, to provide service mesh capabilities for internal products and external clients.

With the development of China Mobile Cloud, we have encountered the demand for multi-cluster deployments, which has brought new challenges. For historical reasons, many of our products are deployed in dedicated Kubernetes clusters. Their resource utilization is generally low, leading to a certain degree of resource waste. We have adopted a multi-cluster technology based on Kubernetes to centrally manage the idle resources of these clusters, which has improved overall resource utilization. However, it has also brought a new challenge: how to use a service mesh that spans multiple clusters.

We think that for Istio to span multiple clusters, there are three key requirements. The first is the need for service discovery across multiple clusters: the Istio control plane needs to obtain service and configuration information from all clusters for multi-cluster service governance. The second is the need for cross-cluster service connectivity: connectivity is required between the Istio control plane and the data plane, as well as between data planes.
The third is the need for cross-cluster service mesh authentication: in some scenarios, the Istio control plane needs to perform cross-cluster authentication for the data plane.

The community provides Istio deployment models for multi-cluster scenarios that address these three key requirements. Let's first review the community's deployment models. From a network topology perspective, they can be divided into two categories: the single-network models and the multi-network models. Regardless of the network topology, from a control plane topology perspective each can be further divided into the multi-primary model and the primary-remote model.

How does the community solution meet the three key requirements? The first point is multi-cluster service discovery: the Istio control plane holds credentials for the Kubernetes API servers of all clusters, allowing it to access every API server and retrieve all service and configuration information. The second point is cross-cluster service connectivity: in non-flat-network scenarios, cross-cluster connectivity between the Istio control plane and the data plane, as well as between data planes, is achieved by deploying east-west gateways. The third point is cross-cluster service mesh authentication: in a primary-remote scenario, the Istio control plane and the data plane perform cross-cluster authentication using service account tokens generated by the data plane cluster's Kubernetes API server.

The community solution effectively meets the three key requirements, but it also brings some problems. The first is that there are many deployment steps: the most complex scenario requires ten steps, and the number of steps multiplies with the number of clusters. The second is that it is time-consuming. On one hand, many of the deployment steps are not automated.
On the other hand, in some scenarios istiod or the east-west gateway needs to be deployed in each cluster joining the mesh, which also takes time. Generally, adding a new cluster to the mesh takes minutes, and if troubleshooting is needed, it takes even longer. The third is that there are bottlenecks: in the multi-network model, cross-cluster service communication relies heavily on east-west gateways, whose performance and reliability must be guaranteed. This made us consider whether there are alternative solutions, and we have explored this direction. Now my colleague, Zhang Yongxi, will continue the presentation.

Thank you, my colleague. I will continue the presentation from here. To tackle the challenges mentioned above, we introduce a novel Istio multi-cluster approach that differs from the community solution. It is based on Kosmos, an open-source, all-in-one distributed cloud-native solution. This strategy addresses the three key requirements previously outlined. First, a globally unified Istio control plane, which accomplishes global service discovery via the API server of the master cluster. Second, automated interconnection of container networks across multiple clusters, enabling direct network connections between pods. Third, the challenge of cross-cluster service mesh authentication is addressed.

Next, let's talk about how we met the first key requirement, multi-cluster service discovery. Inspired by the open-source Virtual Kubelet project, we implemented the following features. First, Kubernetes resources are made global across multiple clusters: services, Istio CRDs, and so on are hosted in the master cluster. This setup allows multiple clusters to be used as if they were a single cluster. Second, a sub-cluster appears in the master cluster as one or more virtual nodes.
If a pod in the master cluster is scheduled onto such a virtual node, it is dispatched to the corresponding sub-cluster while remaining visible in the master cluster as a virtual pod.

How is a global service implemented in a multi-cluster scenario? Services reside in the master cluster. In the master cluster, the pods are virtual, and their pod IPs are synchronized through a sync mechanism to reflect the real pod IPs in the sub-clusters. Therefore, the endpoints associated with a service may be spread across different sub-clusters. Consequently, by interacting only with the master cluster's API server, Istio naturally achieves global service discovery.

Next, let's talk about how to establish connectivity between container networks across multiple clusters. To facilitate network connectivity between pods across multiple clusters, we employ traditional tunneling technology. Given that in Kubernetes each node is allocated its own pod network segment, we only need to establish tunnels between each node and all nodes outside the cluster. This approach is feasible in scenarios where there is network interconnectivity between nodes across clusters, and it is known as the P2P mode. However, when only a few nodes between clusters are interconnected, the P2P mode for multi-cluster network connectivity becomes infeasible. To tackle this problem, and to accommodate larger-scale multi-cluster environments, we introduced the gateway mode, in which traffic between clusters is routed through gateway nodes.

To implement the multi-cluster network connectivity solutions described above, we designed the architecture shown in the diagram. A controller manager runs in each sub-cluster, synchronizing the network information of the local cluster to the corresponding cluster CR in the master cluster.
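The routing decision behind the two connectivity modes described above can be sketched in a toy form. This is not Kosmos code; the data layout, function name, and addresses are all illustrative. The idea is simply that every node owns a pod CIDR, and cross-cluster traffic for that CIDR is tunneled either directly to the owning node (P2P mode) or to that cluster's gateway node (gateway mode):

```python
# Toy sketch of cross-cluster route generation (illustrative only, not Kosmos).

def build_routes(remote_clusters, mode="p2p"):
    """Return a list of (pod_cidr, tunnel_peer_ip) routes for one local node."""
    routes = []
    for cluster in remote_clusters:
        for node in cluster["nodes"]:
            if mode == "p2p":
                # P2P mode: one direct node-to-node tunnel per remote node.
                routes.append((node["pod_cidr"], node["ip"]))
            else:
                # Gateway mode: every remote pod CIDR is reached via the
                # remote cluster's gateway node.
                routes.append((node["pod_cidr"], cluster["gateway_ip"]))
    return routes

# A hypothetical remote cluster with two nodes; the first is its gateway.
cluster_b = {
    "gateway_ip": "10.0.2.1",
    "nodes": [
        {"ip": "10.0.2.1", "pod_cidr": "172.16.1.0/24"},
        {"ip": "10.0.2.2", "pod_cidr": "172.16.2.0/24"},
    ],
}

p2p = build_routes([cluster_b], mode="p2p")
gw = build_routes([cluster_b], mode="gateway")
```

In P2P mode the route table grows with the total node count across clusters, which is why only gateway mode stays manageable when few nodes are mutually reachable or the environment is large.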
The network manager in the master cluster reads the cluster CRs of all sub-clusters and generates a node config CR for each node. An agent runs on every node across the clusters, watching its node config CR in the master cluster and configuring the current node's routing tables and tunnels accordingly.

Now let's talk about how we met the third key requirement, cross-cluster service mesh authentication. Let me first introduce how the community solution works in a multi-cluster scenario. When a sidecar starts in a remote cluster, it mounts a volume named istio-token. The kubelet in the remote cluster then requests a service account token from the API server of that same remote cluster, which is subsequently mounted into the data plane container. Following this, the data plane sends a CSR request, including the token, to istiod in the primary cluster. Istiod verifies the token's validity with the API server of the remote cluster. After successful validation, istiod issues an mTLS certificate and sends it back to the data plane container, thereby fulfilling the CSR request.

In our scenario, following the community's standard approach would lead to issues: our solution isolates istiod from the sub-clusters, so istiod can only see and interact with the master cluster's API server. We therefore made adjustments to the community approach. When a data plane container starts up, Kosmos requests a service account token from the master cluster's API server and distributes the obtained token to the respective sub-cluster. When the data plane in the sub-cluster initiates a CSR request with this token to istiod in the master cluster, istiod verifies the token's legitimacy with the master cluster's API server. After successful validation, it generates an mTLS certificate and returns it to the data plane container, thereby completing the CSR request.

We have validated our solution.
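The adjusted token flow just described can be sketched as a toy model. This is not real Istio or Kosmos code; the class and function names are invented for illustration. The key property it demonstrates is that both minting and validating the token happen against the master cluster's API server, so istiod never needs to reach a sub-cluster:

```python
# Toy sketch of the token-based CSR flow (illustrative only).

class MasterAPIServer:
    """Stands in for the master cluster's Kubernetes API server."""
    def __init__(self):
        self._tokens = set()

    def mint_token(self, service_account):
        token = f"token-for-{service_account}"
        self._tokens.add(token)
        return token

    def validate(self, token):
        return token in self._tokens

class Istiod:
    """istiod sees only the master cluster's API server."""
    def __init__(self, api_server):
        self.api_server = api_server

    def sign_csr(self, workload, token):
        # Verify the token with the master API server before signing.
        if not self.api_server.validate(token):
            raise PermissionError("invalid token")
        return f"cert-for-{workload}"

master = MasterAPIServer()
istiod = Istiod(master)

# Kosmos mints the token in the master cluster and ships it to the sub-cluster;
# the sidecar there then presents it in its CSR request.
token = master.mint_token("sleep")
cert = istiod.sign_csr("sleep", token)
```

A token never issued by the master API server (for example, one forged in a sub-cluster) would fail validation and the CSR would be rejected.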
We use the same demo as the community, specifically the sleep and helloworld demo, for our validation. In the setup, the helloworld service, which has two versions, is deployed across two different data plane clusters. Additionally, the sleep component is deployed in both data plane clusters to send requests to helloworld. This demo is used to demonstrate and verify two scenarios: cross-cluster traffic and cross-cluster traffic management. The validation steps are listed here; we can see that the deployment of Istio in the multi-cluster scenario is greatly simplified. In the cross-cluster traffic scenario, our results confirm that whether a request is sent to the helloworld service from the sleep container on cluster one or cluster two, the responses alternate between the two versions of helloworld. This is consistent with the validation outcomes of the community's solution. Next comes the validation of cross-cluster traffic management. Our tests show that whether accessing the helloworld service from a sleep pod in cluster one or cluster two, the results consistently meet our expectations.

Finally, I warmly invite you to join our open-source project, Kosmos. Moving forward, our ambitions for Kosmos include donating the project to the CNCF community to attract more external contributors, and exploring further applications of this architecture with Kosmos. Should you have any questions or require further information, please feel free to contact us. Thank you.

All right, if we have any questions, just go ahead and raise your hand. Oh, great. Hello? So how do you handle certificate rotation between your master cluster and sub-clusters? Sorry, what? My English is not so good. How do you handle certificate rotation between the different clusters? Yeah, how do you manage certificate rotation between the different Kubernetes clusters? Certificate... OK, I can't express myself so well in English. Here is my email; we can talk about it online. Do you need my translation?
I can help translate the question. OK. He asked: between your master cluster and the sub-clusters, how do you manage certificate rotation? OK, got it. Then I can answer in Chinese, right? You can; I will try to translate. It's like this: Kosmos takes the token and sends it from the master cluster to the sub-cluster. With it, the pods in the sub-cluster can reach istiod and complete certificate issuance. This process is handled by Kosmos. OK, I will try to translate: the credential is populated from the master cluster to the downstream cluster by Kosmos; it is just a service account token. Is that correct? It's like this: Kosmos updates the certificate in the master cluster and then sends the latest certificate to the sub-clusters. OK, so to sum up: the certificate is rotated periodically by Kosmos inside the master cluster. Is that right? OK, thank you.

Hi. Hello. One thing that wasn't clear to me: do you use the tunnels for the communication between clusters, or do they communicate with the master cluster to get the information they need? Yeah, to sync the configuration among all the clusters. We use tunnels between the master cluster and the sub-clusters, and between sub-clusters, so that a pod in the master cluster can communicate with a pod in a sub-cluster, and pods in sub-clusters can communicate with each other as well. Is that right? Yeah, but I want to understand if there are any problems with split-brain, or how you eventually handle this synchronization between clusters: whether it is handled by your system or by Istio. Do you understand the question? Do you want her to translate? OK, let's talk about it online. OK, that's my email. OK, thank you.