Hello, how are you? My name is Zhonghu Xu. Today I will introduce multi-cluster service mesh with Istio. Before we get started, let me first introduce myself. I am an open source enthusiast and have focused on open source software since 2017. I joined the Istio community in 2018 and have focused on networking since then. Now I am an Istio maintainer and one of the top contributors. I am also an Istio Steering Committee member, as well as the author of Cloud Native Service Mesh Istio, the best-selling service mesh book in China.

Let's see today's agenda. There are four parts that I will introduce today. The first part is the use cases of multi-cluster; in this part, I will show you why we need multi-cluster. The second part is the challenges of multi-cluster, and there are several challenges I will show you. The third part is the different multi-cluster patterns; this is the most important part of today's topic. In the last part, I will introduce the future evolution of multi-cluster in Istio.

Let's see the first part. Why do we need multi-cluster? Multi-cluster is a strategy for deploying an application across multiple component clusters, with the goal of improving availability, isolation, and scalability. Multi-cluster can be important for ensuring compliance with different and conflicting regulations, as individual clusters can be adapted to comply with geographic or certification-specific regulations. The speed and safety of subsequent delivery can also be increased, with individual development teams deploying applications to isolated clusters and selectively exposing which services are available for testing and release.

The first point is availability. Multi-cluster can span multiple regions or even multiple vendors. Users can replicate applications in each cluster, so that even when one cluster is totally down, traffic can fail over to a healthy remote cluster with no impact on system availability. The second one is performance.
Internet users can access the servers nearest to them, so the latency is the smallest. The third is isolation. Strong isolation simplifies key operational processes such as cluster and application upgrades. Moreover, isolation can reduce the blast radius of a cluster outage. Organizations with strong tenancy-isolation requirements can route each tenant to their own individual cluster. The fourth is scalability. A Kubernetes cluster can take charge of 5,000 nodes at most, which is not enough for a 100,000-replica application, so such applications need multiple clusters. The last one is cost. A multi-cluster strategy enables your organization to shift workloads between different Kubernetes vendors to take advantage of new capabilities and pricing offered by different vendors.

Okay, let's see the second part: the challenges of multi-cluster. There are four points I have listed here. The first one is service discovery. In native Kubernetes, there is no way to do service discovery for remote clusters without the help of an external service registry. For example, in the picture on the right, the cluster 1 clients cannot discover the services from the other clusters. The second one is DNS resolution. A remote service domain is not resolvable in the local cluster, because in native Kubernetes, the Kubernetes DNS is responsible for DNS resolution, but it has no information about the other cluster's services and endpoints. And even if the domain were resolvable, the service would still be hard to access, because Kubernetes implements service access via iptables, which only handles local cluster services and endpoints. The third one is load balancing policy. Load balancing in Kubernetes is round robin, so A/B testing and canary releases are hard to implement. The fourth one is lack of security. It is very dangerous for services to talk to each other in plain text, especially over the public network.

Okay, let's see the next part. What is Istio?
From the Istio official documentation website, we can see what Istio is. Before we start this part, let's look at Kubernetes. Kubernetes is a platform for application development and operation, and it also provides some capabilities for service discovery and load balancing. Istio, however, is totally service-oriented and is a very good supplement to Kubernetes for service management. It is very friendly to both developers and operators.

From the slide here, Istio provides four major functions. The first one is connect: it intelligently controls the flow of traffic and API calls between services. With it, we can do more advanced traffic management like blue-green deployment, red-black deployment, and canary release. The second one is secure: Istio automatically secures each API call with TLS; by default, service-to-service traffic is encrypted with mutual TLS. The third one is control: with Istio authorization policies and authentication policies, Istio can enforce policy. The fourth one is, I think, the most important one: observe. It provides capabilities such as access logging, monitoring, and distributed tracing.

Let's look at multi-cluster service mesh. Before we get started with a multi-cluster service mesh in production, we should consider the following three questions. The first question is: single mesh or multi-mesh? Compared with multi-mesh, which treats each remote cluster as a black box that only exposes services, a single mesh is aware of all the clusters' services and can do more advanced processing like locality-aware routing and failover. So the answer to the first question is that we should choose a single mesh. The second question is: single network or different networks? Istio uses a simplified definition of network to refer to workload instances that have direct reachability. For example, by default, all workload instances in a single cluster are on the same network. Many production systems require multiple networks or subnets for isolation and higher availability. Istio supports spanning a service mesh over a variety of network topologies.
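As an aside, the locality-aware routing and failover mentioned above is configured per service with a DestinationRule. Here is a minimal sketch, assuming a sample helloworld service and illustrative region names; note that outlier detection is required for locality failover to take effect:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: helloworld-failover   # illustrative name
spec:
  host: helloworld.sample.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        enabled: true
        failover:
          - from: region1      # illustrative regions
            to: region2
    # outlier detection must be enabled for failover to trigger
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
```

With this in place, traffic prefers endpoints in the client's own locality and only fails over to region2 when region1's endpoints are ejected as unhealthy.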
So the answer to the second question is: it depends on the production environment we have. The third question is: single or multiple control planes? A single control plane is easier to operate, while replicated control planes in each cluster may provide more availability. It depends on our real scenario.

Okay, let's see the next part: DNS resolution before release 1.8. Let's see the picture. Before 1.8, DNS resolution was done by the Kubernetes DNS together with the istiocoredns plugin. Native Kubernetes services are resolved by the Kubernetes DNS, and remote cluster services are resolved by a separate plugin, istiocoredns, which acts as an upstream DNS server for the Kubernetes DNS. In native Kubernetes, Istio uses the virtual IP returned by the DNS lookup to load balance across the list of active endpoints for the requested service, taking into account any configured routing rules. Istio uses Kubernetes services and endpoints, or Istio ServiceEntries, to configure its internal mapping of host names to workload IP addresses. To ensure that DNS lookups succeed, you must deploy a Kubernetes service in each cluster that consumes a given service. This ensures that regardless of where the request originates, it will pass the DNS lookup and be handed to Istio for proper routing. This can also be achieved with an Istio ServiceEntry rather than a Kubernetes service; however, a ServiceEntry does not configure the Kubernetes DNS server. This means that DNS must be configured either manually or with automated tooling such as the istiocoredns plugin.

Let's see DNS resolution after release 1.8. In release 1.8, a DNS proxy was introduced, not only for multi-cluster but also for virtual machine DNS resolution. Previously, we needed the additional istiocoredns component, which is not very friendly and requires ServiceEntries to be created. Istio extends xDS with a new kind, NDS; the full name is Name Discovery Service, and it facilitates DNS resolution. NDS is used by the DNS proxy to fetch DNS name tables from the Istio control plane.
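Enabling this 1.8 DNS proxy is a mesh-wide setting. A minimal sketch with the IstioOperator install API, as described in Istio's DNS proxying documentation:

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    defaultConfig:
      proxyMetadata:
        # capture application DNS queries in the sidecar's DNS proxy
        ISTIO_META_DNS_CAPTURE: "true"
        # auto-allocate VIPs for ServiceEntries that have no address
        ISTIO_META_DNS_AUTO_ALLOCATE: "true"
```

The second setting is what produces the auto-allocated addresses for ServiceEntry-defined services discussed next.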
The proxy then builds a DNS lookup table to serve DNS resolution for the local application services. The name table contains all the services across clusters. For a Kubernetes service, the address is the ClusterIP; for a ServiceEntry-defined service, the address is auto-allocated, and it is a kind of Class E address.

Okay, let's see the picture. This is the workflow of DNS resolution. First, the DNS request is forwarded to the DNS proxy. The DNS proxy returns the DNS response if the name is in its cached DNS name table. Otherwise, it forwards the DNS request to the Kubernetes DNS, which may in turn forward the request to an external upstream DNS server.

Single network. For a single network, all clusters reside in the same network, and pods from cluster 1 can talk to pods from cluster 2 directly, so no gateway is needed, and cross-cluster communication does not increase latency. In most cases, however, this setup is not common. The pros: low latency for service-to-service traffic, and no east-west gateway is needed. The cons: the first is complexity, as we need an additional tool to build a flat network. The second is security; it is not as secure, since all the workloads are within a single network. The third is non-overlapping service and pod IP ranges: cluster 1 and cluster 2 cannot have overlapping service IP or pod IP addresses.

Different networks. The mesh can span different networks. Each cluster resides in its own network, and a pod cannot talk directly to pods in another cluster. This provides better isolation, as the clusters are independent, but cross-cluster service access must go through the Istio east-west gateway. So the challenge is cross-cluster service communication: it requires an Istio east-west gateway, and the gateway works in TLS pass-through mode. Let's see the pros and cons. The pro is easy scaling of network addresses. The con is that load balancing across multiple clusters is harder than within a single cluster.
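The pass-through east-west gateway mentioned above is typically exposed to the other networks with a Gateway resource like the following, taken in spirit from Istio's multi-network setup samples; the name and selector follow Istio's conventions and should be treated as a sketch:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: cross-network-gateway
  namespace: istio-system
spec:
  selector:
    istio: eastwestgateway     # the dedicated east-west gateway deployment
  servers:
    - port:
        number: 15443          # Istio's conventional multi-cluster port
        name: tls
        protocol: TLS
      tls:
        mode: AUTO_PASSTHROUGH # route by SNI without terminating TLS
      hosts:
        - "*.local"            # expose all in-mesh services
```

AUTO_PASSTHROUGH is what lets the gateway forward mTLS traffic to the right backend based on the SNI value alone, which is explained next.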
Different network gateway. The first point is split-horizon EDS. Endpoints from different networks cannot be accessed directly; they must be reached through a gateway, and the gateway is accessible from the other networks. So it is istiod that has to convert the endpoint addresses to the gateway address. The gateway works in AUTO_PASSTHROUGH mode, with the SNI cluster network filter applied on the listener. Envoy listens on a single port and distinguishes different service-to-service calls by the SNI value. Because of this, cross-cluster communication must be TLS encrypted. The SNI cluster network filter works by setting the upstream cluster name from the SNI field passed in the TLS handshake.

Let's see the single network primary-remote model. From the picture, we can see the two clusters: cluster East and cluster West are in the same network, so service A can talk to service B across clusters. And from the picture, we can see that there is only one control plane, residing in cluster West; this is the primary-remote model. Service discovery is done by istiod: istiod list-watches the services and endpoints from all the clusters within the mesh. For configuration discovery, it is also istiod that watches custom resources, such as VirtualServices and DestinationRules, from only the primary cluster. This is how the single network primary-remote pattern works.

This is the single network multi-primary model. In this model, the Istio control plane is deployed into every cluster. Service discovery looks the same as in the primary-remote model: istiod in each cluster list-watches services and endpoints from all the clusters. For configuration discovery, istiod in each cluster list-watches only the local cluster's custom resources, such as VirtualServices and DestinationRules. The local cluster's sidecars can only connect to the local istiod.
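In a multi-primary install, each cluster's control plane is given its own cluster and network identity through the install values. A minimal sketch, with mesh, cluster, and network names as illustrative placeholders:

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      meshID: mesh1            # shared across all clusters in the mesh
      multiCluster:
        clusterName: cluster1  # unique per cluster
      network: network1        # which network this cluster resides in
```

Each cluster gets the same meshID but its own clusterName (and, in the multi-network case, its own network value), which is how istiod attributes endpoints to clusters and networks.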
The local cluster's istiod is responsible for pushing xDS to the sidecars. This pattern provides more availability and resilience, because when one cluster fails, or when the single control plane within one cluster fails, the other clusters keep working.

Let's see the gateway for different networks. How can istiod discover the east-west gateway and its address to do split-horizon EDS? This is done with labels. The namespace label topology.istio.io/network tells istiod that the local cluster is in network1, and the service label topology.istio.io/network tells istiod that the service is in network1 and is the east-west gateway service. So istiod can take the service's ingress IP or external IP as the east-west gateway address for network1, and then convert the endpoints in network1 to the gateway address.

What can be improved? Multi-cluster looks very good and can already be used in production, but there are several challenges we still have to face. The first one is better load balancing across clusters. From the picture on the right side, we can see that when a cluster 1 service wants to talk to cluster 2 services, the east-west gateway acts as a TCP proxy, actually a TLS proxy, and session stickiness between the client service and the server service is not applicable, because the east-west gateway applies a round-robin load balancing policy. So there is no session stickiness between source and destination. There is an Istio issue on GitHub where we can see more details.

The second one is that headless services do not work well. When we want to access a remote headless service, as in the right picture, from cluster 1 to a headless service deployed in cluster 2, it is not possible now, because a headless service instance is resolved to the pod address, and the Envoy cluster type used is ORIGINAL_DST.
So if we access the pod IP address from the local cluster, cluster 1 here, the traffic is plain text and cannot be proxied by the gateway. The third one is that the client must send TLS traffic right away. So if we want to make headless services work, we should first make the east-west gateway work for plain text, not only for TLS-encrypted traffic. The last one is network- or cluster-aware load balancing. In the multi-network multi-cluster model, the traffic flows through the east-west gateway, which increases service-to-service latency because of the additional east-west gateway hop. So we should improve performance by supporting network- or cluster-aware load balancing. With network-aware load balancing, traffic from the local cluster will go to the local cluster's services first, and only when the local cluster's services are down will the traffic fail over to clusters in remote networks.

Okay, thank you. That is all of today's topic. Thank you for your time. Bye.