Hi, everyone. Welcome to IstioCon 2023. Thank you for joining my session. My name is Abdul Basit, and I will be talking about Istio multi-cluster networking modes and how to use these modes to interconnect multiple clusters. I work as a product architect at Rakuten Symphony, Singapore, where I help our customers in their Istio adoption journey, and that includes multi-cluster Istio as well. One of the business units of Rakuten Symphony is the Symcloud business unit, and we have three major products in this business unit: Cloud Native Platform, which is a carrier-grade Kubernetes distribution; Cloud Native Storage, which is an enterprise-grade, application-aware, Kubernetes-native storage; and an orchestrator, which is a multi-domain, multi-site service delivery orchestrator that can deliver complex tasks, including provisioning and bootstrapping Kubernetes across multiple sites and data centers. If we look at the landscape of multi-cluster Kubernetes today, we can see two major challenges: discovering services across these multiple clusters, and the fundamental network connectivity between workloads running on these clusters. Our focus will be solving the complexity of inter-cluster network connectivity using Istio. The agenda of the talk is an overview, a deep dive into the single-network and multi-network modes in Istio, some of the key real-world considerations, and a demo. As the name implies, multi-cluster networking refers to the process of making multiple clusters communicate with each other so that they can share resources and allow a much more distributed application architecture. Some of the key benefits and reasons our customers adopt multi-cluster Istio are improved availability and performance, for example deploying instances of an application in multiple data centers and making them available to each other; cost savings; security and compliance; and advanced traffic routing, for example locality-based load balancing, which is supported in Istio multi-cluster today. Networking is a key component of multi-cluster Kubernetes, as it enables service discovery, load balancing, routing, and traffic management across multiple clusters. Istio's multi-cluster network modes are defined based on the topology and connectivity of the underlying Kubernetes clusters. A single or flat network is one in which all the workloads can reach each other directly using their IP addresses. Multiple or different networks is a setup in which the workloads are not directly reachable from each other but are accessible through one or more Istio gateways. Istio is not a CNI, so it is up to us to design and pick the right networking topology that solves our multi-cluster networking problems. So let's look into the single network. It is best for simplicity and uniform access across different clusters. One way to achieve a single network is by using a cloud provider's Kubernetes distribution, since some cloud providers already provide a flat network by default. If we are bringing our own Kubernetes to the cloud, or we have on-premise deployments, the best way is to use a routing protocol like BGP to create a flat network and then deploy Istio on top of that. Other ways to achieve this are CNI features, for example Cilium Cluster Mesh, or VPN technologies like WireGuard, to make a flat network.
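To make the difference between the two modes concrete, this is roughly how they show up at install time, following the multi-primary installation flow in the Istio documentation. This is only a sketch: the mesh ID, cluster names, network names, and the `CTX_CLUSTER1`/`CTX_CLUSTER2` context variables are placeholders, and your own install pipeline may look different.

```sh
# Sketch: multi-primary on a SINGLE (flat) network.
# Both clusters join the same mesh and declare the SAME network name,
# so sidecars are programmed with the remote endpoints' pod IPs directly.
cat <<EOF | istioctl install --context="${CTX_CLUSTER1}" -y -f -
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      meshID: mesh1
      multiCluster:
        clusterName: cluster1
      network: network1        # cluster2 would use the same value
EOF

# Sketch: multi-primary on MULTIPLE (different) networks.
# Each cluster declares its own network name and runs an east-west
# gateway; cross-network traffic is routed through that gateway.
kubectl --context="${CTX_CLUSTER2}" label namespace istio-system \
  topology.istio.io/network=network2 --overwrite
cat <<EOF | istioctl install --context="${CTX_CLUSTER2}" -y -f -
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      meshID: mesh1
      multiCluster:
        clusterName: cluster2
      network: network2        # differs from cluster1's network
EOF
```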
So if you look at the drawing, in a single network Istio will assume all the workload instances are natively reachable and will program the source sidecar proxies so that they can reach the destination directly using its IP address. The fundamental requirement for this is that all the clusters must have unique pod CIDRs; otherwise there will be conflicts. In a BGP or cloud-based IPAM mode, those pod CIDRs are advertised into a routable network as well. So if you are building this kind of topology using BGP, it's best to pick a CNI that comes with a BGP daemon. And all the workloads must be accessible to each other as well. This is important because in some cases there is an external device like a firewall, or a network policy, blocking the pod CIDRs between clusters, and that can cause issues. The advantage of this approach is that it offers native routing and native performance. If for some reason it is not possible to peer with a routable network, we can still achieve a flat or single network using a cluster mesh, a VPN, or some other method; usually these are overlay methods. Unique pod CIDRs are still a must even for this. The advantage of this approach is that it is network independent, but it is more complex and may involve tunneling overheads. Multiple networks is best if there are overlapping endpoint IP addresses across different clusters, or if the clusters are managed by completely different teams. Another reason could be a shortage of IPv4 addresses to allocate unique pod CIDRs to every cluster, and in some cases compliance requirements that force the networks to be completely segregated. So if you look at the topology of multiple networks, the pod CIDRs are usually private. When Istio is doing service discovery, it discovers services behind an Istio gateway, so the traffic from the source goes to the destination through an Istio gateway, usually an east-west gateway. The gateway must therefore have an IP address from the routable range. As in the previous case, if there are external devices, for example firewalls or network policies, between these two different clusters, we must make sure that the required ports are open. So if we look at the journey of creating or selecting a network, it's best to pick a flat network for a Kubernetes cluster if we have the choice, even if we are not considering multi-cluster today, because having a flat network gives us the flexibility to build flat-network-connected clusters, and if required we can still build gateway-connected clusters as well. But if the clusters are built isolated, especially with duplicated CIDRs, the only choice we will be left with in the future is gateway-connected clusters. Most of our customers prefer flat networks, unless the clusters are built with duplicate IP addresses or there are IPv4 constraints, especially in service provider environments. IPv6 can help here: one of our customers runs IPv6-only clusters, and Istio multi-cluster works pretty well there too. And the third reason is security isolation, where the customer will pick gateway-connected clusters. Some additional factors to consider before deciding on gateway-connected or multi-network clusters: there is limited support for headless services, because inter-cluster connectivity is done via a gateway and the endpoint context is lost there. A similar limitation applies to canary deployments as well, but this may change with the ambient multi-cluster mode, so please stay tuned for that. Let's look into the demo.
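Before the demo, for reference, this is roughly what the east-west gateway piece of the multi-network setup looks like, based on the pattern in the Istio multi-network installation guide. It is a sketch, not the demo repo's exact scripts: `CTX_CLUSTER1` and the network name are placeholders, and the `samples/multicluster` scripts ship with the Istio release archive.

```sh
# Sketch: deploy an east-west gateway for cluster1's network.
samples/multicluster/gen-eastwest-gateway.sh --network network1 | \
  istioctl install --context="${CTX_CLUSTER1}" -y -f -

# Expose all mesh services on port 15443 (TLS passthrough), so sidecars
# in the other network can reach them through this gateway. This is the
# port that must be open through any firewall between the clusters.
cat <<EOF | kubectl apply --context="${CTX_CLUSTER1}" -n istio-system -f -
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: cross-network-gateway
spec:
  selector:
    istio: eastwestgateway
  servers:
    - port:
        number: 15443
        name: tls
        protocol: TLS
      tls:
        mode: AUTO_PASSTHROUGH   # mTLS terminates at the destination sidecar
      hosts:
        - "*.local"
EOF
```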
I have created three different environments with a total of six different clusters. Clusters 1 and 2 focus on the use case of multiple networks. In this case, I have kind clusters 1 and 2 with duplicate pod CIDRs. So assume we have clusters today that are built with duplicate pod CIDRs; we can connect these clusters with Istio east-west gateways. The routable IP addresses for the gateways are allocated using MetalLB; it could be different, for example in a cloud environment this could be the cloud load balancer. I picked Cilium as the CNI, but this can be any CNI. A shared root of trust is required, and the topology is Istio multi-cluster, multi-primary on different networks. Multi-primary here means multiple Istio control planes, one running in every cluster. There is a verify script that verifies the connectivity from a sleep pod to a hello world version 1 running in cluster 1 and a hello world version 2 running in cluster 2; it is similar to what is provided in the Istio docs. There is also a troubleshooting script to emulate blocking port 15443, to see what happens if a firewall or other external device is blocking this specific port. The code for the demo is available in the GitHub link provided in the slides. Let's look into the demo in more detail. I have cluster 1, which has one control plane and two worker nodes, and cluster 2, which also has one control plane and two worker nodes. Cilium is the CNI. Let's look into the pods of the istio-system namespace: there is an istiod and an Istio east-west gateway running in cluster 1 and in cluster 2. Let's run the verify script here. While the script is running, we can see that there is a routable IP address from the kind bridge network that is assigned using MetalLB, and there is a routable IP address for the cluster 2 east-west gateway. If we look at the endpoints seen by the sleep pod in cluster 1, we can see that one endpoint is from the pod CIDR of cluster 1, but the second endpoint is reached via the east-west gateway of cluster 2 using port 15443. Similarly, in cluster 2, one hello world endpoint is from the local pod CIDR, while the other one is discovered behind the east-west gateway of cluster 1. And if we run the curl from the sleep pod in cluster 1, we can see the load balancing happening between hello world version 1 and version 2, and similarly from cluster 2 as well. Let's look into the pods as well; sorry, I forgot to show you that in the beginning. I have a hello world pod running in cluster 1 and a hello world pod running in cluster 2, and we can verify that the load balancing and multi-cluster connectivity are working correctly. Let's do the troubleshooting part and see what happens if the port is blocked, or if we are creating a multi-cluster setup and facing the challenge that the connectivity is not coming up; this is one of the potential areas to look at. I have used a network policy to block port 15443. We can see that service discovery is still happening correctly, because the Kubernetes cluster API is accessible and we only blocked port 15443. But from cluster 1 to cluster 2, we only got a couple of successful hello world responses and the rest were upstream request failures; the failures were all for requests going to cluster 2, while the successful responses all came from the local pods. From cluster 2 to cluster 1, everything is still working, because we didn't block that direction.
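For reference, this is one way such a block could be emulated. The demo repo's troubleshooting script may do it differently; this sketch assumes Cilium's explicit deny policies and the `istio: eastwestgateway` label on the gateway pod, and it is applied in cluster 2 so that only the cluster 1 to cluster 2 direction breaks.

```sh
# Sketch: deny ingress to the east-west gateway on port 15443 in cluster 2,
# emulating a firewall that blocks the cross-network port.
cat <<EOF | kubectl apply --context kind-cluster2 -f -
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: block-eastwest-15443
  namespace: istio-system
spec:
  endpointSelector:
    matchLabels:
      istio: eastwestgateway
  ingressDeny:
    - fromEntities:
        - all
      toPorts:
        - ports:
            - port: "15443"
              protocol: TCP
EOF
```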
For the second demo, I have unique pod CIDRs for cluster 3 and cluster 4. I'm using Cilium with its BGP control plane, and FRR running inside the Docker kind bridge to emulate a VPC router or a top-of-rack switch. There is BGP peering between FRR and the Cilium agents running inside cluster 3 and cluster 4. There is a shared root of trust. The multi-cluster control plane topology is the same, multi-primary, but in this case it is on the same network, because we have created a flat network, here using BGP. And then we will use a verify script to verify the connectivity, which should be native connectivity between cluster 3 and cluster 4, and a troubleshooting script to emulate what happens if something is blocking the connection between cluster 3 and cluster 4. So in this case, I have cluster 3 and cluster 4, and we can see that we don't have any east-west gateway in cluster 3; sorry about that. Similarly, there is no east-west gateway in cluster 4. Let's look into the pods: hello world v1 in cluster 3 and hello world v2 in cluster 4. Let's run the verify script here again and go through the output. We can see that the east-west gateway is not present in cluster 3 or cluster 4. Just for the purpose of display, I have also shown the cluster 3 node IP addresses, which are used later for BGP peering. Let's look into the routing table of one of the worker nodes of cluster 3. We can see that it only knows about its local pod subnet assigned by the CNI. It doesn't know about the pod ranges allocated to the other nodes within the same cluster, and of course it has no idea about any of the CIDRs of the other cluster. So the traffic for the rest of the local cluster, meaning inter-node traffic, as well as inter-cluster traffic, goes to this IP address, which is the FRR IP address, as the next hop. Let's look into the FRR output. We can see that there are BGP neighbor relationships with a total of six nodes, three from cluster 3 and three from cluster 4. It learns routing information from all the nodes, and this shows all the pod ranges assigned by the CNI to each individual node, learned with the correct next hop. If we look at the endpoints of the sleep pod in this case, we can see that one endpoint is a local pod from the cluster 3 pod CIDR (the 23 range), and the other is a direct pod IP address from the cluster 4 pod CIDR (the 24 range); and similarly for cluster 4 as well. And if we run the curl, we can see the connectivity working correctly, with load balancing between cluster 3 and cluster 4 over the flat network built using BGP. Let's run the troubleshooting part and block all access from cluster 3 to cluster 4. We can see that access within cluster 3 is working, but everything going to cluster 4 is failing. So just in case you are facing this kind of issue, make sure that any firewall between the different clusters has the required connectivity open. The final topology for multi-cluster is using a CNI feature to create the flat network. I'm using Cilium Cluster Mesh to create a flat network and then running Istio on top of it for multi-cluster networking. The pod CIDRs must be unique in this case as well. For Cilium Cluster Mesh to work, we need to have the cluster mesh API server exposed, and here it is exposed via MetalLB. A shared root of trust is required. Istio is in a multi-primary control plane topology on the same network, and there is a verify script to verify the connectivity. So in this case, if we look into the nodes, there is the same number of nodes in cluster 5 and, sorry about that, cluster 6.
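For reference, bringing up Cilium Cluster Mesh for a pair of clusters typically looks something like the following with the cilium CLI. The context names are assumed kind-style names, and exposing the cluster mesh API server through a LoadBalancer service (backed by MetalLB in this kind-based lab) is an assumption based on the setup described above; your environment may differ.

```sh
# Sketch: enable the Cluster Mesh API server in each cluster, exposing it
# through a LoadBalancer service (MetalLB provides the address here).
cilium clustermesh enable --context kind-cluster5 --service-type LoadBalancer
cilium clustermesh enable --context kind-cluster6 --service-type LoadBalancer

# Connect the two clusters so each Cilium agent can discover the other
# cluster's nodes and pod ranges.
cilium clustermesh connect --context kind-cluster5 --destination-context kind-cluster6

# Wait until the mesh is healthy before installing Istio on top of it.
cilium clustermesh status --context kind-cluster5 --wait
cilium clustermesh status --context kind-cluster6 --wait
```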
Let's look into the pods: the same number of pods. And similarly, let's look into the istio-system namespace; it is pretty much the same as in the BGP case. Let's run the verify script. We can see that there is no east-west gateway. But we can see the big, fundamental difference here: if we look at a cluster 5 worker node, we can see that it knows about the pod ranges of every node within the local cluster, and also the pod ranges assigned by Cilium in the second cluster. And routing is happening using VXLAN, between different nodes and between different clusters. So we can see that Istio is not aware of whether it is BGP or cluster mesh; as long as there is a flat network, the endpoints are discovered directly using the endpoint IP addresses from the cluster 5 and cluster 6 pod CIDRs, and similarly in the cluster 6 case as well. And if you run the curl, we can see that the load balancing and connectivity are working correctly. The fundamental thing here is that whatever technology we use for bridging must be correctly configured. In this case we are using cluster mesh, so we need to make sure that cluster mesh is working correctly, which in our case it already is; but I do have a script to confirm that as well, just to make sure the cluster mesh is healthy and we have the correct routing information. That's all from my side. Thank you so much, and if you have any more questions, please feel free to reach me on the Istio Slack.