Hello everyone. My name is Sridhar, and with me I have my colleague Ashwin. We both work for Red Hat. Today we will be talking about hybrid cloud and multi-cluster service discovery. The agenda is as follows: we will briefly go over the Kubernetes networking model, the use cases that motivated us, the problem domain, the technologies we looked into, and finally the solution we came across that we found very promising and intend to contribute to.

A brief introduction to the Kubernetes networking model: in Kubernetes, every pod has a unique IP address and can talk to other pods and services within the local cluster, regardless of the node the pod is scheduled on. At the same time, a pod IP is generally isolated from the outside world, and there are well-defined mechanisms, like load-balancer services and ingress gateways, with which you can make your applications accessible externally. So as you can see, Kubernetes mainly focuses on container networking within the local cluster; it does not address cross-cluster network connectivity.

Imagine multiple clusters, some on-prem and some on public clouds; this is becoming a common deployment scenario these days. In a hybrid-cloud environment, some of the use cases we envision include high availability: you deploy an application in multiple clusters across the hybrid cloud, and in the unfortunate situation where one of your clusters goes down, you still have access to the application from a cluster in a remote region. Another use case is deploying a service mesh into your hybrid cloud; some service mesh implementations require pod-to-pod reachability. Others are stretched databases and, an interesting one, enabling access to your ClusterIP services from remote clusters.
As you know, Kubernetes has different types of services, ClusterIP being the default, and a ClusterIP service is generally accessible only within the local cluster. We are looking at a solution where you can access a ClusterIP service even from a remote cluster, so that is something we are interested in. The next use case is having a database on-prem and your application or front-end at the edge, closer to your customers. Another is deploying temporary applications on public clouds to make use of special hardware that is only available there and not in your on-prem environment. And this goes on; this is not an exhaustive list, but these are some of the use cases we had in mind when we looked at hybrid cloud.

Before we talk about the problem domain, the goals we had in mind were to find a solution that is cloud agnostic, lightweight, and also CNI agnostic. We just discussed some of the use cases, and some of them require pod-to-pod reachability even across clusters: you have a pod in your on-prem cluster and you want it to be able to talk to a pod in a public cloud like AWS or GCP. We envision that this problem consists of four main aspects: tunnel management, injecting routing rules, multi-cluster network policy, and multi-cluster service discovery. Here is a pictorial representation of the problem domain we just spoke about. We envision needing some kind of orchestrator capable of configuring, monitoring, and operating the cross-cluster connections. Further, the four problem domains can be grouped into two categories: cluster connectivity and service connectivity.
Let's go back to those four aspects, and let me give you a brief introduction to each of them. First is tunnel management. When you have clusters in multiple regions, this means setting up connections between them so that traffic can pass from one cluster to another. The type of tunnel you use could depend on the location of your clusters: if you are connecting an on-prem cluster to a public cloud, you might want to go with a standards-based IPsec tunnel, since most public cloud providers support IPsec-based VPN tunnels. On the other hand, if you are connecting clusters that are all within your on-prem environment, you can simply go with an overlay mechanism like VXLAN or IPIP tunnels, because the traffic never enters the public domain and you want to avoid the overhead of encryption.

Injecting routing rules, on the other hand, is about programming the necessary routing rules into your cluster nodes, so that when pods try to talk to pods or ClusterIP services in a remote cluster, the necessary IP routing rules are in place and the traffic is appropriately routed to that cluster. Service connectivity consists of multi-cluster service discovery. You all know about service discovery; what we want is to make a ClusterIP service that lives in one cluster accessible from a connected cluster in a remote region.
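To make the rule-injection idea concrete, here is a minimal sketch of what it amounts to: for every remote cluster's pod and service CIDR, point a route at the local gateway node. The function and field names are illustrative, not Submariner's actual code.

```python
import ipaddress


def routes_for_remote_clusters(remote_clusters, local_gateway_ip):
    """Build the `ip route` commands a (hypothetical) route agent would
    program on a non-gateway node: traffic for every remote pod and
    service CIDR is sent via the local gateway node."""
    commands = []
    for cluster in remote_clusters:
        for cidr in cluster["pod_cidrs"] + cluster["service_cidrs"]:
            ipaddress.ip_network(cidr)  # validate the CIDR string
            commands.append(f"ip route add {cidr} via {local_gateway_ip}")
    return commands


remote = [
    {"id": "east", "pod_cidrs": ["10.245.0.0/16"], "service_cidrs": ["172.31.0.0/16"]},
]
for cmd in routes_for_remote_clusters(remote, "192.168.1.10"):
    print(cmd)
```

Running this prints the two route commands for the east cluster's CIDRs, both pointing at the assumed gateway IP 192.168.1.10.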
When I talk about multi-cluster service discovery, we are not talking about exposing each and every service created in your cluster to remote clusters; we are looking at exposing only services marked with a specific annotation, so that only annotated services are made available in remote regions. The fourth aspect is multi-cluster network policy. As you all know, a network policy is like an access control list with which you specify the kinds of traffic you want to allow or deny to your pods. We want to enhance this to work even in a hybrid-cloud environment.

With these goals in mind, we started to explore various solutions that would let us address these four problem domains. We prototyped some of them to evaluate their technical feasibility and tried them out on vanilla Kubernetes as well as OpenShift. In parallel, some of our team members were exploring upstream open-source projects, and we came across a few: Cilium, Federation, Istio, and so on. We found Cilium to be a very promising CNI; it is a full-fledged CNI with lots of capabilities, and it addressed most of the use cases we spoke about. However, since one of our goals was a lightweight, CNI-agnostic solution, we had to park it. The second solution we looked at was Federation, a control-plane project that helps you federate your services across multiple clusters. Federation currently does not set up any tunnels between your clusters; instead it makes use of publicly accessible load-balancer services when it wants to make a service accessible from a different region. Istio, on the other hand, is a service mesh implementation.
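The annotation-based export described above can be sketched in a few lines. The annotation key used here is a hypothetical placeholder; the talk only says "some specific annotation", not which one.

```python
def exported_services(services, annotation="example.io/exported"):
    """Select only the services explicitly marked for cross-cluster
    export via an opt-in annotation (key name is illustrative)."""
    return [
        svc["name"]
        for svc in services
        if svc.get("annotations", {}).get(annotation) == "true"
    ]


services = [
    {"name": "db", "annotations": {"example.io/exported": "true"}},
    {"name": "internal-cache", "annotations": {}},
]
print(exported_services(services))  # → ['db']
```

Only the annotated `db` service would be advertised to remote clusters; `internal-cache` stays local.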
You all know that, and it has some very good layer-7 capabilities. However, not every customer might be interested in deploying a service mesh just for cross-cluster connectivity. So somewhere around April, while we were evaluating all these POCs, we came across an upstream open-source project called Submariner from Rancher Labs; they also gave a demo at that time. We found that Submariner was very close to the problem domains we were talking about, and it had goals pretty similar to ours. Not only that, the general idea we had in mind was also very close to what Submariner was trying to implement and address. So we felt it would be a good step for us to start working with the Submariner community and add any necessary missing elements to it. Now Ashwin will talk about Submariner, its architecture, and some of the internals of the project.

Thanks, Sridhar. With that said, let us dive a little deeper into Submariner, its architecture, and its components. This is how a typical Submariner deployment looks. You can see an east cluster, a west cluster, and a third cluster called the broker cluster. For now the broker needs to be a third Kubernetes cluster, but our plan is to make it installable in one of the existing east or west clusters. We install Submariner using Helm. While deploying, you have to install it in your broker cluster first; after that you need to get the secrets and certificates and use those to deploy to the west and east clusters so they can connect to the broker cluster. Submariner defines two CRDs, namely clusters.submariner.io and endpoints.submariner.io.
The clusters.submariner.io CRD contains the cluster information: the service CIDR and the pod CIDR available in that cluster. The endpoints.submariner.io CRD contains details about the cluster's gateway, like the gateway node IP (I will explain the gateway node in detail later) and other details such as whether NAT traversal is required to reach the gateway node. We require the cluster ID to be unique, so that each cluster is identifiable by a unique ID. As was mentioned in the earlier session, there is an effort in the Kubernetes multi-cluster SIG to bring in a cluster ID for every cluster that will be unique across clusters.

Now let us go into one of the clusters. If you look at the west cluster, there is one node marked as the gateway node; you can see the same in the east cluster. In every cluster, one node will be elected as the gateway node. This node connects to the broker cluster, publishes its local information, basically both the CIDRs, and retrieves the information about all the other clusters. In this way it learns about all the clusters that are connected together, and it creates an IPsec tunnel to the gateway node of each of the other clusters. In the figure you can see an IPsec tunnel created from the gateway node in the east cluster to the gateway node in the west cluster. Now let us look at the components of Submariner. Submariner has mainly two components: the Submariner engine, and the Submariner route agent, which is a DaemonSet.
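A toy model of the broker exchange described above might look like this. Each cluster's gateway publishes its own record and reads back everyone else's; the field names are illustrative and are not the exact CRD schema.

```python
broker = {}  # cluster_id -> published endpoint record


def publish(broker, endpoint):
    """A gateway node advertising its details to the broker."""
    broker[endpoint["cluster_id"]] = endpoint


def remote_endpoints(broker, local_cluster_id):
    """What a gateway retrieves: every other cluster's record."""
    return [ep for cid, ep in broker.items() if cid != local_cluster_id]


publish(broker, {"cluster_id": "west", "gateway_ip": "192.168.1.10",
                 "service_cidr": "172.30.0.0/16", "pod_cidr": "10.244.0.0/16",
                 "nat_required": False})
publish(broker, {"cluster_id": "east", "gateway_ip": "192.168.2.10",
                 "service_cidr": "172.31.0.0/16", "pod_cidr": "10.245.0.0/16",
                 "nat_required": True})

print([ep["cluster_id"] for ep in remote_endpoints(broker, "west")])  # → ['east']
```

The broker here is purely a data store, which matches how it is described later in the Q&A: actual pod traffic never flows through it.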
The Submariner engine runs on all gateway-capable nodes. What does that mean? In our deployment we mark some of the nodes as gateway capable; say we mark worker 1 or worker 2 as gateway capable, and whichever node is so marked, the Submariner engine will run on it. It performs a leader election: if there are multiple nodes marked as gateway capable, one of them is elected as the leader, and this leader node is responsible for connecting to the broker cluster, sharing its information, and retrieving information about the other clusters. It also interfaces with Charon, the strongSwan IKE daemon, to create the IPsec tunnels to the gateway nodes of all the other connected clusters.

The next component is the Submariner route agent. This runs on all nodes in a cluster and is aware of who the current leader is. On the gateway node it sits idle; it does not do anything. On all the other, non-gateway nodes in a cluster, it is responsible for injecting the routing rules so that all packets addressed to a different cluster are sent to the gateway node. One more thing about the Submariner engine: on whichever node is not the gateway node, the Submariner engine sits idle as well.

Now let us see how the gateway node election happens. In the west cluster there are four worker nodes, and as you can see from the diagram, workers 1, 3, and 4 in the west cluster and workers 2, 3, and 4 in the east cluster are labeled and will participate in the gateway node election. Submariner uses Kubernetes' simple leader election, and one of them is elected as the leader.
In the west cluster it was worker 4 that got elected as the leader, and similarly in the east it is worker 3. Once worker 4 is elected as the leader, it connects to the broker cluster and, as I mentioned earlier, updates its local information and retrieves the information about the other clusters. Every other gateway node does the same and learns about all the connected clusters. As you can see, once the election has happened and the gateway nodes are elected, the IPsec tunnel is created: here, from worker 4 in the west cluster to worker 3 in the east cluster, the leaders in their respective clusters.

Now, as you see from the figure, all traffic to other clusters, say from worker 2 or worker 3 to any node in the east cluster, has to flow through the gateway node, so there is a single point of failure. We have a high-availability framework built in that ensures that if the gateway node goes down, a new gateway node is elected and the rules are reprogrammed. Assume that worker 4 in the west cluster, the earlier leader, went down for some reason. A new leader election happens among whichever labeled nodes remain, worker 1 and worker 3 here, and one of them gets elected as the leader. Here worker 1 got elected, so worker 1 now updates the broker: there is a leader change, and these are my details.
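The failover behaviour can be sketched as follows. Real Submariner relies on Kubernetes leader election, whose winner is arbitrary; picking the first healthy candidate here is just a deterministic stand-in to illustrate that only labeled, healthy nodes are eligible.

```python
def elect_gateway(candidates, failed=()):
    """Pick a gateway from the gateway-capable (labeled) nodes,
    skipping any that have gone down. Illustrative only: the real
    election winner is not deterministic like this."""
    healthy = [n for n in candidates if n not in failed]
    if not healthy:
        raise RuntimeError("no gateway-capable node available")
    return healthy[0]


capable = ["worker1", "worker3", "worker4"]
leader = elect_gateway(capable)
# the elected gateway goes down; the remaining labeled nodes re-elect
new_leader = elect_gateway(capable, failed={leader})
print(leader, "->", new_leader)  # → worker1 -> worker3
```

After the re-election, the new leader publishes its details to the broker, which is what triggers the route and tunnel reprogramming described next.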
Now in the west cluster, the Submariner route agent component I mentioned earlier is also notified about this new leader change, so it reprograms its routing rules, which were earlier pointing to worker 4, to point to worker 1, so that traffic to the remote cluster uses worker 1. Similarly, since the broker has been updated about the leader change, the east cluster also learns that there has been a leader change. What it does is reprogram its IPsec tunnel to point to the newly elected gateway node, worker 1.

Now let us go back to the problem domain that Sridhar explained. He mentioned that broadly you can divide the problem domains into service connectivity and cluster connectivity. What works today is the cluster-connectivity part. The first piece is tunnel management: as we saw in the architecture diagram, IPsec tunnels are created between the gateway nodes, and through these tunnels you can reach all the other clusters. The second piece is the routing rules. Once the Submariner engine learns about other clusters, it programs routing rules on the gateway nodes to reach them: for this CIDR, go to the east cluster; this CIDR is available in the north cluster; and so on. These rules are injected by the Submariner engine. On all the non-gateway nodes, as I mentioned, the Submariner route agent programs the rule that sends the packet to the gateway node; from the gateway node, the rules programmed by the Submariner engine take it to the other cluster. Sridhar will show this in more detail. Now, there are some prerequisites we expect for Submariner to work properly.
The first is that we need to know the cluster configuration before deploying Submariner, so that each cluster knows how to connect to the broker and can retrieve the information about the other clusters. So there should be some knowledge of the cluster configuration up front. Also, Submariner expects the pod and service CIDRs to be non-overlapping. What that means is that across the individual clusters, like the east and west clusters in our diagram, the service CIDRs as well as the pod CIDRs need to be non-overlapping for Submariner to work, because Submariner simply inserts routing rules based on CIDR: for this CIDR, go to this cluster. So we expect these to be non-overlapping. Now Sridhar will walk you through the life of a packet.

Thank you, Ashwin. Let me take a small example of how a packet traverses when you have two clusters. Imagine a west cluster and an east cluster; just for the sake of simplicity, we use a minimal number of nodes in this diagram. As I mentioned, you have the two Submariner components running in your clusters: on the worker nodes you have the Submariner route agent, and on the gateway node you have the Submariner engine. Let's assume you have two pods, pod A running on worker 1 of the west cluster and pod B running on worker 2 of the east cluster. Assuming pod A knows the IP address of pod B, I will quickly take you through how the packet actually traverses. Whenever pod A sends traffic to pod B, the Submariner route agent running on worker 1 will have programmed the necessary routing rules to forward the traffic to the local gateway node. Once it reaches the local gateway node, the traffic is encapsulated into an IPsec tunnel.
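The non-overlapping-CIDR prerequisite just mentioned is easy to verify up front. Here is an illustrative pre-flight check using Python's standard `ipaddress` module; the CIDR values are made-up examples.

```python
import ipaddress
from itertools import combinations


def check_no_overlap(cluster_cidrs):
    """Return every pair of networks from *different* clusters that
    overlap. An empty result means the clusters satisfy Submariner's
    current non-overlapping requirement (illustrative check)."""
    flat = [(name, ipaddress.ip_network(c))
            for name, cidrs in cluster_cidrs.items() for c in cidrs]
    return [(a, b) for (na, a), (nb, b) in combinations(flat, 2)
            if na != nb and a.overlaps(b)]


cidrs = {
    "west": ["10.244.0.0/16", "172.30.0.0/16"],
    "east": ["10.245.0.0/16", "172.31.0.0/16"],
}
print(check_no_overlap(cidrs))  # → [] (safe to connect these clusters)
```

If both clusters had been deployed with default CIDRs, say 10.244.0.0/16 in each, this check would flag the clash, which is exactly the brownfield problem discussed later in the talk.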
Assuming these two clusters are connected through an IPsec tunnel, the traffic gets encapsulated in the tunnel and forwarded to the remote cluster's gateway node. After it arrives there, it gets decrypted; the gateway looks at the destination IP, which happens to be the IP address of pod B, and the respective CNI of the cluster forwards the traffic accordingly, in this particular case to worker 2, which is hosting pod B. This is how communication happens from a pod in one cluster to another; the same thing generally happens when a pod tries to access a ClusterIP service.

Coming to the current state: we have a healthy roadmap, and lots of work is pending. Submariner in its current stage is what we consider pre-alpha, so we do not recommend using Submariner in any production environment; please bear that in mind. It only supports the cluster-connectivity part today; it does not yet support service connectivity, which includes multi-cluster service discovery and network policy, but we are actively working on getting these features integrated into Submariner. Next is support for different types of topologies. Today, when you have multiple clusters connected via Submariner, a full mesh of IPsec tunnels is created between the gateway nodes of all participating clusters. We want to make this configurable, so that after you associate all the clusters, you can say that out of the n clusters you have, you only want to connect some of them, and thereby discover and access only certain ClusterIP services. The next item is support for different types of tunnels.
Today the cluster-connectivity part of Submariner can set up only IPsec tunnels between your clusters, and that generally works in most cases, because you are typically interested in tunnels between on-prem and public-cloud environments. However, we are looking at adding a plugin-based architecture, which in Submariner terminology is called a cable engine. We want to implement a couple of cable engines that let us support different tunnel types such as VXLAN, IPIP, and so on. The next important item is monitoring of the solution. Also, talking to different stakeholders, we found that people are looking at support for overlapping CIDRs. As I mentioned earlier, one of today's prerequisites for Submariner is that the pod and service CIDRs of your clusters be non-overlapping. In a brownfield environment this could actually be a little challenging, because you might have deployed your clusters with the default configuration, where the pod and service CIDRs overlap. We have a couple of ideas in mind, and going forward, depending on feasibility, we want to support this use case as well. Last but not least, we want to leverage public cloud services wherever possible.

You can find the Submariner Git repo on this slide. We meet every Tuesday at 15:00 UTC on Blue Jeans, and both Rancher and we at Red Hat are working collectively to move the Submariner project, which currently lives in Rancher's repo, to the multi-cluster SIG. As we speak, there is an active discussion going on in the SIG multi-cluster mailing list; you can find the link to the email thread, which has some interesting conversation. So anybody interested in Submariner, we would be more than happy to welcome you.
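The cable-engine idea pairs naturally with the tunnel-selection guidance from earlier in the talk. Here is a hypothetical sketch of that selection logic; the site labels and engine names are placeholders, not a real Submariner API.

```python
def pick_cable_engine(src_site, dst_site):
    """Hypothetical cable-engine selection, mirroring the talk's
    guidance: use IPsec whenever traffic may cross the public
    internet; use a cheaper unencrypted overlay (VXLAN) when both
    ends stay inside the on-prem network."""
    if src_site == "on-prem" and dst_site == "on-prem":
        return "vxlan"  # traffic never leaves the private network
    return "ipsec"      # cross-cloud traffic should be encrypted


print(pick_cable_engine("on-prem", "aws"))      # → ipsec
print(pick_cable_engine("on-prem", "on-prem"))  # → vxlan
```

A plugin architecture would let each such engine implement the same interface (establish tunnel, tear down, report health) while the policy above stays with the operator.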
We have a bunch of folks working on it, we are very open to your feedback, and if you are interested in working with us, we are happy to help. So come join us, it'll be fun. Any questions?

Question: Hi, I have one question. Does the local agent update its routes to point at the leader, so that the workers communicate via the local gateway? And for a worker to reach the gateway node, say from worker 1, what technology do you use: another protocol, for example a routing protocol, or do you manage it yourself in software?

Answer: I think the question is how a packet from worker 1 reaches the gateway node. As I mentioned, there are two components, the Submariner engine and the Submariner route agent. The route agent knows who the current leader is, and for all traffic destined to a remote cluster, it inserts a rule on worker 1 so that such traffic is sent to the gateway node; there is a routing rule added for that. From what I understand of your question: there are two main components in the Submariner architecture. If you imagine this particular diagram, let's talk about the west cluster. You have worker 1 and worker 2, and these run the DaemonSet pods, where Kubernetes ensures that at any point in time you have a pod
scheduled on the worker node. What they do is constantly monitor the local gateway node; there is another pod running on the gateway node, which we call the Submariner engine. These pods stay in sync with the local gateway node, and the local gateway node stays in sync with the broker cluster. Your pod CIDRs and service CIDRs in this particular diagram, say 10.0.0.0 and 172.11.0.0, this information, along with the cluster information, the cluster identity, is advertised to the broker, and the broker's information about the other clusters is reconciled back. The Submariner route agents running on the worker nodes only have to know who the local leader is, so that they can forward all traffic destined for a remote cluster to the local gateway node.

Question: Sorry, I have two questions. The first one is about the broker: is it just a normal part of one of these clusters, or an independent one?

Answer: That's a good question. Today you can download Submariner and evaluate it yourself; if you are interested in this hybrid-cloud kind of environment, just git clone Submariner and give it a try. There is an interesting video done by Chris Kim from Rancher. Coming back to your question: currently the broker is an independent cluster.

Question: I am just concerned about the performance, because you create IPsec tunnels, and IPsec tunnels cost a lot of resources.

Answer: Right. Here we are using the broker mainly as a data store. The local gateway nodes update the CRDs on the broker with the cluster information, but the actual data, say a pod on cluster west talking to a pod on cluster east, never goes through the broker; it goes via the IPsec tunnel, as you said. And yes, IPsec has its own
latencies and all, but I think that is inevitable anyway when you are connecting an on-prem cluster with a public cloud: you want some kind of encryption on the data, and you don't want to send plain traffic over a public domain. At the same time, we have plans to leverage some of the public cloud providers' hardware; for example, on AWS we plan to use AWS hardware VPNs wherever possible, which can at least help speed up the traffic and decrease the latency.

Question: Thank you. The second question is about multiple interfaces within one pod. In Kubernetes we have the Multus CNI, the DANM CNI, and some other CNIs that can support multiple interfaces within one pod. For that case, do you plan to support it?

Answer: So, are you asking whether this particular solution supports pods with multiple interfaces, the Multus project you are talking about? We haven't evaluated that; we briefly did a demo with Multus, but not with Submariner. This solution today works with CNIs based on an iptables kind of implementation. We evaluated it on Flannel, Canal, as well as OpenShift, so those are the three different CNIs with which we evaluated Submariner. If you have a CNI that is very much aligned with these, it might work out of the box for you, but to be honest, we haven't tested each and every CNI. Our end goal is to have some kind of driver framework that captures the CNI-specific implementation in a driver.

Question: Okay, thank you. Hello, I have one question. In the current architecture, how do you solve the service-discovery problem? If you have a pod in the west cluster and another pod in the east cluster, in a service-and-client model, and a pod in the east cluster wants to visit a service in the west cluster using kube-dns, like service discovery, how do you fix this problem?
Answer: That's a very good question, and as we said, we don't have a solution for that yet. We have a use-case document where we have started capturing the kinds of use cases we want to address, and we are trying to get feedback from the community; there are some pull requests, both for service discovery as well as for network policy. Once we have some agreement on the use cases we intend to solve, we have a few ideas in mind for how we will solve it. So we don't have a PR for that yet.

Question: So today you visit the service from the other cluster using the pod IP, right?

Answer: Sorry, I think we are out of time, but we'll be here, so we can discuss afterwards. Thank you, thanks for attending.