Hello, everyone. This is Vikas and Nick from Tetrate. At Tetrate, we work on Istio and other service mesh technologies around Istio. In this talk, we are going to share the experiences and learnings we gained while working with some of our large enterprise customers, helping them meshify their multi-cluster installations. The talk is roughly divided into four sections. In the first section, we will discuss the complexity and manageability issues in large-scale multi-cluster installations. Then, assuming that a service mesh like Istio is going to help in managing these problems, we will look at the gaps and pain points you hit if you follow the Istio community's suggested multi-cluster installation approaches for overlaying a mesh on these installations. In the last two sections, we will discuss our approach, which is a bit more practical, and how it addresses the pain points and gaps we found in the community-suggested approaches. Now passing over to Nick to start with the problems of multi-cluster installations.

Thank you. Congratulations, everybody. You've made it to multi-cluster. You've successfully scaled your application from a single cluster to multiple. But then you look back and your SLAs haven't improved, you broke a bunch of security rules just to get the networking to connect properly, and you realize that management has become a lot more difficult. It seems like the problems you're facing now are bigger than the ones you had before, so trading a better SLA for more problems didn't really turn out to be advantageous. We're going to walk through some of these problems that a lot of our customers hit early on when adopting the multi-cluster model.

The first one is the typical deployment you see when you go from one cluster to a multi-cluster environment. Typically you have a load balancer within each cluster; the NGINX ingress is the default one most of you are probably familiar with. Then we connect all of those clusters and do the routing with what we call a tier-1 load balancer, which is typically your cloud load balancer, so that you have a single point of entry and can route between multiple clusters. Now say you're team one, and you've been tasked with moving your microservice into this multi-cluster environment. You're going to ride the coattails of team two, who has already been doing this for a while, and stick your application behind the same default load balancer that team two has been using. But all of a sudden, you start to notice more outages than before. You've bound yourself to the failures of team two; their errors have become your problem, because when their services have problems, the shared load balancer gets removed from the tier-1 pool, which now causes outages for you too. So there are cases where going multi-cluster doesn't actually make anything better, and we're going to show solutions that address these problems in a very practical way.

Secondly, with this typical multi-cluster routing, you don't have a good way to tell your microservice how to address services locally when they're within your own cluster or region.
To get high availability, you have to call outside of your cluster to that tier-1 load balancer to reach the other microservice or service you're trying to access. You typically have only two options: either you route externally through that load balancer, or directly and internally within the cluster you're on. But if you route locally and that pod or service is down, you have no failover.

And finally, as your multi-cluster architecture grows, you can end up with some really complex dependency chains, and it becomes very difficult to trace all of the consumers that depend on you and your functionality. Without a service mesh, you'll find these problems are very prevalent, and so you go looking for a solution; a service mesh seems to be the one that will solve these problems for you. But as our customers adopt a service mesh, they ask a lot of questions. How do I go about doing it the correct way? Am I adding more complexity to my architecture when multi-cluster has already made it complex? I chose Istio, and when I upgraded it, I lost connectivity to everything, so now I'm having larger production outages. Do I need to re-architect my multi-cluster setup to adopt a service mesh? We'll answer a lot of these questions later in the slides. Now I'll pass it over to Vikas to start with the Istio solution.

Sure. So yes, there is no denying that Istio is super powerful, and it can help in managing the complexities of multi-cluster installations. But how exactly do you overlay an Istio mesh on a multi-cluster installation? There are different teams, different personas, and different administrative boundaries over these clusters, so how is one going to do that? We'll start by looking at what the Istio community documentation has been suggesting. In the latest release at the moment, which is 1.7, there are two multi-cluster installation approaches: one is replicated control planes and the other is a shared control plane. Because of resiliency requirements, we can rule out the shared control plane approach, so let's take a deeper look at replicated control planes.

How this works is that each cluster runs an independent Istio control plane, and the CA of each control plane is configured with an intermediate CA generated from the same shared root CA. The services which are shared, and are supposed to be accessed from remote clusters, are exposed through service entries. For example, here the service foo, which is in cluster 2, is exposed on cluster 1 using a ServiceEntry. The hostname in the ServiceEntry is the name and namespace of the service with a .global suffix added, and the endpoints of the ServiceEntry point to the ingress gateways of the remote clusters.
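As a rough illustration, such a ServiceEntry on cluster 1 would look something like the following. This is a minimal sketch in the style of the Istio 1.7 replicated-control-planes docs; the ns2 namespace, the virtual IP, and the gateway address are placeholders:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: foo-cluster2          # created in cluster 1
  namespace: istio-system
spec:
  hosts:
  - foo.ns2.global            # <service>.<namespace>.global
  location: MESH_INTERNAL
  ports:
  - name: http
    number: 80
    protocol: http
  resolution: DNS
  addresses:
  - 240.0.0.2                 # placeholder virtual IP; the CoreDNS plugin resolves foo.ns2.global to it
  endpoints:
  - address: 203.0.113.20     # placeholder: cluster 2's ingress gateway address
    ports:
      http: 15443             # the Istio mTLS port on the remote gateway
```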
At runtime, it looks something like this. When a client makes a request, the DNS query first goes to the Kubernetes DNS server, which you have to run alongside a CoreDNS plugin shipped by Istio. Kubernetes DNS forwards all the .global-suffixed hostname queries to that CoreDNS plugin, and CoreDNS resolves them to the virtual IP of the ServiceEntry. Using this virtual IP, the sidecar picks up the matching listener, and the listener points to the remote gateway where the actual backend service is running. This way the request reaches cluster 2's gateway. But the SNI host still carries the .global-suffixed hostname, which won't match the actual backend service, so an Envoy filter is configured there to rewrite the .global in the SNI host to .svc.cluster.local. Now the gateway in cluster 2 can finally forward the request to the actual service implementation.

But can we use this in our production environments? Unfortunately not. The first reason is that it doesn't support locality-aware routing out of the box. If we have a local instance of the service running in the cluster, ideally the first priority should be that requests are served by that local instance, so the local instance should somehow be part of the load-balancing pool. But in this approach we cannot simply put the cluster IP in the ServiceEntry endpoints, and the reason is the .global suffix. We can fix this by adding a VirtualService that rewrites the hostname, converting the .global name to the canonical name of the service in the local cluster. But this is not the only problem; there are much bigger problems in this approach.

The hostnames used by the clients depend directly on the actual backend implementation of the services. Let me explain with this example. The actual service is foo, running in its namespace in cluster 2, so clients on the remote clusters, here cluster 1, must use a hostname of the form <service-name>.<service-namespace>.global. And this is pretty bad, because if there are n clusters serving this API, the owner of the service cannot change its backend implementation at all, since the clients would break. In other words, there is no abstraction; the clients depend directly on the actual backend implementation. And there is no straightforward locality-aware routing either. These points have already been acknowledged by the Istio community, of which we are also a part, and we have been working to come up with better installation approaches.

So in 1.8 there is a new approach, and replicated control planes are not there anymore in the coming releases. In this coming approach, we again have an Istio control plane running in each of the clusters. What changes is that the istiod of each cluster watches the API servers of all the remote clusters. This solves the locality-aware routing problem, because it has the local and remote endpoints in the same load-balancing pool. But the direct dependency on the canonical service name and namespace is still there; there is still no abstraction. On top of that, there are scalability issues and security concerns, because these clusters are owned by different teams, and they may not want to expose the entire internal implementation of their services to other teams.
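For reference, in this 1.8-style setup each istiod is typically given read access to its peers' API servers through a so-called remote secret, normally generated with istioctl's experimental create-remote-secret command. A minimal sketch of the shape of such a secret, with the cluster names assumed and the kubeconfig body elided:

```yaml
# Normally generated and applied with something like:
#   istioctl x create-remote-secret --context=cluster2 --name=cluster2 \
#     | kubectl apply -f - --context=cluster1
apiVersion: v1
kind: Secret
metadata:
  name: istio-remote-secret-cluster2
  namespace: istio-system
  labels:
    istio/multiCluster: "true"   # tells cluster 1's istiod to watch cluster 2
type: Opaque
stringData:
  cluster2: |
    # a kubeconfig granting read access to cluster 2's API server (elided)
```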
So that's where we had to come up with a different approach to meet the requirements of our customers. Now I'm passing it over to Nick to explain our approach.

Yeah, so we took a look at this and at our customers' problems. Our customers have many, many clusters that they want to connect together, spanning multiple clouds. So we wanted an approach that was simple and practical for their use cases, one that didn't require a lot of core architectural changes within their environments to adopt this service mesh architecture. We also wanted to keep the application developer in mind in these deployments.

So what kind of developer mindset did we assume when coming up with a solution? Well, developers just want to consume applications as SaaS products, even internally. They want to access your API and assume that it's highly available and routes locally or efficiently; they don't really care where your service is hosted, as long as they can reach it easily. They want to spend their time implementing the features that actually drive the product, rather than on implementation details like auth, routing, and networking, and all the plumbing that comes with a service mesh, or even without one. And they want to be able to advertise their own products effectively and easily, either to external customers or to other teams within their organization.

With that in mind, we came up with a much simpler approach to Istio installation, and that's deploying Istio essentially isolated per cluster. We deploy the control plane on every cluster and scope that control plane to only know about services within the cluster it resides in, so we have a locally scoped mesh. We say that you should manage these control planes externally, and you can do so with a number of tools, CI tools, GitOps, which takes away managing everything individually. And the final point is that we really want you to embrace gateways a little differently than you do today, and expand on their use.

So why do we want separate Istio control planes? We really want to align the failure domain of your application with its cluster. If you push bad configs, or you upgrade Istio incorrectly and it causes an outage, it is now localized to the cluster you were working in. That gives you a lot more control over not bringing down your entire environment, and it aligns the Istio control plane with the underlying cluster it's running on, down to the networking and node management. That's a lot more effective for our customers to manage, especially during outages. It also allows you to do safer upgrades: as you upgrade Istio, you can pick a cluster with less usage, upgrade that one, and if it goes successfully, roll the upgrade out to the other clusters. This differs from the shared control plane that was one of the Istio recommendations.

But we have a problem: the applications cannot see services in other clusters, and we need to address that. The way we at Tetrate want you to go about it is by embracing gateways. Gateways are just load balancers fronting your applications, but we want you to use them in a product-focused architecture. Currently, you're probably using the NGINX ingress gateway, or if you're using Istio, the default ingress gateway in istio-system. We want you to move away from those and towards product-focused gateways. In Istio that's really easy: you can stand up any number of ingress gateways and then determine the services that sit behind, or upstream from, each of them.
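For instance, a dedicated gateway for a payments product, the running example on the next slides, might look roughly like this. It's a sketch only: the payments namespace, the pay.example.com host, the selector label, and the credential name are assumptions for illustration:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: payments-gateway
  namespace: payments               # lives with the product, not in istio-system
spec:
  selector:
    istio: payments-ingressgateway  # assumed label on a dedicated gateway deployment
  servers:
  - port:
      number: 9443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE
      credentialName: payments-cert # assumed TLS secret for this product's host
    hosts:
    - pay.example.com
```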
These product-focused gateways then allow you to control the microservices behind them as if they were one API. You can expose your API internally or externally via this gateway, and now you've aligned the failure domain of the gateway with the product it supports, so you won't have the shared-gateway problem where other products can bring you down. It just requires you to stand up more gateways. You'll also be able to talk across clusters a lot more easily, and we'll explain why in a second. And you can push your authentication, authorization, and circuit breaking up into this gateway and tune it specifically for the applications behind it. So it's a really purpose-built gateway for your product.

Within a cluster, this is logically what happens when you use such a gateway. The consumer namespace on the right here is not related to the payments API product, but it does consume that API, so logically it should consume it at the gateway level. That allows you to scale microservices and add functionality behind the gateway without interrupting the service. And when we go to a multi-cluster environment, this becomes really easy: we can just replicate that payments namespace, with its gateway, into another cluster as a whole package. The consumer namespace now has multiple endpoints to reach you at; we add those hosts to the consumer namespace and now it has two options, making your payments API highly available.

But we don't have to stop there. We can do some more intelligent routing: if your consumer is in the same cluster as the gateway it's consuming, and that gateway is closer than, say, an external gateway in another cluster, let's prefer routing locally first. If you're in the same cluster, we can actually use the sidecar to act as the gateway, and that's what we're doing in this example: the sidecar acts as the gateway to those microservices and accesses them directly. But in the case of a failure or outage, we can reroute requests over to cluster 2 in the US West region. So you still get high availability, but now we've added this intelligent, locality-aware routing. This is really cost-effective for our larger clients that have high volumes of traffic and are paying a lot for egress data.
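To make that locality preference and failover concrete, a DestinationRule along these lines could express it. This is a hedged sketch, not necessarily how we implement it: the host and region names are placeholders, and note that Istio's locality load balancing only takes effect when outlier detection is configured:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: pay-locality
spec:
  host: pay.example.com        # assumed product host
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        enabled: true
        failover:
        - from: us-east        # placeholder regions: prefer local, fail over to US West
          to: us-west
    outlierDetection:          # required for locality load balancing to apply
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
```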
So why should you embrace gateways more than you do today? They give you a way to abstract your APIs and your microservices with a product focus. They're really architected for growing in a multi-cluster environment; it's more of a copy-paste than a one-off, or trying to figure out who owns the cluster's shared gateway so you can attach your resources to it. And it aligns much more with the non-mesh architectures that exist today, which are very load-balancer-centric; the gateway fits that load balancer role, we're just empowering it to be a lot more intelligent about the traffic it routes.

So, putting this all together, what we're doing at Tetrate is making that gateway discovery automated. Across any number of clusters you stand up, the gateways are automatically discoverable in all the other clusters. That means we don't need to know about all the other microservices running in other clusters; we just need to know where those ingress gateways are. We make sure that when you route to gateways outside of your own cluster, the traffic is encrypted and authorized. We're also implementing locality-based routing, to improve your cost savings and to fail over effectively. And finally, we're using a newer Istio technology, mesh DNS, which lets you eliminate the name-and-namespace routing that Vikas was talking about earlier; you can use your own abstracted DNS names for these gateways, names that better represent the product you're offering. Now I'll hand it over to Vikas to talk a little more about how we do some of this routing and gateway management.

Thanks, Nick. All right. From the service owner's point of view, exposing a service within the mesh to remote clusters that sit across the public internet raises the same security concerns as exposing it to external users: even though the request comes from within the mesh, it travels over the public internet. So in our model, the service owner simply exposes the service to the external world over a chosen gateway port, and configures whatever security measures, authentication and authorization, he or she is comfortable with to deal with the public internet's security concerns. Then, programmatically, we take the APIs that the gateway exposes to the external world and automatically expose the same APIs within the mesh, to be consumed from remote clusters. Port 15443 is a port reserved for Istio mTLS, so what we basically do is discover which APIs are exposed to the external world and expose the same APIs over Istio mTLS on this reserved port for east-west traffic. And in addition to Istio mTLS, we apply all the authentication and authorization configurations that the user has set up for external traffic to this 15443 port as well, so it is doubly secured.

In addition to that, and this is a very important slide, we create a local ServiceEntry as well. The point to focus on here is that the hostname in the ServiceEntry, for example pay.example.com here, is the same host that is exposed for consumption by the external world over port 9443. What this means is that whether the client of the service is within the same cluster, in some other remote cluster, or an external user, all of them consume this service through the same abstracted API, pay.example.com. And the remote instances and the local instance are part of the same load-balancing pool. One more implementation detail: to achieve this, the endpoint in this ServiceEntry is the local Kubernetes cluster IP. And this is how it looks if there is more than one cluster where this backend service has instances running: the ServiceEntry in the local cluster has two endpoints, one is the cluster IP of the local service instance, and the other one, shown in green here, is the gateway IP address of the remote cluster where another instance of the service is running.
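A minimal sketch of what such a two-endpoint ServiceEntry could look like, assuming the payments example from before; the addresses, port numbers, and locality labels are illustrative placeholders, not our exact implementation:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: pay-example-com
spec:
  hosts:
  - pay.example.com          # the same abstracted host for local, remote, and external clients
  location: MESH_INTERNAL
  ports:
  - name: https
    number: 9443
    protocol: HTTPS
  resolution: STATIC
  endpoints:
  - address: 10.96.12.34     # placeholder: cluster IP of the local instance
    locality: us-east        # assumed locality labels enable local-first routing
  - address: 203.0.113.10    # placeholder: the remote cluster's gateway IP
    locality: us-west
    ports:
      https: 15443           # remote traffic enters over the reserved Istio mTLS port
```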
Now let's see what happens at runtime when a request flows from the client: how load balancing happens and how the request reaches the destination service. First of all, the client makes a request to the abstracted API, pay.example.com, regardless of whether it will be served by the local instance or a remote one. The client need not bother about where the service instance is running, locally, globally, or wherever it is; the abstracted API is all it uses. The request goes to the sidecar, and here we are using a feature where the proxy hijacks DNS queries: it caches DNS and resolves the query with the virtual IP of the ServiceEntry. After the DNS query is resolved, the request leaves the client container addressed to that virtual IP, and in the sidecar the listener matching that virtual IP is picked up. The listener has endpoints for the local instance as well as the remote instances, and because of locality-aware routing, in the ideal case the local instance is picked. If there is no local instance, or the local instance is failing, the sidecar routes the request to the remote gateways. When it reaches the remote gateway, the request is already over Istio mTLS, and additionally the gateway authorizes it using the extra authentication and authorization configuration the service owner set up for the external world. That could be any external auth service authorizing the request on top of Istio mTLS. If everything passes, the request gets routed to the actual instance.

Just to summarize: we are using an abstracted API, so there is no direct dependency on the backend service. And secondly, the service owner has full control over which APIs are exposed; whatever APIs they want to be consumed within the mesh or by the external world, they expose on the gateway. It is not that the whole cluster is being watched by the remote clusters' istiods. Because of the time constraint, we could only cover this much, though we are doing much more interesting stuff: we are handling multi-tier architectures as well, and cases where the clusters are part of different VPCs that are not directly connected but talk through a shared VPC. We would really like to explain all of these things, but due to time constraints we cannot; we can take these topics offline if you are interested in knowing more. And I think that's pretty much all. Thank you. Thank you.