Hello, and thank you for joining me today. We're going to talk about multi-cluster service mesh. My name is Adam, and I'm a field engineer at Solo.io, a company specializing in service mesh and API gateways. Feel free to reach out to me if you have any questions or just want to have a conversation. Here's the agenda. First, as an introduction, we'll talk about the origins of the service mesh. Then we'll take a concrete example of a service mesh; in this case, Istio. After that we'll talk about multi-cluster deployments and the concerns that come with them, and then about how to solve the complexity of a multi-cluster service mesh. At the end, we'll finish with a demo. First, as a reminder, let's talk about the origins of the service mesh. To do that, we need to start with the microservice architecture. Microservices came to solve a lot of problems we used to have with the monolith architecture. In a monolith, we had one big component with all the logic in it. To change only one part of the logic, we had to wait through long release cycles. A monolith is hard to maintain and hard to scale, because you can't scale just one part of the system; it comes with a lot of drawbacks. To solve these issues, we moved to a microservice architecture, where we split that component into separate services, microservices, that all communicate over the network, and every microservice handles only one part of the logic. Take a banking application as an example: you'd have a user microservice, an account microservice, and other services like that. In this scenario, you can scale only one part of the system, for example only the component that is needed the most.
You can also change only one part of the logic. For example, if you want to replace the user microservice with a new version of it, you only have to change that microservice. So the microservice architecture brings a lot of flexibility compared to what we used to have with the monolith. But microservices also have drawbacks. Now we have many services, so if a request fails somewhere along a chain of services, how do we locate which service failed it? What about network issues, telemetry, unified logging, and so on? To solve this, a lot of companies first started creating libraries to ship with their services: every microservice embedded a library that handled these cross-cutting concerns, things like network retries and tracing. But that gets complicated to maintain, too. Imagine a polyglot environment, with multiple services in different languages: you need to maintain a different library for each language, and it's not easy to keep them unified. Solving all of this is basically the rise of the service mesh; that's what a service mesh is for. Each of your services gets a sidecar, a component attached to it, and that sidecar deals with all the cross-cutting concerns: network retries, enforcing policy, securing traffic between components, telemetry, everything we talked about that we need for a more resilient environment. In the example here, we have three services, service one, service two, and service three, talking to each other, with a sidecar next to every single service.
The sidecars handle the service-to-service communication: they enforce mTLS between components, emit telemetry, provide unified logging, and so on. Your service deals only with its own business logic; it doesn't have to worry about network failures and all those things, because all of that is handled by the sidecar. Developers can focus on developing the core components and logic of the microservice. Now, a lot of service meshes use Envoy as the sidecar, which is why we take Envoy as our example here. Envoy is a lightweight proxy that supports HTTP/2. You deploy it next to your service to handle the things we talked about: service-to-service mTLS, policy enforcement, telemetry, tracing between components; everything the microservice architecture needs to be more resilient. Now, on top of your data plane, the sidecars, you need a control plane that manages this configuration: it pushes configuration to your sidecars to enforce policies and basically control the traffic from service to service. Here we'll take Istio as our example of a service mesh: a control plane that manages your data plane and pushes policies to your sidecars, which we see in red as the Envoy proxies. Say the account microservice wants to talk to the user microservice. The call goes through the sidecar, which has configuration pushed to it by the control plane. The control plane also handles service discovery, among other things, so the sidecar knows where and how to call the other service. In this case, the connection uses mTLS.
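As a concrete sketch of the kind of configuration the control plane pushes, here is how mesh-wide strict mTLS is enabled in Istio. This resource is not from the talk; it's just an illustration of the API:

```yaml
# Istio PeerAuthentication: require mTLS for all workloads.
# Placing it in the istio-system root namespace applies it mesh-wide.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
```

With this in place, sidecars reject any plain-text traffic between services, which is the mTLS enforcement described above.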
We can also have policy enforcement, for example granting the account service access to the user service. That's the role of a control plane, and that's an example of how Istio implements the service mesh. Now, to configure your control plane, Istio in this example, you'll need multiple resources. There's the VirtualService, which is used for routing traffic; you'll have DestinationRules; and you can have a ServiceEntry. Those are Istio-specific, but every service mesh needs some sort of configuration for routing between components. Let's also talk about access control in Istio, because we'll need it later. Istio uses the SPIFFE framework, which lets you enforce policies on trusted communication between workloads. A SPIFFE ID in Istio is made up of the trust domain, then the namespace, then the service account. Going back to the Istio example: if the account service wants to talk to the user service, mTLS first secures the connection between the two. But after verifying mTLS, you also verify that the account microservice is authorized to call the user microservice. You do that by creating a security policy in Istio that configures this authorization: allowing traffic, once mTLS is established, from the account microservice to the user service.
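The authorization policy described above can be sketched like this. The namespace and service-account names (`default`, `account`, the `app: user` label) are illustrative assumptions, not from the talk; the principal is the SPIFFE identity of the calling workload:

```yaml
# Istio AuthorizationPolicy: after mTLS is established, only allow
# the "account" service's identity to call the "user" service.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-account-to-user
  namespace: default
spec:
  selector:
    matchLabels:
      app: user          # applies to the user service's workloads
  action: ALLOW
  rules:
    - from:
        - source:
            # trust-domain/ns/<namespace>/sa/<service-account>
            principals: ["cluster.local/ns/default/sa/account"]
```

Because the principal comes from the client certificate presented during the mTLS handshake, this check only makes sense once mTLS is in place, which is exactly the ordering described above.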
We've looked at a service mesh on a single cluster, where you have your control plane, Istio, configuring the data plane and the sidecars between components. It's fairly simple to reason about one cluster, but a multi-cluster installation brings much bigger difficulties: to allow traffic between services and across meshes, you need more configuration, and it reaches another level of complexity. How do we do that in Istio? There are different ways of handling a multi-cluster deployment. In one model, the control plane lives in only one cluster, but it needs access to the API server of every other cluster to get data about the services running in those clusters. There's also a model where every cluster has its own control plane, but each control plane still needs access to the API servers of the other clusters. Imagine the complexity when you have 100 clusters and 100 meshes: the number of relationships grows as N², because every single cluster needs to know about every single other cluster to establish the multi-mesh configuration. So you can see that multiple clusters bring their own problems. Take an example with Istio. Say we have two clusters, cluster one and cluster two, each running different components; I'm using Istio's Bookinfo sample. It has four microservices: a product page that talks to a details microservice and a reviews microservice, and the reviews microservice talks to a ratings microservice. Now imagine we run this across the two different clusters.
Say you have a new version of the reviews microservice and you want to try something: route some traffic from cluster one to version three of reviews in the other cluster. That's a common use case, especially if you want to do blue-green deployments and things like that. If you want to do that using Istio as your service mesh, you have to create a lot of configuration. In cluster one alone, you have to create a VirtualService that points to the reviews microservice, plus a DestinationRule and a ServiceEntry. And in cluster two you need yet more configuration: when traffic is routed to cluster two, you need to transform it into a local call to your reviews service, define your DestinationRule, and finally reach the reviews v3 pods in cluster two. This is a pretty standard scenario, but it gets complex very fast. Imagine hundreds of clusters; how do you manage all this configuration? You also need to authorize the access. This is where SPIFFE comes in, as mentioned earlier: you can notice the SPIFFE ID here, which is used to allow the traffic from cluster one to the reviews service in cluster two. So a pretty simple use case comes with a lot of configuration, and with multiple clusters come a lot of concerns and issues as you scale.
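To give a feel for how much configuration this takes, here is a hedged sketch of just the cluster-one half. The remote host name (`reviews.cluster-two.global`), the gateway address, and the weights are placeholders, and the full setup would also need a DestinationRule plus matching resources in cluster two:

```yaml
# Cluster one: split reviews traffic between the local service and
# the remote reviews service in cluster two.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
    - reviews
  http:
    - route:
        - destination:
            host: reviews                      # local service
          weight: 50
        - destination:
            host: reviews.cluster-two.global   # remote, via ServiceEntry
          weight: 50
---
# Cluster one: teach the mesh about the remote reviews service,
# reachable through cluster two's ingress gateway.
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: reviews-cluster-two
spec:
  hosts:
    - reviews.cluster-two.global
  location: MESH_INTERNAL
  ports:
    - number: 9080
      name: http
      protocol: HTTP
  resolution: DNS
  endpoints:
    - address: cluster-two-ingress.example.com  # placeholder gateway address
      ports:
        http: 15443
```

And this is only half the story: cluster two still needs its own routing rules to turn the incoming call into a local call to reviews v3, which is exactly the N-sided configuration sprawl the talk is warning about.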
So what are the issues? One: you need a way to see the configuration across all your service meshes, to know the status of every single mesh, and to have a kind of service discovery spanning all of them. Two: you need a way to unify identity. If you have mTLS inside one cluster and mTLS inside another, you need a shared identity, for example a common root CA, so a service in cluster one can establish mTLS with a service in cluster two. If every cluster has its own identity, multi-cluster communication won't work. Three: you need an easy way to deploy configuration, writing it once to a management cluster and having it propagated to every cluster. We just took a pretty standard example of canary routing between two clusters, and you saw how complex it gets, and how fast; we need a way to simplify this. You also need a way to enforce high-level policies. If different personas operate your service meshes, you want to be able to create a policy that says: this team or this user is allowed to route traffic from their mesh only to these meshes, or only to these services, regardless of how it's implemented behind the scenes. In Istio, the APIs are pretty low-level: for routing you need VirtualServices, ServiceEntries, and DestinationRules. What matters at this level is simply whether a person is allowed to set up routing or enforce a given policy, not those low-level details.
You don't want to have to drop down to that low-level configuration. You also need a way to isolate faults. To solve these issues, you need a management plane: a component that knows about all the service meshes, all the clusters, all the control planes, and is able to manage them all. In this example I'm using Gloo Mesh, which I'm most familiar with; it's a management plane that operates across multiple clusters. You register your different clusters with the management plane, you get a global view over all the services you have, and you can define policies, for example an easy way to route traffic between services. That's what I'm going to show you now. In the demo, I have a management plane with two clusters registered, so two control planes: cluster one and cluster two, both with Istio installed. If I look at cluster one, I can see its workloads; here we have reviews version one and reviews version two. If I go to the other cluster, we can see it has version one, version two, and version three. Going back to the architecture I showed you, the second cluster has all three versions, and the first cluster has only two. We want to do a canary deployment and route only part of the traffic to the second cluster. The only difference between the reviews versions, when we look at the UI, is that version one shows no stars, version two shows black stars, and version three shows red stars.
With the first configuration, we're obviously only talking to cluster one, which has versions one and two, so you can see we only get the two looks, no stars and black stars, meaning we're routing only to version one and version two. Now, let's create a simple configuration to route to version three of the reviews service in cluster two. For that, I just need to create one resource on my management plane. It says: when the reviews service is called in cluster one, route 75% of the traffic to version three in cluster two, 15% to version one in the same cluster, and 10% to version two in the same cluster. That's only one policy managing routing across multiple clusters, because we have a management plane. Now I go back to my terminal and apply the configuration; it's created. Back in my service, which was previously routing only to the two versions in the same cluster, if I refresh a few times you can see we now get red stars, and those red stars are coming from a different cluster: the traffic goes to cluster one and is then routed to cluster two, to version three of the reviews service, which shows the red stars. Compare that with what we would have needed without a management plane: doing this directly with Istio as our service mesh, you'd have to create all that configuration on cluster one and all that configuration on cluster two, just to do something as simple as one route from one cluster to another, which we did here with a single resource in our management plane.
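The single management-plane resource from the demo looks roughly like the sketch below, modeled on Gloo Mesh's TrafficPolicy API. The exact `apiVersion` and field names vary between Gloo Mesh versions, and the cluster and namespace names here are assumptions, so treat this as illustrative rather than copy-paste ready:

```yaml
# Gloo Mesh TrafficPolicy (approximate shape): one resource on the
# management plane shifts traffic for reviews in cluster one across
# both clusters: 75% to v3 in cluster two, 15% v1 / 10% v2 locally.
apiVersion: networking.mesh.gloo.solo.io/v1
kind: TrafficPolicy
metadata:
  name: reviews-canary
  namespace: gloo-mesh
spec:
  destinationSelector:
    - kubeServiceRefs:
        services:
          - name: reviews
            namespace: default
            clusterName: cluster-one
  policy:
    trafficShift:
      destinations:
        - kubeService:
            name: reviews
            namespace: default
            clusterName: cluster-two
            subset:
              version: v3
          weight: 75
        - kubeService:
            name: reviews
            namespace: default
            clusterName: cluster-one
            subset:
              version: v1
          weight: 15
        - kubeService:
            name: reviews
            namespace: default
            clusterName: cluster-one
            subset:
              version: v2
          weight: 10
```

The point is the shape, not the exact fields: one declarative resource replaces the pile of per-cluster VirtualServices, DestinationRules, and ServiceEntries, because the management plane translates it into the low-level Istio configuration on each registered cluster.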
So you can see that a management plane solves a lot of the issues that come with the complexity of a multi-cluster service mesh. I took Gloo Mesh as my example, but with any management plane you can create one configuration in one place and have it replicated to all the registered clusters, get a global view over all your clusters, and easily create routing policies such as routing and failover between clusters. You can also enforce high-level policies, for example allowing a user to create traffic policies, or restricting a user to managing only a subset of the clusters. You need the high-level view that comes with a management plane. So, again, a management plane is really important when you're scaling and running a multi-cluster service mesh. With that, thank you for listening. I'll be happy to keep the conversation going on social media; feel free to reach out. Thank you.