Hi, this is Meng Han. I'm a gRPC maintainer. In this talk, I will talk about gRPC and service mesh, and the service mesh support within gRPC. So, first of all, what is gRPC? I think many of you already know this, so I'm not going to talk too much about it. In short, gRPC is an RPC framework that is a good fit for building microservices. But in practice, for microservices, you need more than just the RPC framework. For example, before you can send RPCs, you need to find the services. While sending RPCs, you care about load balancing and security. Before service mesh support was added to gRPC, gRPC provided only limited help with those problems. There is a DNS resolver. There are two very simple load balancers, pick-first and round-robin. TLS is also supported in gRPC, but it's only configurable at startup time. For observability, there's no built-in support, but there is an OpenCensus plugin. Of course, you can write your own plugins to get the advanced features you need, but writing and maintaining plugins can be a lot of work. So, because of those problems, there's a need to integrate gRPC with service mesh.

Let's take a look at how service mesh solves those problems. A service mesh is an extra infrastructure layer added to the deployment to control how different parts of the system interact with each other. There is one service mesh control plane that is responsible for all the configuration. The service mesh proxy deployed with each application gets the configuration from the control plane, decides where traffic should go, and also handles the other features like security and observability. In a more concrete example, this is how a service mesh is typically deployed: the gRPC application is deployed with a sidecar proxy. The sidecar proxy gets the configuration from the control plane, and there is one connection between the gRPC application and the proxy. All the traffic is sent to the proxy first. The proxy then, based on the configuration it got from the control plane, decides where the traffic should go. But one thing you can see in this model is that much of the sophistication within gRPC is not being used. For example, there's only one connection between the application and the proxy, so the connection management features within gRPC are not being used.

There are some other potential problems caused by the proxy. One of them is performance. There will be one proxy on the client side and one on the server side. This adds overhead and latency, and the proxy could become the bottleneck of the system if the applications are fast enough. Another problem is that the proxies are standalone binaries that need to be deployed along with the application. This means you also need to manage the lifecycle of the proxies: you need to deploy them, upgrade them, and health-check them. A third problem is that you don't get end-to-end security, because the proxies in between need to intercept the traffic, and end-to-end security can be important in some situations. So, because of those problems, we were thinking: can we add service mesh support into gRPC itself, so we don't need those proxies? This leads to proxyless gRPC service mesh. This deployment is very similar to the proxy-based deployment. The only difference is that there is no proxy in between. The gRPC service talks to the control plane directly to get the configuration.
The gRPC services also talk to each other directly, without going through proxies. The module we added into gRPC understands the configuration from the control plane, so it handles all the features that were previously done by the proxy. There's actually one more thing to decide before we can move on. In the proxy-based model, you pick the proxy and the control plane as a pair. But this will be native service mesh support within gRPC, so we actually need to decide which service mesh to support. And as always, what matters most is not the implementation, it's the API. Here, this is the API between the application and the control plane where the configuration is exchanged. We wanted to pick an open and popular API with very strong community support, so there will be multiple implementations to choose from. This also helps prevent vendor lock-in. The API we picked is xDS. The xDS APIs were developed for Envoy, and Envoy is used as the proxy by many popular service mesh implementations. This makes xDS the de facto standard data plane API for service mesh.

This is an overview of the xDS APIs. I'm not going to get into many details of the protocol, but I want to give an overview because it helps in understanding the demo at the end. From the bottom up, the first thing is the endpoint. An endpoint is a server instance. It contains the server address and also whether the server is healthy. Above endpoint is the locality. A locality is a group of endpoints; you can think of it as a zone. A special thing a locality can have is a priority. Let's say we give this locality priority 0 and this one priority 1: the lower-priority locality is only used when the higher-priority one is not healthy. We'll have a demo to show this at the end. Above locality is the cluster. A cluster, you can think of it as a deployment. You can have deployments for different services, and you can also have deployments for different versions of the same service, which is helpful if you have two versions of the same service and you are migrating from one to the other. The cluster is also where load balancing is configured, so this blue box contains how to do load balancing between localities and even for endpoints within a locality. Above cluster is the route. This is where request routing happens: this is where you do path matching and header matching, send specific traffic to a certain cluster, or even split traffic between two clusters. This is also where timeouts, retries, and other per-route features are configured. Above route is the listener and virtual IP. This is the part that makes more sense from a proxy's point of view, because for a proxy, all the traffic comes in on a listener, from a virtual IP, and this is where the configuration for all the traffic from that listener lives. This doesn't map very well onto gRPC clients, so I'm not going to talk too much about it.

Many of the xDS features are already available inside gRPC, and to enable them you have to do three things. The first is to pull in the xDS dependency. The xDS dependency is not part of gRPC core, because we don't want to carry unnecessary code for people who don't need xDS, so you have to explicitly add the dependency. The second is to change the resolver scheme to xds. This tells the gRPC client to use the xDS resolver, and the resolver then triggers all the other components, like the xDS load balancers. The third is to provide a bootstrap file. This file has the xDS server address in it, and you can also specify other configuration, like a node ID to identify this client.
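To make those three steps concrete, here is a minimal Go client sketch, assuming grpc-go; the target name, bootstrap contents, and credentials are placeholder values for illustration, not something from the talk:

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	// Step 1: pull in the xDS dependency. This blank import registers the
	// xds resolver and the xDS load balancing policies with gRPC.
	_ "google.golang.org/grpc/xds"
)

func main() {
	// Step 3: the bootstrap file is located via the GRPC_XDS_BOOTSTRAP
	// environment variable and contains the xDS server address plus an
	// optional node ID, roughly like (placeholder values):
	//   {
	//     "xds_servers": [{"server_uri": "xds-server.example.com:443",
	//                      "channel_creds": [{"type": "insecure"}]}],
	//     "node": {"id": "demo-client"}
	//   }
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Step 2: use the xds scheme so the target is resolved through the
	// control plane instead of DNS.
	conn, err := grpc.DialContext(ctx, "xds:///wallet.example.com:443", grpc.WithInsecure())
	if err != nil {
		log.Fatalf("failed to dial: %v", err)
	}
	defer conn.Close()
	// Create stubs on conn and send RPCs exactly as on a normal channel.
}
```

Everything after channel creation is unchanged; the only application-visible differences are the dependency, the target scheme, and the bootstrap file.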
There are some limitations of proxyless gRPC service mesh. The first is that there's still a feature gap between what gRPC can do and what proxies like Envoy can do. The xDS APIs are a very rich set of services, and we are still actively working on adding the missing features. The second is that you still need to deploy a bootstrap file with the application, but compared to deploying a proxy, this should be much simpler. Next, there is a big ecosystem around Envoy, especially around observability. Fortunately, many of those integrations are also available as gRPC interceptors and stats handlers, like OpenCensus, and we are actively working on observability support in xDS. The last thing is that you need to recompile the applications, because, as I said, you need to explicitly add the xDS dependency, but this should not be a problem if you already have a CI/CD system.

Another thing I want to say is that you don't need to migrate all your applications to proxyless at once. Even within the same application, you can set different resolver schemes for different gRPC channels (there's a small sketch of this right before the demos). The channels using the xds resolver will use the xDS modules within gRPC, and those using, say, the DNS resolver will still go through the proxy if you configure it that way. So it's pretty easy to set up a mixed proxy and proxyless deployment. Also, the goal of proxyless gRPC service mesh is not to put proxies like Envoy out of business; it's more to provide an alternative that coexists with the proxies. If you have non-gRPC applications, you will for sure still use the proxies, and even for gRPC applications, in some situations it might make more sense to go through a proxy.

As of October, in gRPC release 1.33, our xDS client already supports the four main xDS services (LDS, RDS, CDS, and EDS), and we also support load reporting via LRS. This includes xDS versions v2 and v3. In terms of features, we support load balancing within and across localities, and we also recently added support for path matching, header matching, and traffic splitting. These are the things we are actively working on: timeouts, circuit breaking, fault injection, and adding xDS support on the server side. Two other big features are security and observability. You can watch the progress on GitHub, and you're also very welcome to contribute. This is a list of resources I put together; I included the gRPC design docs here in case some of you are interested in how the xDS implementation is done within gRPC.

So, that's enough talking; let's do a demo to show how this works in practice. In this demo I will use gRPC Wallet. This is an example implementation we created to show all the features. For the control plane I will use Traffic Director, which is Google's managed service mesh control plane. I'm not going to show exactly what the configuration for the control plane looks like; I'm mostly going to show you the service setup and then show you what the client can do.
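Here is the mix-and-match sketch mentioned above: a minimal Go example, assuming grpc-go, where one channel in the same process uses the xds resolver and another keeps using DNS (and can therefore still be routed through a sidecar proxy). The target names are placeholders:

```go
package main

import (
	"log"

	"google.golang.org/grpc"
	_ "google.golang.org/grpc/xds" // only needed for the channels using the xds scheme
)

func main() {
	// This channel resolves its target through the xDS control plane (proxyless).
	meshConn, err := grpc.Dial("xds:///wallet.example.com:443", grpc.WithInsecure())
	if err != nil {
		log.Fatalf("failed to dial xds target: %v", err)
	}
	defer meshConn.Close()

	// This channel keeps using plain DNS resolution; if a sidecar proxy is
	// configured to intercept its traffic, it still goes through the proxy.
	legacyConn, err := grpc.Dial("dns:///stats.example.com:443", grpc.WithInsecure())
	if err != nil {
		log.Fatalf("failed to dial dns target: %v", err)
	}
	defer legacyConn.Close()

	// Create stubs on each connection as usual; migration can then proceed
	// channel by channel.
}
```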
Some context for the gRPC Wallet application: gRPC Wallet is a wallet for a special coin called gRPC coin. We have three services here. The account service is where the user information is stored. The stats service is where the price of the gRPC coin is reported. And the wallet service is where the number of gRPC coins for each user is stored. The client can make a price call to the stats service, like the green arrows. The client sends the user information with the RPC, the stats service verifies that information with the account service, and based on the response it sends the price back to the client. The client can also talk to the wallet service to get the balance for a user. Similar to the price call, the client sends the user information, the wallet service verifies it with the account service, and then the wallet service sends an RPC to the stats service to get the price. Based on those two responses, the wallet service sends a balance response back to the client with the number of gRPC coins and their total worth.

The first demo we're going to show is traffic splitting. In this demo, the client talks to the wallet service to get the balance, and we create two deployments of the wallet service, v1 and v2. Imagine that you are migrating from v1 to v2, and before you are confident that v2 is stable enough, you want to split traffic between v1 and v2. So we can make the FetchBalance RPC send 60% of the traffic to v1 and 40% to v2, and as the v2 service gets more and more stable, you can gradually increase the traffic to v2. So this is our wallet client here, and we're going to run this command to tell the client to talk to the wallet service to get the balance. Note that we set the xds resolver scheme here to use the xDS resolver, and we also tell it to make the unary RPCs in a for loop so we can see the different responses from different backends. If we run that, we'll see those responses from v1, and there's one from v2. You'll see that the distribution is not exactly 60/40, but there is more traffic to v1 than to v2, because the algorithm used here is random-based rather than a strict round robin, so it doesn't match the configured split exactly. But there's definitely more traffic to v1 than to v2.

Okay, the next demo is header matching. In this demo, we use the client to talk to the stats service to get the price, and we also have two deployments of the stats service: the stats service for normal users and a stats service only for premium users. The premium users get price updates at a higher frequency. The client sends the user information with the RPC, which includes whether this user is a premium user or not, and based on that header we route the premium requests to the stats premium service only. So we run this command to tell the client to talk to the stats service to get the price, and again we set the xds resolver scheme. We are using Bob as the user in this command, and Bob is a normal user. If we run that, we can see the response comes back from the stats service. If we run it again, it should still come from the stats service, but from a different backend. Now if we change the user to Alice, we'll see that the response is from the stats premium cluster, and the updates are also much more frequent. So that's header matching.
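As an aside on how routing like this is driven from the client side, header matching works on metadata the client attaches to each RPC. Here is a minimal Go sketch, assuming grpc-go; the header name, its values, and the helper are hypothetical and not the actual gRPC Wallet implementation:

```go
package wallet

import (
	"context"

	"google.golang.org/grpc/metadata"
)

// withMembership returns a context carrying a hypothetical "membership"
// header. An xDS route configuration can match on such a header (for
// example, membership == "premium") and steer those requests to the
// premium cluster.
func withMembership(ctx context.Context, premium bool) context.Context {
	value := "normal"
	if premium {
		value = "premium"
	}
	return metadata.AppendToOutgoingContext(ctx, "membership", value)
}
```

An RPC made with that context over a channel using the xds scheme carries the header, and the route configuration in the control plane decides whether it goes to the normal or the premium cluster.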
The third demo we're going to show is failover. Failover is the priority feature of localities I talked about on the xDS overview slide. In this demo, we run the client in US central, and we start multiple servers, some of them in US central and some of them in US west. The logic here is that if the client and the server are in the same zone, the servers from that zone get the higher priority. So the US central servers have priority 0 and the US west servers have priority 1. What we want to see is that all the traffic goes to US central at first, so none of the US west servers should receive any traffic. Then we're going to shut down the servers in US central, and the traffic should go to US west.

So this is the setup: this is the wallet client, this is a server from US central, this is also US central, this is US west, and this is US west as well. If we run this command to tell the client to talk to the stats service to get the price, and also run the unary RPC in a for loop, we'll see that both US central servers are getting traffic, but the US west servers are getting nothing. If we kill one of the US central servers, you'll see the other US central server is still getting all the traffic, and US west is still getting nothing. If we kill this one as well, we'll see that the US west servers are getting the traffic now, and this failover happens fast because the US west endpoints were already sent to the client; they just have a lower priority. Now if we restart the US central servers, the traffic should go back to US central in a minute. This part is slower because the control plane needs to health-check those servers, and the traffic will only go back once they are healthy. You can see one of the US central servers has already recovered, and the other one is taking a little bit longer. And now the traffic has fully moved back to the US central servers, and it's doing round-robin between those two servers.

All right, with that, that's the end of this presentation. So if you have any questions, you can find me, and you can also find the gRPC team here. Let me know.