Hey, welcome to our talk today. We'll be talking about decentralized routing for a sharded application on service mesh. I'm Vinay Gauranguntala, and this is Pankaj Sikha. We are software engineers at Intuit, and we work on the service mesh platform there. Let me go over the agenda for today. We'll talk a little bit about what a service mesh is and how it is set up at Intuit, a little bit about the sharded application and how routing is done on a sharded application, our design, and some challenges we faced. We'll go through a demo, as well as some future investments we have planned for the solution. So a little bit about Intuit. Intuit is a leading fintech company; I would consider us more of a leading-edge platform company. We are really honored to have received the CNCF End User Award in 2019 as well as 2022. If you look at the CNCF portfolio, Intuit uses most of the products there. We are also a big open-source contributor: we contribute to more than 75 projects, and some of the projects we maintain and have open-sourced are Argo, Keiko, Admiral, and Numaproj. Let me go into a little bit of the scale at which Intuit operates. Intuit has more than 245 Kubernetes clusters running in production, spanning about 16,000 namespaces. At peak, we recorded more than 77,000 nodes running in production with more than 7 million pods. And we have 2,000-plus unique services running in these clusters, a mix of both end-user services and internal services. Next, I want to walk through service mesh. How many of you here know what a service mesh is? I think most of you do, so I can go a little fast through this section. A service mesh is an infrastructure layer that facilitates service-to-service communication. The most popular approach right now is to use a sidecar to intercept ingress and egress traffic for the application.
Adding a sidecar, or using a service mesh, allows us to do a few cool things: security, by providing mutual TLS; observability, by taking a snapshot of the entire system and finding the bottlenecks; routing, by providing things like canary deployments or traffic splitting; and reliability, by automating retries. Some of the most popular service mesh offerings out there are Istio, Linkerd, and Consul. At Intuit, we chose Istio as our service mesh platform. So a little bit about Istio. Since most of you are familiar, I'll go through this pretty quickly. Istio is logically divided into two parts, the control plane and the data plane. The data plane is where the service or application lies, and this is where the communication happens. To configure and maintain these proxies, we have the control plane. Here I'm just mentioning two important components in Istio: one is Pilot and the other is Galley. Pilot is responsible for configuring the proxies, and Galley is mainly for user configuration management; it listens for your configuration and sends the instructions to Pilot. At Intuit, we have a multi-cluster Istio setup. Here I'm representing three clusters, with the Istio control plane installed individually in each of them. Now think of a communication, say, from service C to service B, which is within the cluster; Istio is responsible for that. But if you think of communication, say, from service A to service C, or service C to service D, that requires some sort of multi-cluster setup. We use Admiral, an open-source control plane component, which manages the Istio resources across these clusters and allows communication between the different services. So now that we have an overview of service mesh at Intuit, I'll hand it off to Pankaj, who will talk a little bit about our use case.
Thanks, Vinay, for the introduction. Before I dive into the details, a little bit about the way traffic is handled at Intuit. All north-south traffic at Intuit goes through an Intuit-managed API gateway, and internally we are going through the journey of moving all internal application traffic onto the service mesh. While we were on this journey, we came across a use case where an internal service wanted to talk to a sharded application, one distributed across multiple shards. What we had to solve for was which particular shard a request needs to go to, based on the request context. Before we dive into the details, I'll touch upon what a sharded application is and why it is required. Think of a normal web application fronted by an API gateway, with a bunch of users making requests to it. As the number of users grows, the web application needs some kind of scaling. One approach is sharding, where the web application splits its data into multiple shards. The API gateway could make its routing decision to these shards by maintaining a static list, but as the number of users grows even further, a static list is no longer viable and a better solution is needed. What we do at Intuit is have an external lookup service that contains the routing logic and uses the same backend as these web applications. It knows where the data is sharded, and it has an algorithm to determine the shard a request needs to go to. The gateway makes a call to this lookup service and then knows which shard to route the request to. To make these lookups more efficient, we also use a distributed cache.
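The lookup-service-plus-cache flow described above can be sketched in Go. This is only an illustration of the idea, not Intuit's implementation; the type and field names are invented, and a plain map stands in for both the shared backend and the distributed cache.

```go
package main

import "fmt"

// LookupService sketches the external lookup service: it shares the same
// shard mapping as the web application, and a cache in front of it avoids
// repeated lookups for the same company ID.
type LookupService struct {
	companyToShard map[int]string // stands in for the shared backend
	cache          map[int]string // stands in for the distributed cache
}

func NewLookupService(mapping map[int]string) *LookupService {
	return &LookupService{companyToShard: mapping, cache: map[int]string{}}
}

// Resolve returns the shard for a company ID, consulting the cache first
// and falling back to the backend mapping on a miss.
func (l *LookupService) Resolve(companyID int) (string, bool) {
	if s, ok := l.cache[companyID]; ok {
		return s, true
	}
	s, ok := l.companyToShard[companyID]
	if ok {
		l.cache[companyID] = s
	}
	return s, ok
}

func main() {
	ls := NewLookupService(map[int]string{9001: "shard1", 8002: "shard2"})
	s, _ := ls.Resolve(9001)
	fmt.Println(s) // shard1; a second call is served from the cache
}
```

The gateway (and later the sidecar filter) only ever calls something like `Resolve`; the sharding algorithm itself stays inside the lookup service.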
One example of an application that does this at Intuit is QuickBooks Online, an accounting software; more than 150,000 companies use it, and there are more than five million customers across the globe. So let's look at what this routing looks like when we have to do it on the mesh. In this diagram, what you see on the left is service A, which has a sidecar proxy. It's trying to communicate with a sharded application distributed among three different shards. Here, the routing decision can be made by the Envoy proxy. One approach is to let the request go through the API gateway as well, but that introduces an additional hop in the service-to-service communication. Usually the services that talk to each other a lot are co-located within the same cluster, probably also within the same VPC, and that call adds network latency by going out of the VPC and back in. We can avoid that by using the service mesh. So the approach we took was to implement the routing logic within the proxy itself. But before we go there, I want to go through the goals we had for this approach. As the shards are maintained by the services themselves, we did not want the client to be aware of them; the decisions need to be transparent to the client services. We did not want any changes on the client side, or if there were any, we wanted them to be very minimal. Also, I spoke about the QBO use case, which shards its data based on company IDs; we had to handle that, and other applications with millions of users may shard their data based on user location or multiple other criteria, so our solution needed to handle all those use cases as well. And, going back to the QBO example, when the data in a particular shard grows beyond a certain threshold, there could be a requirement to divide that data further into multiple shards, and this needs to be supported in near real time as well.
We don't want to cause any disruption to the ongoing traffic, and the routing decisions still need to be made correctly. As I mentioned earlier, service owners control the routing configuration. We wanted this to be transparent to the client side, with no changes there; we wanted to give this control to the service owners. So we went out and looked at the existing solutions the service mesh had to offer. Since we were using Istio, I'll look at Istio examples here. Istio provides a custom resource called VirtualService, which defines a set of traffic rules that apply when a given host is addressed. In this example, whenever a request is made to demo.greeting.mesh, these traffic rules apply, and each traffic rule defines a protocol and matching criteria. Here, requests for company IDs one and two are routed to shard one, whereas requests for company ID three are routed to shard two. This is a very basic example of a VirtualService. But the VirtualService did not work for us due to some limitations relative to our use cases. There were maintainability problems. What I mean by that is, if we have to incorporate routing logic in the VirtualService, there has to be some kind of coupling between the sharding logic and the VirtualService, which needs to be created in multiple clusters wherever the client services exist. That is a lot of overhead, and the size of the VirtualService object itself can grow a lot, which is again a management problem. As I mentioned earlier, we need near-real-time updates. Imagine a new company getting added: we need that sharding info reflected in the VirtualService. This is not practical with the VirtualService approach; we would have to update the VirtualService every time a new company gets added, across all the shards and across all the client clusters.
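The maintainability problem with static VirtualService rules can be sketched as a plain routing table in Go. This is not Istio's match implementation, just an illustration of the semantics: every company ID must be enumerated up front, so each new company means editing the table and re-applying it in every client cluster. The hostnames and IDs are invented to match the slide's example.

```go
package main

import "fmt"

// route mirrors one VirtualService-style match rule: a fixed set of
// company IDs pinned to one shard host.
type route struct {
	companyIDs []string
	shardHost  string
}

// staticRoutes is the static mapping the talk describes: IDs 1 and 2 go
// to shard one, ID 3 goes to shard two.
var staticRoutes = []route{
	{companyIDs: []string{"1", "2"}, shardHost: "shard1.greeting.mesh"},
	{companyIDs: []string{"3"}, shardHost: "shard2.greeting.mesh"},
}

// match walks the rules in order, like header-match evaluation.
func match(companyID string) (string, bool) {
	for _, r := range staticRoutes {
		for _, id := range r.companyIDs {
			if id == companyID {
				return r.shardHost, true
			}
		}
	}
	return "", false
}

func main() {
	host, _ := match("2")
	fmt.Println(host) // shard1.greeting.mesh
	_, ok := match("4") // a newly added company has no route until the object is updated
	fmt.Println(ok)
}
```

The miss for company "4" is exactly the near-real-time update problem: nothing routes until the static object is regenerated everywhere.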
And if the data can move among the shards as well, the VirtualService, as you are aware, is a static mapping, so we cannot make that decision in the VirtualService. Due to these limitations, we started designing our own solution, and I'll walk through the approach we took. In this diagram we see a request originating on the client, which is making a request to a sharded application divided into three different shards. When an HTTP client makes a request to the sharded application, the Envoy proxy intercepts the request as always. The Envoy proxy has a filter chain, and Istio provides a custom resource called EnvoyFilter that gives us the capability to extend the Envoy proxy. We have built our logic into an HTTP Envoy filter that applies to all sidecar outbound requests. What this Envoy filter does is, if you remember the API gateway use case, call the same lookup service to ask where a particular request needs to be routed, based on the request context. The lookup service responds with the DNS name of the shard the request needs to go to, and within the Envoy filter we are able to route the request to that shard. Now, the creation of this EnvoyFilter we have automated through Admiral, an open-source tool under the Istio ecosystem that Intuit maintains. Admiral defines custom resources that allow it to create EnvoyFilters and other Istio-specific resources in the client clusters. So with this solution, this is how dynamic routing at Intuit looks. Typically the source service and the destination service live in different clusters. Admiral sits on top of all the clusters; it watches all of them for the custom resources it defines and creates Istio resources in all the clusters where it needs to, to make the lives of mesh operators and developers easier.
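The per-request decision the Envoy filter makes can be sketched outside the proxy. The real implementation runs as a Wasm filter inside Envoy and issues an HTTP call to the lookup service; in this hedged Go sketch, a local function stands in for that call, and every name (`Request`, `onRequestHeaders`, the hostnames) is invented for illustration.

```go
package main

import "fmt"

// Request is a minimal stand-in for the outbound HTTP request the
// sidecar intercepts.
type Request struct {
	Authority string            // the host the client asked for
	Headers   map[string]string // request context used for shard lookup
}

// lookup stands in for the HTTP call the filter makes to the lookup
// service; it returns the DNS name of the shard for this request context.
func lookup(headers map[string]string) string {
	shards := map[string]string{
		"9001": "shard1.demo.mesh",
		"8002": "shard2.demo.mesh",
	}
	return shards[headers["company-id"]]
}

// onRequestHeaders mirrors the filter's decision point: consult the
// lookup service, then rewrite the authority so Envoy routes to the shard.
func onRequestHeaders(req *Request) {
	if shard := lookup(req.Headers); shard != "" {
		req.Authority = shard
	}
}

func main() {
	req := &Request{
		Authority: "demo.greeting.mesh",
		Headers:   map[string]string{"company-id": "9001"},
	}
	onRequestHeaders(req)
	fmt.Println(req.Authority) // shard1.demo.mesh
}
```

The key point the sketch shows is that the client still addresses the generic host; the shard choice happens entirely inside the sidecar, which keeps the routing transparent to client services.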
For this dynamic routing use case, we introduced a new custom resource called RoutingPolicy. The RoutingPolicy defines just the config that needs to go into the EnvoyFilter, so the EnvoyFilter can be created with that config. Admiral watches for it, and it has a mapping of all the clusters where a particular service's dependencies exist, so it is able to create the EnvoyFilter in the source clusters as well. This is how the RoutingPolicy gets translated to the EnvoyFilter: on the right is the manifest of a RoutingPolicy, and on the left is how Admiral translates it to an EnvoyFilter. If you observe, the config section is pretty much the same; it's just rearranged to fit the EnvoyFilter spec. The EnvoyFilter we create is also an HTTP filter that matches all requests in the sidecar outbound context. Now that we understand the solution, there were some challenges associated with it that I'll walk through. We use workload selectors, which are provided by the EnvoyFilter custom resource, to match the client workloads a particular filter needs to apply to. Because the workload selector can only match one workload as of now, what that meant for us is that if a given service has 10 different clients in a cluster, we have to create 10 different EnvoyFilters; an OR operation is not supported, so we were not able to use that. We also use TinyGo and Wasm for creating our Envoy filters. Due to the limitations of TinyGo, which does not expose the full language feature set, we could not use an internal cache within the Envoy filter, so we had to create an external caching service to make the calls to the lookup service more efficient. Also, the logic where the Envoy filter makes a call to the lookup service, and then retries based on whether the data is present in a shard or not, had to be pre-built into the Envoy proxy.
Because of that, anytime we need to update that logic, we have to rebuild the Envoy proxy and then make sure the client workloads are rotated. We had to do it this way because the version of Istio we were using at the time did not support the WasmPlugin resource; with newer versions, we can use WasmPlugin, which can load all this business logic dynamically. So that's it for the challenges. I'll hand it over to Vinay now, and he'll show us a live demo. Thanks, Pankaj. Before I go through the demo, I just want to explain a little bit about the demo setup. We have a sharded application with three shards, with company data in the 9,000 range in shard one, the 8,000 range in shard two, and the 10,000 range in shard three. And we have a client, which is requesting data from this sharded application. As you saw before, we have a lookup service and a cache in between, which give us the information about the shard. So let me switch the screen here. This is a Kiali dashboard; Kiali is an observability dashboard that shows exactly what traffic is flowing within the service mesh and within our cluster. Here you don't see any connections because there is no live traffic yet, but you can see all the components I described: the three shards, the client, the router and cache, and the lookup service. Let's go to the configuration first. I hope it's visible to everyone. This is the routing policy we are using, in the same spec that Pankaj explained. We have a cache section here and a host section here. The host section is the URL we are going to use from the client, which gets rewritten to one of the shard URLs. On the left, I'm doing a watch; let me rerun the watch once more. I'm watching the EnvoyFilters on the client cluster here.
This is going to update automatically, because once I create the routing policy, Admiral adds an EnvoyFilter onto the client cluster. So let me create that. And here you see a new EnvoyFilter got created. We can take a look at that EnvoyFilter to see how it looks. If you see here, pretty much the same configuration is copied over from the routing policy to the EnvoyFilter. One thing you can notice is the workload selector: it applies only to the client here and not to any other workloads. This is determined by Admiral by maintaining a map. Next, I'm tailing logs from the different shards; this is shard one, shard two, and shard three. And I'm also tailing logs from the lookup service. Now that we know the EnvoyFilter got applied to the client, I can run some requests from the client. Let me copy over the request. I'm just doing a /company request, using the same URL we configured before. First, I will search for something in the 9,000 range, and we expect the request to go to shard one and the lookup service. Here you see a log showing up in the lookup service: first it looks up where the data for the 9,000 range is, and then it routes to shard one. Rerunning the request multiple times, you see it just goes to shard one, because we cache the data inside our service. Similarly, if I try a request in the 8,000 range, we expect it to go to shard two, and you see a log in the lookup service as well as shard two; repeating the request, you just see it going to shard two. Similarly, I can also do a request on shard three, which is a 10,000-range company, and you can see that showing up there. Now we can see this a little better in the Kiali dashboard. Here you can see the client talking to shard one, shard two, and shard three, and you see some unknown connections happening to the router and lookup. Right now Kiali does not have the ability to show Envoy filter calls.
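The demo's shard layout, as described, maps company ID ranges to shards: the 9,000 range to shard one, the 8,000 range to shard two, and the 10,000 range to shard three. A tiny Go sketch of that decision (only an illustration of the demo data, not the real lookup service's logic):

```go
package main

import "fmt"

// shardFor returns the demo shard for a company ID based on its
// thousands range, mirroring the demo setup described in the talk.
func shardFor(companyID int) string {
	switch companyID / 1000 {
	case 9:
		return "shard1"
	case 8:
		return "shard2"
	case 10:
		return "shard3"
	}
	return "unknown"
}

func main() {
	fmt.Println(shardFor(9123))  // shard1
	fmt.Println(shardFor(8321))  // shard2
	fmt.Println(shardFor(10001)) // shard3
}
```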
So it shows up as unknown, but basically the filter is making a call to the router and lookup, getting the shard DNS, and rewriting the request for the client, and the client communicates with the multiple shards located here. Now let's go back to the slides. I just wanted to talk a little bit about the future work we have planned for this. We want to add client-side rate limiting using a similar approach: creating an Envoy filter on the client side and rate limiting there, which services can control. We also want to work with the Istio community and see if we can enhance the workload selector. As I showed before, we have to create one workload selector per client; if we could find a way to have one workload selector per cluster and apply it to all the clients, we would want to explore that approach. Also, we use TinyGo as our Wasm language, and we did that because of our familiarity with Go, but TinyGo has a limited feature set, and we want to look at maybe C++ and Rust to enhance the entire solution to better cater to our use case. That's it for the presentation. If you have any feedback, please take a snapshot of this and let us know. You can also talk to us on the Admiral channel in the Istio Slack, and we are open to any questions you have. We have one here. Yeah, just one, give me one second. So in the introduction, you mentioned the sharded application for monolithic applications. Can you elaborate more on how you shard your monolithic application into the pods? Can you repeat that once more? You mentioned this is a sharded application, right, for a monolithic application. Can you explain how you shard that? We shard based on company IDs right now. Pardon me? Company IDs. I mean, the sharding logic is a little more complex; it's based on business logic, but we shard based on company IDs.
For us, QBO is the application, which is basically a sharded application, and we shard based on company IDs, and that could in turn be based on region or locality and things like that. So it's a complex logic; it's not just based on IDs of a certain range. For our solution, we don't really need to think about what sharding logic the applications are using. As mesh operators and mesh platform owners, what we do is provide them the capability to have that sharding logic reflected in the lookup service. From the mesh standpoint, we just make a call to the lookup service and it returns the shard. The sharding logic itself can depend on the use case. For example, QBO, which shards its data based on company IDs, can do so using the company information and identify where a request goes depending on the request context. Other teams at Intuit that shard their data based on customer location, for instance, can get the request context based on the origin and return the shard information, and all of this is managed within the lookup service. But do you do anything to rearchitect your applications to make them sharded? No, they were built that way. If you're asking whether we are moving a regular application to sharded, that wasn't the intention; they are built that way as sharded applications for scaling. Thank you. I think you mentioned drawbacks of the virtual service and its limitations, specifically about being able to build routes dynamically and the virtual service possibly growing. Did you consider creating or adding these through controllers?
Yeah, so I just wanted to clarify, it's not a drawback of the virtual service itself; it's just that we could not use it for the kind of use cases we were trying to solve for. So to your question, do we use controllers to update the virtual services: as I mentioned, we already have Admiral, which is like a management plane for the Istio control plane. We already had that in place, and that was the easiest route to take, and it could potentially update the virtual services, but then Admiral would have to know the sharding logic and update the virtual services accordingly. We wanted to avoid that complexity. Thank you. Did you notice any latency issues or changes in memory or CPU usage in the Envoy sidecar? Good question. First, on latency: initially we were doing a lookup for every request, and that definitely added a couple of milliseconds of latency. That's when we introduced our cache. We still have sub-second latency, but it was acceptable for our solution. And for CPU usage, yes, we do incur a little more whenever we add a new Envoy filter, but it was negligible compared to any other solution we were using. Okay, first of all, great presentation, wonderful engineering. My question is, on which cluster did you apply the routing policy? So the routing policy, as I said, the service owner controls how the sharding happens, and the routing policy is applied on the destination cluster where the service is running. The service owner defines the routing policy, as they also dictate the sharding logic. That's the config that gets translated to the Envoy filter. And do they apply it on one sharded cluster and Admiral takes care of the rest, or do they apply it on every cluster? Yeah, Admiral knows where a particular service's dependencies are. There's another custom resource that Admiral defines, called Dependency.
That is how, when a service gets created, there's also a Dependency custom resource they define, which tells us what clients it needs to apply to. Each workload running in our multi-cluster setup has an identity associated with it, and it's unique per service. So Admiral knows which clusters the clients of that application are running in, and that's how it's able to create the EnvoyFilter in those clusters. It can do so in all of them, to answer your question. Thank you. So in the filter that you built on the client, you need to extract the ID to route the request to the shard. How are you doing that extraction, and doesn't that tightly couple your client filter? We don't. The entire logic for that is in the lookup service; we just forward the path, the entire request path, to the lookup service. Path and headers. Yeah, path and headers, and the lookup service gives the shard back. That's our implementation. But, as you said, you could regex it, break it up, and send it. Our solution just relies on the lookup service to take care of that information. So our solution gives you the capability to define a routing policy, which creates an HTTP sidecar outbound filter on the clients of that service. What you do within that filter is dependent on your use case. Okay, thank you. You mentioned why the virtual routes would not work for your use case; are there any other methods you tried or explored that didn't work for any particular reason? We tried one more, but it wasn't really for this use case: the external-auth-based Envoy filter, where you can make an external call to an authentication service. But that doesn't give you metadata back; it just gives you a 200 and forwards the request. So that didn't work in our favor.
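The speakers note that instead of forwarding the whole path and headers to the lookup service, a client filter could extract the ID itself with a regex, at the cost of coupling the filter to the URL shape. A brief Go sketch of that alternative, assuming a hypothetical `/company/{id}` path layout:

```go
package main

import (
	"fmt"
	"regexp"
)

// companyPath assumes a hypothetical /company/{id}/... URL shape; this
// is exactly the coupling the talk's design avoids by delegating to the
// lookup service instead.
var companyPath = regexp.MustCompile(`^/company/(\d+)`)

// extractCompanyID pulls the company ID out of the request path, or
// reports false if the path doesn't match the expected shape.
func extractCompanyID(path string) (string, bool) {
	m := companyPath.FindStringSubmatch(path)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	id, ok := extractCompanyID("/company/9001/invoices")
	fmt.Println(id, ok) // 9001 true
}
```

If the application ever changed its URL layout, every client filter would need updating, which is why the talk's solution keeps extraction inside the service-owned lookup service.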
Also, the other solutions did not give us the flexibility to adapt our solution for multiple use cases. As I mentioned, QBO shards its data based on company, and multiple other applications at Intuit with millions of users shard their data based on user data. We wanted to support those, and the routing policy allows us to give this flexibility to the service owners, to define the config they want to use and also the lookup service that needs to be used for it. If you want to try this out, we have pushed this to the Admiral repo, which is part of the Istio ecosystem. This feature is live in Admiral, so you can play with it as well. All right, there's a question. Yeah, so I have one question: in the source cluster, you have created an EnvoyFilter, right? There is a vm_config section, and you have added a Wasm binary. How are you storing that binary in the container? We had some issues with storing the binary in the container, because we build the binary and we created a config map, and config maps support a maximum of about 1 MB; if it's more than that, it cannot even be stored. So what we did basically is we used an nginx proxy, put all the binaries there, and we fetch that Wasm binary from nginx in the container. So how are you using it if the binary is more than that? A very good question. I'm not sure if this is what you wanted to hear, but for this solution we prebuilt the Envoy proxy: we just copy the Wasm binary into the Envoy proxy image and reference it in our EnvoyFilter. The reason for this is that we used Istio 1.10 and below when we designed the solution, and that was the approach we took. But WasmPlugin would give you that opportunity, right?
Yes, we tried it, but at that time there were authentication issues. I'm not sure how it is now; we tried it around one year back. Yeah, we did some POCs with WasmPlugin. It worked for us; we pull it from something like Artifactory or Docker Hub, and it works just fine. And did you mention there's a limit of 1 MB for that plugin to be loaded? No, I'm saying we tried with config maps. Oh, the config map is exceeding the limit that etcd allows. Our Wasm binary was more than 1 MB, so that was not fine. Got it. Got it. And if you're writing a Wasm plugin of more than that, I'm just saying it's pretty big. I do believe that's all the time we have for questions today. Thank you. Thank you. Thank you.