Cool. So, hello, I'm Aditya. I work at Tetrate, and I'm also a senior in high school. Hey, I'm John Howard. I'm a software engineer at Google. I've been working on the Istio control plane, which is an Envoy control plane, for about three years now.

Okay. So in this talk we'll cover, basically, Delta XDS. First, what Istio is. We're also going to cover what we've done to our control plane and why that wasn't sufficient for scaling it up. We're going to talk about the solution we think is right, Delta XDS, and how it helps with scaling. We're going to talk about how we implemented it, the specific details, and the problems and solutions we ran into implementing Delta XDS. And we're also going to talk about the future of Delta XDS within Istio. So that's a lot to cover.

First of all, we have to understand what Istio is. I'm sure most of you know, but it's a service mesh that uses Envoy as its backbone: Envoy runs as a sidecar proxy and also as a gateway. Configuration is distributed to the proxies by the control plane over the network. There's a component in the Istio control plane called Pilot which does all the configuration management, so that's how configuration gets sent to all the proxies, and we'll talk about that as well.

The other big thing is: what is XDS? The "X" is a wildcard, so XDS is the family of discovery services (CDS, EDS, LDS, RDS, and so on). It's a way to propagate configuration to Envoys dynamically, where the alternative would be doing it statically, hot-restarting Envoy with a configuration file. It's implemented either over the file system, where you set up a file system subscription, or over the network, which is the usual case, with REST or gRPC; typically it's a gRPC bi-directional stream. So that's what XDS is.
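To make that concrete, here's a minimal sketch of what an XDS client looks like on the wire, using the go-control-plane v3 types over the bi-directional gRPC stream just described. The address, port, and node ID are placeholders, not anything from the talk:

```go
package main

import (
	"context"
	"log"

	corev3 "github.com/envoyproxy/go-control-plane/envoy/config/core/v3"
	discoveryv3 "github.com/envoyproxy/go-control-plane/envoy/service/discovery/v3"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// Dial the control plane. The address is a placeholder; 15010 is
	// istiod's plaintext XDS port, but any XDS server works the same way.
	conn, err := grpc.Dial("istiod.istio-system.svc:15010",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// XDS runs over a single bi-directional gRPC stream (ADS here).
	stream, err := discoveryv3.NewAggregatedDiscoveryServiceClient(conn).
		StreamAggregatedResources(context.Background())
	if err != nil {
		log.Fatal(err)
	}

	// A wildcard subscription: leaving ResourceNames empty on the first
	// request means "send me every cluster I'm allowed to see".
	if err := stream.Send(&discoveryv3.DiscoveryRequest{
		Node:    &corev3.Node{Id: "example-node-id"},
		TypeUrl: "type.googleapis.com/envoy.config.cluster.v3.Cluster",
	}); err != nil {
		log.Fatal(err)
	}

	// In state-of-the-world mode, every response carries the full set.
	resp, err := stream.Recv()
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("received %d clusters in one response", len(resp.Resources))
}
```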
Over the years, as Istio has become more and more popular, we've seen users deploying it in larger and larger environments, and we've had a cat-and-mouse game of catching up as people scale up. In the early days of Istio, if you deployed even a moderately sized mesh, with say a hundred workloads, the control plane would often use multiple CPUs just to service that. Meanwhile, our users are trying to deploy in environments with hundreds of thousands of pods in the larger cases. And because we support multi-cluster, we're often not even bound by the size limits imposed by Kubernetes, so there's really no bound on the scale our customers may require.

So over the years we've invested a ton of effort into optimizing the control plane, in a variety of ways. A lot of the work has been micro-optimizations of our code: reducing allocations where we can, splitting up loops where we're able. One of the big changes was to how we encode our protobufs, because all of this configuration is sent over gRPC, so a lot of time is spent marshaling and unmarshaling those protobuf objects. We've also tried to make the control plane smarter. When a configuration update came in, say a Kubernetes service changed, we would typically just recompute all the XDS configuration and send it to all the proxies.

A lot of the time that's pretty wasteful: many updates to an object don't actually change the resulting configuration. So we've added logic to skip pushes when they're not necessary, which has made a huge improvement. We've changed the actual configuration we send to make it smaller, not for behavioral changes in Envoy, but just to improve performance. We've started caching pre-marshaled XDS responses, so instead of the full generate, marshal, and send cycle we can skip straight to the send. And we've started doing more segmentation of configuration, so proxies don't see all the configuration, only a smaller subset, which is of course more efficient to generate and send.

Despite all that, we still haven't really met our performance goals. On larger clusters with, say, a thousand pods, Istio usually has no problem, but as we get into the mega-clusters with hundreds of thousands of endpoints, we still have issues with the scalability of the control plane. Fundamentally, it comes down to the math of the scaling. If we look at what drives load on the control plane, there are three main factors. The first is the number of workloads: for every workload connected to the mesh, we need to generate configuration and keep it up to date. The next is the resource size, which is itself driven by the number of workloads, because the configuration contains information about how to connect to every other workload. At minimum that's the list of all the other IP addresses, but it's actually quite a bit more than just IP addresses, so it adds up. The third factor is the rate of updates. This is also typically tied to the number of workloads, just by the law of large numbers: the more workloads you have, the more likely it is that one of them is scaling up, scaling down, or being restarted for an update at any moment.

Putting that together, we have a polynomial scaling problem, which is obviously not good. To put it in perspective: on a cluster with a thousand pods, if we had a one-megabyte resource size, which is not unheard of in our control plane implementation, any time a pod changes we would need to generate and send a gigabyte of data over the wire. And that's not just a gigabyte of opaque bytes, either: we have to generate those protobufs and encode them, so a lot of CPU goes into that, and then we have to garbage collect that gigabyte of data too. In the case of a rolling update, say rolling out a new version to all thousand pods, it can add up to a terabyte of data sent over the wire throughout the rollout. That is quite a bit of data; we're not a CDN designed to serve that kind of volume, and it imposes pretty serious issues. So what we realized was that no matter how fast we make the control plane, even if configuration generation were instantaneous, with scaling like this it doesn't really matter. We can't keep sending a terabyte of data over the wire on every rolling update, or we're just not going to be able to scale to where we need to be.
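To spell that math out in one place (a rough model of the numbers above, not an exact formula):

```latex
% Rough scaling model from the talk:
%   bytes pushed per change = (proxies) x (per-proxy config size)
%                           = 1000 x 1 MB = 1 GB
% Config size S and update rate R both grow with the workload count N, so:
\text{load} \;\propto\; \underbrace{N}_{\text{proxies}}
  \times \underbrace{S(N)}_{\text{config size}}
  \times \underbrace{R(N)}_{\text{update rate}}
  \;=\; O(N^{3})
% A rolling update of all 1000 pods is roughly
%   1000 changes x 1 GB per change \approx 1 TB over the wire.
```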
Yeah. So I'll talk about the flavor of XDS we run on today. Configuration is distributed through XDS, and the way we do it right now is state of the world. Essentially, this means that for every proxy that cares, we have to send every resource it needs, every single time. That can be quite a bit of data, as John mentioned. In this example, you have three clusters and only cluster two changes, but you still have to send all three. And this is a trivial example; usually there are a lot more clusters. Both the control plane and Envoy have to process these configs, even when only a few of them are changing, and that can cause combinatorial explosions during updates, triggering gigabytes of data even when only one service changed. So that's state of the world, and that's where we are right now with Istio.

What we're working on is Delta XDS, which is the solution we believe in. It's also called incremental XDS, and it allows us to send Envoy only the resources that changed. In this example, if only cluster two of the three clusters changes, you only have to send cluster two. This seems like the intuitive way to do it: if only that cluster changes, you should only have to send that one. We implemented state of the world first, and a lot of control planes do, because (a) it's easy and (b) it's reliable. But now we're moving to Delta XDS. Again, this dramatically decreases the configuration we have to send over the wire, and we'll go over some benchmarks as well.

I just want to go over the key differences in the request and response flow between the control plane and Envoy. With state of the world, you can specify a list of resource names that you want, but Envoy will usually just say "I want all the clusters", which means a subscription to every cluster it cares about, and the response contains all of those clusters. As John said, that's a lot of data. With Delta, the request is basically the same, it just subscribes to all the clusters, but the response only contains what changed. So in the resources field, where previously you'd have all the resources, now you only have the ones that were added or changed; and if any resources were removed, you send just the names of those resources, and Envoy removes them internally. So that's how the request and response work for the two methods; see the sketch below.

Enabling Delta XDS in Envoy is really simple: it's just an API configuration change from GRPC to DELTA_GRPC. You put that in your Envoy configuration, start Envoy, and it starts speaking the Delta API. go-control-plane, the control plane library that I think one of the earlier speakers talked about, supports Delta XDS out of the box, and we'll talk about how that works. So if you're using go-control-plane, you can just enable Delta XDS and you're good; no additional configuration needed.
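Here's a minimal sketch of both halves of that, using go-control-plane v3 types: the shape of a delta response when only one of three clusters changed and another was deleted, and the one-field bootstrap change that switches Envoy to the delta transport. The cluster names and the "xds-grpc" cluster are placeholders:

```go
package main

import (
	"fmt"

	clusterv3 "github.com/envoyproxy/go-control-plane/envoy/config/cluster/v3"
	corev3 "github.com/envoyproxy/go-control-plane/envoy/config/core/v3"
	discoveryv3 "github.com/envoyproxy/go-control-plane/envoy/service/discovery/v3"
	"google.golang.org/protobuf/types/known/anypb"
)

// deltaResponse: only "cluster-2" changed and "cluster-3" went away, so the
// delta response carries one resource plus a removal by name, instead of the
// full cluster set a state-of-the-world DiscoveryResponse would resend.
func deltaResponse(updated *clusterv3.Cluster) (*discoveryv3.DeltaDiscoveryResponse, error) {
	body, err := anypb.New(updated)
	if err != nil {
		return nil, err
	}
	return &discoveryv3.DeltaDiscoveryResponse{
		TypeUrl: "type.googleapis.com/envoy.config.cluster.v3.Cluster",
		Resources: []*discoveryv3.Resource{{
			Name:     updated.Name,
			Version:  "2", // the delta protocol adds per-resource versions
			Resource: body,
		}},
		RemovedResources: []string{"cluster-3"}, // Envoy drops these internally
	}, nil
}

// deltaADSConfig shows the bootstrap-side switch: the same api_config_source,
// with GRPC swapped for DELTA_GRPC.
func deltaADSConfig() *corev3.ApiConfigSource {
	return &corev3.ApiConfigSource{
		ApiType:             corev3.ApiConfigSource_DELTA_GRPC, // was ApiConfigSource_GRPC
		TransportApiVersion: corev3.ApiVersion_V3,
		GrpcServices: []*corev3.GrpcService{{
			TargetSpecifier: &corev3.GrpcService_EnvoyGrpc_{
				EnvoyGrpc: &corev3.GrpcService_EnvoyGrpc{ClusterName: "xds-grpc"},
			},
		}},
	}
}

func main() {
	resp, err := deltaResponse(&clusterv3.Cluster{Name: "cluster-2"})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(resp.Resources), "updated,", len(resp.RemovedResources), "removed")
	fmt.Println("api_type:", deltaADSConfig().ApiType)
}
```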
So now let's go over the go-control-plane implementation, which is important to understand as a contrast with Istio's implementation. The way it works with go-control-plane is that users of the library provide a snapshot to go-control-plane, and that snapshot is served to the proxies. The snapshot consists of all the resources the proxies need to care about. The key point is that the computation of the snapshot and the serving of the snapshot are completely decoupled; one doesn't trigger the other. Snapshot generation happens, and then whenever the proxies need the data, it's served to them. This makes it really easy to do either state of the world or Delta XDS: if you want state of the world, you just send the snapshot; if you want Delta XDS, you store the old snapshot, and when you get a new snapshot you diff the two, and the result is exactly the deltas the proxies care about. So that's the way go-control-plane does it.
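Here's a toy sketch of that snapshot-diffing idea. The Snapshot type is a stand-in, not go-control-plane's actual API (real snapshots hold full protobuf resources); it just shows how keeping the previous snapshot makes deltas nearly free:

```go
package main

import "fmt"

// A toy stand-in for a snapshot: resource name -> version.
type Snapshot map[string]string

// diff derives a delta push from two snapshots: resources that are new or
// whose version changed, plus names that disappeared and must be removed.
func diff(prev, curr Snapshot) (updated, removed []string) {
	for name, version := range curr {
		if prev[name] != version {
			updated = append(updated, name)
		}
	}
	for name := range prev {
		if _, ok := curr[name]; !ok {
			removed = append(removed, name)
		}
	}
	return updated, removed
}

func main() {
	prev := Snapshot{"cluster-1": "1", "cluster-2": "1", "cluster-3": "1"}
	curr := Snapshot{"cluster-1": "1", "cluster-2": "2"} // cluster-2 changed, cluster-3 deleted
	updated, removed := diff(prev, curr)
	fmt.Println("send:", updated, "remove:", removed)
	// Output: send: [cluster-2] remove: [cluster-3]
}
```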
Now, coming to Istio: we don't store any snapshots in memory, at all, ever, because there would be too many. If we did, we'd have to store a different snapshot for every single proxy, whereas with go-control-plane it's one snapshot for all the proxies. That would mean gigabytes of in-memory data, so we can't really do it. Instead, we do configuration generation on the fly: whenever we get a request from a proxy saying "I want this resource type", we generate it on demand and send it back, on a per-proxy basis. The resources in the response differ for every proxy, because each proxy's visibility is different. And because we can't store snapshots of these resources in memory, we have to find a different way of doing Delta XDS, which we'll go over. The key point with Istio is that config generation and serving are completely coupled: when a request comes in, config is generated and then served; it isn't stored anywhere.

Okay. So our initial implementation of the Delta API was literally taking the state-of-the-world generation and sticking it onto the Delta API. If you used Istio 1.10 and enabled Delta XDS, sorry, you didn't get any performance improvements, because we didn't send deltas; it was just state-of-the-world generation over the Delta API. This was an effort to get the core infrastructure in place for Delta XDS without rewriting the entire configuration generation pipeline. So here you can see that even if only one of the clusters changes, all three clusters are still sent over the Delta API in that initial prototype. Now John will talk about the problems and solutions.

Yeah. So once we implemented the initial Delta XDS, we were all very happy, patted ourselves on the back, and then quickly realized there were a lot of issues with it. The implementation passed all of our tests, which are reasonably robust, so we had a lot of confidence in it. But as we started actually optimizing Delta XDS, rather than just doing state of the world over the Delta API, we found all sorts of issues, some of them very obvious, and we were very confused, because everything had seemed to work. What we realized is that there are a lot of ways a control plane can be totally broken and still appear to work correctly with Envoy. For example, we found a bug where, in every single EDS response, we were telling Envoy to remove every endpoint. Obviously that sounds horrible, but it turns out Envoy ignores removed resources for endpoints, because they have a parent resource, the cluster. Another example: we occasionally failed to remove a resource at all. That's generally fine, extra clusters don't really bother us, but it leaks memory and wastes a ton of resources in both Envoy and the control plane.

At that point we had obviously fixed these specific bugs, but we were worried about the general problem: how do we ensure our Delta XDS implementation is as robust as the old one? What we came up with was a dry-run mode. For each request, we run both the old state-of-the-world code, which we trust to be fairly robust, and the new Delta XDS code, and we compare the two to check whether the Delta XDS response was accurate. To do that, we started storing the previous state; this happens in debug mode only, since normally it would be far too expensive. With the previous state-of-the-world response and the new state, we can compute the optimal Delta XDS response and then see how our actual Delta XDS response measured up. This detects bugs such as failing to delete a resource that we should have, deleting a resource that we shouldn't have, or failing to send an update that was required. It also helps us optimize, because we can see if we sent an update over Delta XDS that wasn't actually required because the configuration didn't change at all.
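Roughly, the dry-run check looks like this. This is a toy model, not Istio's actual code: given the previous and current state-of-the-world sets plus the delta response we actually produced, it flags missed updates, missed or spurious removals, and redundant sends:

```go
package main

import "fmt"

// Set models a state-of-the-world response: resource name -> content hash.
type Set map[string]string

// Delta models what the delta code actually emitted.
type Delta struct {
	Sent    map[string]bool // resources included in the response
	Removed map[string]bool // names listed in removed_resources
}

// check compares the actual delta against the optimal one derived from the
// two state-of-the-world sets, reporting the bug classes described above.
func check(prev, curr Set, actual Delta) []string {
	var problems []string
	for name, hash := range curr {
		changed := prev[name] != hash
		if changed && !actual.Sent[name] {
			problems = append(problems, "missed update: "+name)
		}
		if !changed && actual.Sent[name] {
			problems = append(problems, "redundant send: "+name)
		}
		if actual.Removed[name] {
			problems = append(problems, "removed live resource: "+name)
		}
	}
	for name := range prev {
		if _, ok := curr[name]; !ok && !actual.Removed[name] {
			problems = append(problems, "missed removal: "+name)
		}
	}
	return problems
}

func main() {
	prev := Set{"cluster-1": "a", "cluster-2": "a"}
	curr := Set{"cluster-1": "a"} // cluster-2 was deleted
	// A buggy delta response that forgot the removal:
	buggy := Delta{Sent: map[string]bool{}, Removed: map[string]bool{}}
	fmt.Println(check(prev, curr, buggy))
	// Output: [missed removal: cluster-2]
}
```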
Another issue we saw was with big resources. We had naively hoped that Delta XDS would be a silver bullet, which wasn't quite the case out of the box. With Delta, you get the granularity of a single resource, and if your resources are giant, it doesn't help much that you can send them one at a time. For example, at an ingress gateway it's very common to have many, many different virtual hosts on the same port, and if we still have to send the entire set of virtual hosts every time one changes, we haven't gained much from Delta XDS. So one of the solutions being actively pursued in Envoy, and it's still ongoing, is decomposing the resource types. So far three types have undergone this. One is the virtual host discovery service (VHDS), which splits up routes by virtual host; that's the example I just gave. Another is locality endpoint discovery (LEDS), which splits up the endpoints so you can send them in smaller chunks; this is really helpful when a service has a ton of endpoints that you don't want to resend on every update. There's also the extension config discovery service (ECDS), for sending individual filters. What this allows us to do is break our components up more and more, so we can send smaller increments rather than the full thing.

Finally, we have an issue where a giant configuration makes startup slow. This is problematic especially in cases like serverless, where startup directly impacts user-facing latency. With Delta XDS, when we start up we still have to send all the configuration; it's only on updates that things get quite a bit better. So one of the solutions being pursued on the Envoy side is a concept called on-demand XDS. Rather than sending Envoy the full configuration, we send a very small amount of configuration that tells it to lazily, just in time, load the configuration it needs. For example, when a client requests example.com, if Envoy doesn't know about that virtual host, it can ask the control plane for that route, the control plane sends it back, and then Envoy can serve the request. That gives us rapid startup speed while still getting the benefit of the vast configuration we want Envoy to have.

Okay. Yeah. So now I'm going to talk about the implementation within Istio. The way we use Envoy, the Envoy configuration depends on a bunch of Kubernetes objects. Based on which Kubernetes objects change, we can determine which Envoy resources change, and then determine what should be pushed to each proxy. For instance, if just a service changes, with a certain hostname, we have to find all the clusters with that hostname, and once we've found those clusters, we push them to the proxies that care. This is essentially equivalent to sending the deltas, like the diffing go-control-plane does, because we're driven by the updates themselves. So in the example, service A correlates to a few clusters with the same hostname; as long as they match, we generate and push just those clusters, as opposed to generating every cluster that exists. So it's much, much better.

Now I'll go over the push flow. So far we've only implemented Delta XDS for the cluster discovery service, CDS, because that's where we saw the biggest impact for the least implementation work. Here's the push flow for CDS. We use the Kubernetes informer API to find out when these objects change. When a certain service changes, we get notified through the informer API, and that drives our control plane to generate configuration data. That configuration data is just the deltas, because the update tells us which service changed: whatever clusters are associated with that service are the only clusters that changed. So we gather those clusters, knowing they're the deltas, and push those selected clusters to the selected Envoys that care about them. The same works for deletions: if a service is deleted, we compute which clusters were associated with that service and tell Envoy, hey, delete these clusters. Here's a sketch of that flow.
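A toy model of the flow follows. The helper names are illustrative, not Istio's internals, though the "outbound|port||hostname" cluster-name format is the one Istio really uses:

```go
package main

import (
	"fmt"
	"strings"
)

// affectedClusters finds the clusters derived from one service hostname.
// Istio cluster names embed the hostname as the last "|"-separated field.
func affectedClusters(hostname string, allClusters []string) []string {
	var out []string
	for _, c := range allClusters {
		if strings.HasSuffix(c, "|"+hostname) {
			out = append(out, c)
		}
	}
	return out
}

// onServiceEvent is what an informer handler would drive: compute only the
// clusters derived from the changed service, then push just those updates
// (or removals) to each connected proxy that watches CDS.
func onServiceEvent(hostname string, deleted bool, allClusters, proxies []string) {
	changed := affectedClusters(hostname, allClusters)
	for _, p := range proxies {
		if deleted {
			fmt.Printf("push to %s: remove %v\n", p, changed)
		} else {
			fmt.Printf("push to %s: update %v\n", p, changed)
		}
	}
}

func main() {
	clusters := []string{
		"outbound|8080||echo.default.svc.cluster.local",
		"outbound|9090||other.default.svc.cluster.local",
	}
	onServiceEvent("echo.default.svc.cluster.local", false, clusters, []string{"sidecar-1"})
	// Output: push to sidecar-1: update [outbound|8080||echo.default.svc.cluster.local]
}
```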
So yeah, that's the push flow for the cluster discovery service. It's pretty simple so far, but we're working on destination rules, and as we add more of these object types it will get more complicated.

Now the fun part: performance results. John created a benchmark where we scaled a mesh from zero to a thousand pods and then updated a service every second, changing something arbitrary. The result is a graph of network throughput over time, where the area under the curve over any interval is the data sent during that interval. With state of the world, the first curve, we see an almost constant, huge wall of configuration being sent. That's a lot of configuration; the area under that curve is a non-trivial amount, and it stays relatively constant over the whole run. Then, later in the run, we see the delta benchmark, where the area under the curve, again the amount of data sent, is significantly smaller. There is an initial bump, which John talked about: the slow bootstrap. We have to send that because Envoy needs to learn about itself and everything else. The solution to that is on-demand XDS; with it, there'd be no bump, the curve would just be flat. But after that initial slow bootstrap, the curve trails off to almost nothing, and that "almost nothing", that sliver of data, is the actual amount of configuration that changed and needs to be sent to the Envoys. The difference between delta and state of the world is the amount of redundant data crossing your network from the control plane to Envoy. So the benchmark shows delta is significantly better performance-wise, and once this is implemented mesh-wide for a bunch of resource types, there should be a noticeable performance improvement in your mesh. So: use Delta XDS. Yeah, thank you. Any questions?

Okay, yeah, sure. You talked about enabling this through go-control-plane, but could you explain exactly how to enable it in Istio? You said it's not enabled in 1.10; should we expect it to be fully enabled in 1.11?

Definitely not. It's nowhere near where it needs to be. The way you enable it is just by setting an environment variable, I think it's ISTIO_DELTA_XDS=true, on the agent and the control plane, and then it'll work, but it's nowhere near stable. Don't get excited about it being on by default. The inevitability with Delta XDS is that you need to get it exactly right, because if you don't, you'll send incorrect updates, and then your mesh is out of sync, and that's bad. So we have to get it right before it can be on by default.
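For reference, enabling it would look something like the line below. The variable name is the one recalled above, and the istioctl values path is an assumption about the chart layout, so double-check it against your Istio version:

```shell
# Hypothetical: turn on the experimental delta transport on istiod.
istioctl install --set values.pilot.env.ISTIO_DELTA_XDS=true
```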
Any other questions? Yeah.

In the protocol itself, I don't think there's any sort of reconciliation. Oh, sorry, I'm supposed to repeat the question. The question was: is there any sort of reconciliation loop between Envoy and the control plane, to make sure they don't fall out of sync at some point if we miss an update? So, like I was saying, there's no specific reconcile aspect of the protocol itself, but the control plane does have the ability to push all of the configuration at once. So you could reasonably say, every hour we're going to compute all the configuration, like the old state-of-the-world code does, and send it over to Envoy, just in case we made some mistakes in the past. We're not currently doing that, but it's definitely possible. I do think it's a good fallback plan, but if you're in a state where you need it, you may have some serious issues, because if you miss a security update or some other critical update, running the loop an hour later may be too late.

Do you have performance numbers for CPU and memory usage, not just bandwidth? We don't have a graph to show you today, but we have done some analysis, and it is quite a bit better. Generally, the Istio control plane spends 90 to 95 percent of its time on this code, generating the XDS configuration and especially marshaling it, so as we do far less of that, we expect usage to go down substantially. One thing we're also pretty interested in is the CPU and memory usage on the Envoy side for processing XDS, which is often overlooked because it's relatively small for a single Envoy; but if you have 100,000 Envoys, even a small increase makes a big difference. We haven't measured that yet, but we're excited to see the improvements there as we progress.

Okay. Thank you.