OK, that's much better. So I'm Harvey Tuch, and I'm going to be providing you a tour through the next-generation Envoy service discovery APIs. This is collaborative work between Google, my employer, and Lyft, home of the famous Matt Klein. Hopefully I'll be able to present the work of these folks in a good light as I guide you through these APIs. A good starting point for the Envoy APIs is: what does the Envoy configuration look like today? Cherie talked a little bit about this. There are really three main sources of configuration for an Envoy instance. There's the static JSON file, which is read at startup and specified via the command line. There's a hierarchical runtime feature-flags tree on the file system, which is monitored via inotify for dynamic updates; this provides feature flags for Envoy, as well as the ability to override some parts of the static configuration. And in addition, there's dynamic discovery of resources via the xDS APIs, things like LDS, RDS, CDS, and so on, which comes via REST polling from one or more management servers. To give you a feel, I think Cherie did a great job of describing what goes into configuration, so we can just look at a concrete example. This is probably the smallest example of what an Envoy config looks like: a simple JSON file which tells Envoy to forward all requests on port 10,000 to google.com. The config consists of a number of resources. There are listeners, which describe which IPs and ports Envoy listens on; not in this example, but in general, this would also include TLS certificate information. There are route configurations, which provide mappings from domains and paths to services, or clusters in Envoy parlance.
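To make that concrete, here's the shape of such a minimal config sketched as a Python dict. The field names follow my recollection of the V1 JSON schema and are illustrative rather than authoritative:

```python
# A minimal V1-style Envoy config: one listener on port 10000, one route
# matching every path, one cluster pointing at google.com. Field names
# are sketched from memory of the V1 schema, not an exact reproduction.
minimal_config = {
    "listeners": [{
        # The IP/port Envoy accepts connections on.
        "address": "tcp://0.0.0.0:10000",
        "filters": [{
            "name": "http_connection_manager",
            "config": {
                "route_config": {
                    "virtual_hosts": [{
                        "name": "service",
                        "domains": ["*"],
                        # Every path prefix maps to the single cluster.
                        "routes": [{"prefix": "/",
                                    "cluster": "service_google"}],
                    }]
                },
            },
        }],
    }],
    "cluster_manager": {
        "clusters": [{
            "name": "service_google",
            "type": "logical_dns",
            # The actual endpoint traffic is forwarded to.
            "hosts": [{"url": "tcp://google.com:443"}],
        }]
    },
}
```

Note how the route refers to the cluster by name; it's exactly this kind of cross-reference that the discovery APIs later have to keep consistent.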
And then from these clusters, there's a mapping to the actual endpoints which we want to forward traffic to. Many of these resources are discoverable via the RESTful APIs which exist today, and there are five of these: the four key discovery APIs and the rate limit service. I include the latter because it's also part of the V2 work that we're doing. But I'm not going to dwell too long on this slide; I think we've already had a really good introduction to what each of these kinds of xDS services does. So, jumping into the V2 APIs: we had a number of design goals when we went into this exercise. This was an opportunity to add things to the Envoy APIs to address new use cases and some limitations in the existing APIs. One of these, and Google in particular is interested in this, is supporting a variety of deployments for Envoy. Today, in Istio, Envoy is used as a sidecar. It can also be used as a middle proxy, and there are some things that need to be done to make it a better proxy to use on the edge; we tried to address some of these in V2. We were also interested in looking at, and I'll touch more on this later, different consistency properties for API delivery, and the efficiency of API delivery. The polling REST methodology has additional latency due to the polling delay, and it puts unnecessary load on the management server. We were interested in providing a migration path forward from the V1 APIs. But the key principle we're working with is that once the V2 APIs are in place, the V1 APIs will be essentially frozen in time. We're not going to do any further work on enhancing the V1 APIs; once we've switched V2 into production, that will be where new features get added.
In addition to basic resource discovery, which is what the V1 APIs are largely about, we're also interested in adding ways in which Envoy can communicate back to the management server, to allow it to make more intelligent decisions when it comes to things like endpoint discovery. These include providing more useful information, such as statistics and resource utilization, for load balancing and health checks, and providing the ability to do scalable health checking; this problem was also touched on just before. Finally, the V2 APIs are a great way for us to go in and clean up the technical debt that existed in the previous APIs. So why gRPC? Well, gRPC actually maps really nicely to many of the things we want to do in our APIs. All of the core APIs which exist in V1 today, and still exist in V2, are essentially subscriptions. You declare to the management server: I'm interested in this set of route configurations, this set of clusters, all the listeners I'm supposed to consume, and please push me updates when they arrive. Doing this via REST has additional overheads in terms of load and latency; you have to jitter the requests to ensure you don't storm the management servers in a synchronized way. You avoid all of this by switching to a streaming API. In addition, the current poll-based REST API is essentially one-way communication: the management server is saying to Envoy, here's an update, do with it what you will. In order to be able to do things like have Envoy say, I'm interested in the resources for cluster X, please send them to me, and then I will acknowledge whether or not I actually accept them, perhaps due to, say, a configuration mismatch between the management server and the Envoy instance, you need this kind of two-way communication between Envoy and the management server.
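That subscription-plus-acknowledgement flow can be sketched with illustrative message shapes. These dataclasses mirror the spirit of the V2 discovery exchange, not the exact proto definitions:

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative message shapes for the two-way subscription; the real
# protocol uses protobuf messages over a gRPC stream.

@dataclass
class DiscoveryRequest:
    type_url: str              # which resource type: clusters, routes, ...
    resource_names: List[str]  # the subscription set Envoy cares about
    version_info: str = ""     # last version Envoy applied (the ACK/NACK)

@dataclass
class DiscoveryResponse:
    type_url: str
    version_info: str          # the version this update represents
    resources: List[dict] = field(default_factory=list)

# Envoy opens the stream by declaring its subscription...
subscribe = DiscoveryRequest("clusters", ["cluster_x"])
# ...the management server pushes an update on the same stream...
update = DiscoveryResponse("clusters", "1", [{"name": "cluster_x"}])
# ...and Envoy's next request doubles as the acknowledgement.
ack = DiscoveryRequest("clusters", ["cluster_x"],
                       version_info=update.version_info)
```

The key point is that the request and the acknowledgement are the same message type flowing on one stream, which is what makes this so awkward to express over polled REST.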
And to communicate things like load balancing and health check information back from Envoy to the management server, we also need this two-way stream, and that's very hard to map onto REST API semantics. Things like XMPP have done this, but it's very tricky to get right, and gRPC is a natural fit here. The V2 APIs largely follow the trend in the V1 APIs towards making things more dynamic. In fact, most of the resources that you're interested in discovering today can, with the recent introduction of LDS, be discovered dynamically. You don't need to write a large static JSON config file; it can be quite minimal. But there's still significant overlap between things that can be specified in the static JSON config, which is read at initialization time, and what's delivered by the APIs. In the V2 API, we're planning on making all of this essentially dynamic. We'll have a very minimal static bootstrap configuration, and even if you want to configure Envoy via the file system, this will be consumed from a separate set of files which are monitored via inotify, in a similar manner to the runtime feature flags; these can be dynamically written to and updated. In addition, we'll have, as I mentioned before, streaming APIs via gRPC and proto to management servers. And for a subset of the APIs, we'll also offer a V2 REST API, which will also work with JSON, although a different fragment of JSON than the one used in V1: it will essentially be the protos in JSON form. So the configuration story looks kind of like it did with V1, with a few differences. We have gRPC between the management server and Envoy, and we also have inotify watches of these configuration files, with the ability to specify them via JSON or proto. There are a bunch of neat features we're also trying to add in V2. We would like to have versioning of all of these API updates.
And that plays a role in how updates are provided by the management server. Envoy will announce the version that it's currently running to the management server; the management server will deliver what it believes is the next version that belongs to that Envoy instance; and Envoy is then able to ACK or NACK by reflecting which version was applied in its next message. We've added the ability to attach metadata to various objects in the configuration, which can then be combined as a request match occurs, as you pass through listener selection, then route matching, and so on, to provide rich information in log messages, statistics, and the like. We have canaries as first-class concepts for configurations, not just endpoints, so Envoy can actually treat a configuration that's being canaried separately, in terms of its behavior and what it's putting out in stats. And proto3 has become the single source of truth for the API. Previously, we had a JSON schema and documentation written in natural language. We're now moving towards a world in which everything is written in proto3, and that will describe the gRPC APIs. It will also provide JSON equivalents of all of these objects, because proto3 has a canonical JSON representation. And we also plan on generating documentation from these protos. These are all available on GitHub, which I linked to in my first slide. So in terms of how the APIs have changed, the most interesting ones to pay attention to are LDS and EDS. For LDS, and this speaks to our interest in supporting flexible and diverse configurations, we've made the TLS context significantly more powerful. You can specify multiple certificates per listener: for each port that you bind to, you can provide a bunch of certificates and use SNI or certificate type to select between them.
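That version handshake can be sketched in a few lines. The dict shapes here are made up for illustration; only the ACK/NACK behavior matters: the server proposes the next version, and Envoy either reflects the new version (ACK) or its last good one (NACK) in its following message.

```python
# Sketch of the versioned ACK/NACK flow: the return value is the version
# Envoy reflects back to the management server in its next request.

def apply_update(current_version, response, apply_fn):
    try:
        apply_fn(response["resources"])
        return response["version_info"]  # ACK: new version applied
    except ValueError:
        return current_version           # NACK: stick with the old version

def applies_cleanly(resources):
    pass  # the update is accepted

def rejected(resources):
    # e.g. a configuration mismatch between server and Envoy instance
    raise ValueError("config mismatch")

v = apply_update("", {"version_info": "7", "resources": []}, applies_cleanly)
v = apply_update(v, {"version_info": "8", "resources": []}, rejected)
# After the failed update, Envoy still reports version "7".
```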
We've added a bunch of TLS features to improve security and make listeners more useful. We've added the ability to listen on a large swath of IP space: the destination IPs that you bind to and listen on can now be described using a very flexible scheme based on CIDR prefixes and suffixes. And each listener can have distinct filter chains associated with it, selectable using these IP ranges, things like source port, as well as SNI. RDS hasn't changed a ton; it's similar to what it was in V1, with the ability to do some additional hash-based affinity using cookies, and a few cleanups. EDS is probably where things are most interesting. EDS is actually a rename of SDS, the service discovery service; this largely reflects the ambiguity and confusion that name presented. With EDS it's pretty clear what's going on: we're learning about the endpoints for a given cluster. The basic EDS API looks very similar to SDS in V1 of Envoy. But when you get into the more advanced features that can be opted into, it allows, for example, and I think there was a question about this earlier, the ability to load balance not just equally across all endpoints in a particular cluster, but based on some weighting, or taking into account their locality, for example using their zone or region, or even a sub-zone specifier, to influence how traffic is weighted. There's actually hierarchical weighting when load balancing takes place. We also support something similar to label-based routing as a first-class concept: label-based routing from Istio is a first-class concept in V2. We're able to specify labels on endpoints delivered by EDS, and have the route configuration specify which endpoints you're actually interested in load balancing between.
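The hierarchical weighting idea can be sketched as a two-stage weighted pick: first choose a locality (zone/region/sub-zone) by its weight, then an endpoint within it by the endpoint's own weight. The field names here are illustrative, not the EDS proto:

```python
import random

# Two-stage weighted endpoint selection, sketching the hierarchical
# weighting described above. Not Envoy's implementation.

def pick_endpoint(localities, rng=random):
    # Stage 1: weighted choice across localities.
    locality = rng.choices(
        localities,
        weights=[l["load_balancing_weight"] for l in localities],
    )[0]
    # Stage 2: weighted choice across endpoints inside that locality.
    endpoints = locality["endpoints"]
    return rng.choices(
        endpoints, weights=[e["weight"] for e in endpoints])[0]

localities = [
    {"zone": "us-east-1a", "load_balancing_weight": 90,
     "endpoints": [{"address": "10.0.0.1", "weight": 1},
                   {"address": "10.0.0.2", "weight": 3}]},
    {"zone": "us-east-1b", "load_balancing_weight": 10,
     "endpoints": [{"address": "10.1.0.1", "weight": 1}]},
]
```

With these weights, roughly 90% of traffic lands in us-east-1a, and within that zone the heavier endpoint gets about three quarters of it.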
For example, you could say that 10% of your endpoints are version B while the rest are version A, and specify A and B in the weighted clusters provided in RDS, in the route configuration. Finally, we support stats reporting, which I'll talk a little bit about in the advanced API section. CDS and the rate limit service barely changed; they basically just changed from JSON to proto. So that's the end of the core APIs. These are all available, as I mentioned before, via three mechanisms: file system, REST, and gRPC. The advanced APIs are gRPC-only, because they require two-way communication between Envoy and the management servers. The first of these three is EDS multidimensional load balancing. The idea here is that the management server might want to use not just, for example, the QPS, which you can grab from the stats today, to make a load balancing decision when assigning or weighting endpoints. It might actually want to use things like the CPU and memory utilization of the endpoints themselves, as well as stats that Envoy itself is able to report. With stats reporting for multidimensional load balancing, Envoy, and this is an opt-in thing, listens for HTTP response headers provided by endpoints which include some of these metrics. It's able to aggregate them for all endpoints and periodically report back to the management server, both the stats it can gather without the involvement of the endpoints and the endpoint stats. The management server is then able to make an intelligent load balancing decision using this information and supply it via EDS. So that's pretty much what it looks like. OK, the health discovery service.
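A rough sketch of that opt-in reporting loop, with a hypothetical header name (the real header and report formats are defined by the API, not by this example): endpoints attach utilization metrics to response headers, Envoy aggregates them per cluster, and the aggregate is what goes back to the management server each reporting interval.

```python
# Toy per-cluster aggregator for endpoint-supplied load metrics.
# "x-endpoint-load" is a made-up header name for illustration.

class LoadReporter:
    def __init__(self):
        self.samples = {}  # cluster -> list of (cpu, mem) samples

    def on_response(self, cluster, headers):
        # e.g. "x-endpoint-load: cpu=0.80;mem=0.05"
        raw = headers.get("x-endpoint-load")
        if raw is None:
            return
        fields = dict(kv.split("=") for kv in raw.split(";"))
        self.samples.setdefault(cluster, []).append(
            (float(fields["cpu"]), float(fields["mem"])))

    def report(self):
        # The aggregate sent upstream each interval; then reset.
        out = {}
        for cluster, s in self.samples.items():
            n = len(s)
            out[cluster] = {"rq_count": n,
                            "cpu": sum(c for c, _ in s) / n,
                            "mem": sum(m for _, m in s) / n}
        self.samples.clear()
        return out
```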
So as you scale up the number of Envoy instances and endpoints, you have today essentially an n-squared problem in terms of the number of health checks taking place, because each Envoy is performing these health checks locally, and this presents a scalability problem. HDS is an API by which the management server can communicate with Envoy instances and assign them specific endpoints to health check, have the Envoys report back the health status, and then share this information with other Envoys, largely through EDS, via the actual endpoint assignment: only healthy endpoints are assigned. The last of these advanced APIs is ADS, the aggregated discovery service. A couple of times today, this idea has come up that Envoy is eventually consistent. What that means is, for example, if we receive an RDS update mentioning a transition to some cluster for a particular route, and that cluster information hasn't yet been delivered by CDS or EDS, we'll have a period, pretty small today, in which we're essentially going to drop traffic on the floor. By careful orchestration of these updates, it's possible for the management server to avoid this. And this becomes much easier if all of the APIs are being delivered from a single management server, i.e., we have hard stickiness to a single management server, and they're delivered on a single gRPC stream. That's essentially what ADS is about: muxing the various core APIs across a single bidirectional gRPC stream, allowing a management server to carefully sequence API updates and get something a little stronger than the eventual consistency we live with today. This is important in scenarios where it's not acceptable to drop traffic on the floor. So this is pretty much what Envoy looks like today. You may actually be speaking with one or more management servers.
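To illustrate the ordering problem ADS lets a management server solve, here's a toy sequencer (not Envoy or management-server code): it holds back a route update until the cluster it references has been delivered, which is the kind of make-before-break ordering that becomes possible once everything flows on one stream.

```python
# Toy sequencing of xDS updates on a single stream. Each update is a
# ("cds", cluster_name) or ("rds", referenced_cluster_name) tuple; an
# RDS update is deferred until its cluster is known, since sending it
# early would route traffic to a cluster Envoy doesn't have yet.

def safe_update_order(updates, known_clusters):
    ordered, pending = [], []
    for kind, name in updates:
        if kind == "cds":
            known_clusters.add(name)
            ordered.append((kind, name))
            # Flush any route updates that were waiting on this cluster.
            for r in [p for p in pending if p[1] in known_clusters]:
                pending.remove(r)
                ordered.append(r)
        elif kind == "rds" and name not in known_clusters:
            pending.append((kind, name))  # would drop traffic if sent now
        else:
            ordered.append((kind, name))
    return ordered
```

With separate streams per API there's no single place to impose this ordering, which is why ADS multiplexes them.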
In the Istio configuration we've seen before, this has been a single Pilot, a single management server, but in general this is something we're going to see in arbitrary Envoy deployments. With ADS, we have the ability to multiplex each of these individual APIs on that one stream. So in terms of status: we're actually at feature completeness now. We've basically locked down the major points in the API, these are up on GitHub, and we're working on implementing this in Envoy. This isn't a complete freeze of the API; there are likely to be small changes as we discover bugs during implementation, and things that are necessary for V1 backwards compatibility. V1 itself still continues to evolve, since V2 is not in production yet, so we have to chase a moving target there. We're working on adding support for the base infrastructure to actually consume these APIs, and most of that's there. We have the EDS subset implemented, with CDS in flight for V2, and Matt recently implemented LDS in V1, which is actually a necessary precursor to LDS V2. This is what the roadmap looks like, and there's quite a bit involved here; it will probably play out over at least a year. We're initially aiming to reach parity with V1 with the basic APIs, add the ability to generate documentation, achieving something similar to what we do with the JSON schema today, and add in the bootstrap configuration. At that point, we should be able to run Envoy with the V2 APIs with the same set of functionality as V1. Following from this, because we have an interest in this at Google, we'll probably add ADS as the next step, and then fill in some of the additional features in LDS, RDS, and EDS, which are really the defining features of V2. And then way down the track, we think HDS is probably where we want to go.
I mean, obviously, if someone's interested in HDS before then and is willing to go in and implement it, we would love that. So with that said, any questions? Yeah. No, at Google we're building cloud products based on this, so we're not using Envoy in a sidecar scenario; it's closer to, let's say, the middle proxy example that we've provided. Oh, I mean, for most developers, absolutely, you need an additional level of abstraction there. Things like sequencing API updates and so on are really of interest to folks who are developing tools like Istio, or, for example, using Envoy as an edge proxy in a cloud offering or something like that. Yeah? Right. So today, Envoy produces a number of different logs and stats. It provides HTTP access logs. It also provides debug logs, telling you what's going on when warnings are triggered and things like that in Envoy itself, which have nothing necessarily to do with the request but rather with the actual behavior of the Envoy process. It also exports statistics via statsd, which is pretty flexible. In V2, we're actually planning on adding support for pluggable stats and logging providers. The idea is that you may want to send logs not, let's say, as plain text to a text file, but instead as protos, or be able to send out stats via proto and gRPC to some arbitrary server, instead of having to rely on statsd. In addition, and this is the metadata point I touched on before, by annotating things like your routes and your listeners with arbitrary labels, you get a lot of insight into exactly what happened during the request match.
So if you're producing, for example, an HTTP access log, and you're able to include a label which was assigned during listener match and route match, maybe even one that was used during endpoint subsetting, that provides additional insight. Let's just say that stuff's all in flight right now; I probably can't talk too much about it. I'm not sure what we're saying publicly about that, but we talk to each other, yes. I would say we should have parity with V1 within a couple of months. So maybe we'll hit steps three or four within a couple of months; hitting six is probably quite a bit further out, and seven is off on the horizon. We don't have anyone actively looking at seven today. Right, so the idea is we'll continue to allow V1 to be consumed by Envoy. We don't actually have a translation tool; what we have is code in Envoy which takes JSON objects and turns them into protos suitable to pass into the code path where V2 is consumed. The idea is that once we switch V2 into production, we feature-freeze V1, and folks would be expected to transition towards V2 by updating their configs. This should be largely seamless for many folks who are using Istio or other abstraction layers, where they don't need to manage these configurations themselves; at some point Istio updates will just start producing V2 configs. If you're building your own controller, then no, you'll have to go off and implement the V2 APIs yourself. Yeah, I mean, that won't happen until we essentially reach step three here. But say someone comes along at that point with a new feature or wants to add a new configuration item; I don't think we will add it to the V1 API. So there'll be a period in which they live alongside each other, with V1 essentially frozen in time. That's correct.
Yeah, I mean, the key thing is that the endpoint itself will also participate in providing that information, via these HTTP response headers, where it's able to say, hey, I'm at 80% CPU and 5% memory utilization, that kind of thing. Yeah, I think the natural thing to do would be to factor that out into a filter. Yeah, okay, thanks.