Good morning. Thank you for your interest in the future of Envoy-based caching. I'm Todd Greer, and today I'll be describing the implementation of Envoy's HTTP caching filter. But first, I've asked my colleague, Josiah Kiel, to explain why you want caching and how to enable it. Josiah, why does Envoy need a caching filter?

The architecture we have in mind when designing the cache filter is one where Envoy serves as an edge proxy. We have all of these clients out on the wide internet connecting to our infrastructure through an Envoy, which then picks a back-end service and returns the content from those services back to the clients. In order to reduce the load on those back-end services so we can scale them further, as well as reduce the latency of retrieving the content in the first place, we want that Envoy to cache the content where possible. So whenever cacheable content comes back through the Envoy in response to a client request, we insert it into the cache via the cache filter, as well as proxying it back to the client. That way, subsequent requests go to the cache filter, get a cache hit, and go straight back out to the client without incurring the back-end service cost.

This is particularly useful in widely distributed architectures, where the services could be in different data centers or different cloud regions. We want the content to be as close to the requesting client as possible, so we can deploy Envoy instances way out in satellite locations, which may or may not have instances of the requested service deployed there. That Envoy routes the traffic to the data center where the services exist; the request is processed, the content is retrieved and sent back through the internal infrastructure to the Envoy where the client made the request, and the content goes back to the client. At that point, the content gets cached locally, as close to the client as possible, making all future requests substantially faster, because we don't have to make these long-distance remote service calls.

Another situation where this might be useful is if you have Envoy deployed in a service mesh, where Envoy is handling the intra-service communication within your back-end infrastructure. This isn't the first architecture we're considering when designing the cache filter, but I can imagine that, especially with an in-memory cache, it could be useful to cache the content that one service requests from another to reduce the traffic passing between services.

That all sounds great, and we can see how it will help. So how do I use the cache filter? The simplest way is to take a look at the cache filter sandbox, which exists for cache filter developers, to spin up a quick Envoy instance with caching enabled. The config that turns caching on is one that adds the cache filter to the HTTP filter chain. At the place where the cache filter is inserted into the chain, the request coming through will do a lookup in the cache that's configured there; in this case, it's configuring the simple HTTP cache, and content is retrieved from there. Anything else that affects cache behavior, such as which Vary headers we respect from the back-ends, or how this content will differ from request to request, also gets configured here. Very likely, as feature development continues, we will add a bunch more configuration options to the Envoy config.
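To make that concrete, here's roughly what that filter-chain config looks like. The proto packages and type URLs below are taken from a recent Envoy release and have changed across versions (the filter started out under a v3alpha package), so treat the exact field names as approximate and check the sandbox config for your version.

```yaml
http_filters:
- name: envoy.filters.http.cache
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.cache.v3.CacheConfig
    # The cache storage implementation is itself a plugin; here, the example
    # in-memory SimpleHttpCache that ships with Envoy.
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.http.cache.simple_http_cache.v3.SimpleHttpCacheConfig
- name: envoy.filters.http.router
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
```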
The things noted on the slide don't exist yet, but we expect them to in the near future. Thank you, Josiah.

So how does the cache filter work? If you're watching this presentation, you probably have some familiarity with how Envoy manages HTTP filters. Envoy has a chain of filters. When a request comes in, the filter manager iterates through the chain in order, notifying each filter. When the response comes back, it goes through the chain in the opposite order. Some filters are only involved in one direction or the other, but the cache filter is an encoder-decoder filter, so it's active in both directions.

Now, there is no one-size-fits-all approach to caching HTTP traffic. Some deployments are well served by one small in-memory cache, while others require the scalability of a large distributed caching system. In order to support that flexibility, the cache filter delegates the actual storage of responses to a C++ plugin interface, which we call HttpCache. The cache filter handles the intricacies of HTTP caching semantics, things like parsing the relevant headers and determining what can and cannot be cached, and it handles implementing Envoy's interfaces. This allows HttpCache plugin implementers to focus only on storage, or whatever other value-added behavior their plugin needs to provide. This enables a wide variety of plugins for divergent needs. Those plugins can be HTTP-aware if needed, but they can also be simple key-value stores. We have an example, the simple HTTP cache, which is in fact just a wrapper around a hash map.

When Envoy has parsed an HTTP request's headers, it calls the decodeHeaders method of each filter. When it gets to the cache filter, if it's a GET request, we look in the cache for a matching response. If one is found, we interrupt the normal filter iteration and return a response from the cache filter.

So let's take a slightly more detailed look at what this process looks like from the perspective of the cache filter. When the filter manager calls decodeHeaders on the cache filter, we ask HttpCache for a lookup context. LookupContext is one of the interfaces implemented by the plugin provider, along with HttpCache itself; it represents the active lookup operation. We then kick off an asynchronous getHeaders request to find the headers of a cached response. While this is happening, we return StopAllIterationAndWatermark, a status code that tells Envoy to pause the current request. Otherwise, it would get sent upstream while we're busy checking the cache, which would cause a problem if we got a hit. When the cache plugin completes the lookup, it invokes our callback with the results. In the case of a hit, those results include the cached response's headers, which we pass on to the filter manager by calling encodeHeaders. This tells Envoy to send those response headers to the client. If the results indicate that the cached response has a body, we then make one or more asynchronous getBody requests to retrieve it, calling encodeData to send each chunk of data on to the client. At the end of this process, the entire response will have been streamed to the client from the cache.

Now, I'd love to go into much more detail like this, but time is short. And of course, not every request is a hit in the cache. We certainly intend for most of them to be cache hits, but those that aren't are referred to as cache misses.
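Here's a simplified sketch of that flow from the filter's side, including the miss case we'll get to in a moment. The status codes and callback names approximate the real Envoy cache filter, while members like cache_, lookup_, and the helpers isCacheableRequest() and getBodyChunk() are assumptions for illustration; the authoritative signatures live in the cache filter sources in the Envoy repository.

```cpp
// Hedged sketch of the hit/miss decode path; not the upstream code.
Http::FilterHeadersStatus CacheFilter::decodeHeaders(Http::RequestHeaderMap& headers,
                                                     bool /*end_stream*/) {
  if (!isCacheableRequest(headers)) {  // e.g. only GET requests get a lookup
    return Http::FilterHeadersStatus::Continue;
  }
  // Ask the storage plugin for a context representing this active lookup.
  lookup_ = cache_.makeLookupContext(LookupRequest(headers, time_source_.systemTime()));
  // Kick off the asynchronous header lookup; the plugin may answer on any thread.
  lookup_->getHeaders([this](LookupResult&& result) { onHeaders(std::move(result)); });
  // Pause filter iteration so the request isn't sent upstream during the lookup.
  return Http::FilterHeadersStatus::StopAllIterationAndWatermark;
}

void CacheFilter::onHeaders(LookupResult&& result) {
  if (result.cache_entry_status_ == CacheEntryStatus::Ok) {
    // Hit: stream the cached response back to the client ourselves.
    const bool has_body = result.content_length_ > 0;
    decoder_callbacks_->encodeHeaders(std::move(result.headers_),
                                      /*end_stream=*/!has_body, "cache_hit");
    if (has_body) {
      getBodyChunk();  // assumed helper: getBody(...), then encodeData(...) per chunk
    }
  } else {
    // Miss or unusable entry: resume normal iteration toward the upstream.
    decoder_callbacks_->continueDecoding();
  }
}
```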
And of course, a miss can happen because the entry is literally not in the cache, or because something was found in the cache but is too stale to serve, or for some other reason the entry can't actually be used. In either case, if we back up to the point where we're getting the result back from the lookup context: in the previous scenario, we got a result that said, this is a cache hit, here are the headers. In this scenario, we get a result that says, sorry, this is a miss. When that happens, instead of calling encodeHeaders and giving Envoy headers to send to the client, we simply call continueDecoding, which tells Envoy, hey, you know how we had you pause earlier? Sorry about that; just keep going, nothing to see here, proceed as usual. And so, of course, Envoy does. It iterates through the remaining filters and on we go.

Now, when that happens, that request will presumably generate a response that comes back through the cache filter in the other direction, and we'll see its headers in the encodeHeaders call from the filter manager. In encodeHeaders, we have quite a bit of logic to run through the various rules for whether something is cacheable: is there an Authorization header? Is there a Cache-Control header, and what are its directives? Is it a response to conditional headers? All of these things need to be evaluated. Once we've done that, if we determine that this is in fact a cacheable response, we will of course cache it. In particular, in a now-familiar pattern, we ask HttpCache for an insert context, and then use that insert context to insert the headers. Now, the result doesn't affect our behavior. We'll probably need to report some stats (stats are one of the outstanding items), but we're going to let the response pass through the same either way. So we don't actually wait for a response to insertHeaders; it's fire-and-forget, and we keep going.

When Envoy eventually tells us, hey, here's a body, assuming there is in fact a body in this response, we get told via the encodeData callback from the filter manager. And as you'd expect, we then turn around and insert that body into the insert context, and we fully expect it to be able to deal with it. If it can't, that again won't affect this response, because the primary thing happening here is routing the response to the client. Inserting into the cache is a secondary concern, an important concern, but secondary nonetheless. By the way, it is perfectly reasonable for an HttpCache plugin to refuse to insert some responses, perhaps because the server is overloaded, or because of some non-standard header that it looks at, for whatever reason. If it wants to, it can simply refuse to insert them, and that is fine. See the comments in the insert context class for more details; we're going to be making a few changes there in the near future to better report statistics.

So, to write a plugin for the cache filter, these are the four classes you need to implement: HttpCache, along with HttpCacheFactory, and LookupContext and InsertContext, the latter being the analog on the insert side.
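As a rough guide, here's a condensed sketch of those four classes. The method names mirror the flow just described, but they're abbreviated and may not match the upstream headers exactly; consult http_cache.h in the Envoy repository for the real signatures.

```cpp
// Condensed, approximate sketch of the four plugin classes; see the Envoy
// source (http_cache.h) for the authoritative interfaces.
class LookupContext {
public:
  virtual ~LookupContext() = default;
  // Asynchronously find headers for a cached response; the callback may be
  // invoked on any thread.
  virtual void getHeaders(LookupHeadersCallback&& cb) = 0;
  // Asynchronously fetch (part of) the cached body.
  virtual void getBody(const AdjustedByteRange& range, LookupBodyCallback&& cb) = 0;
};

class InsertContext {
public:
  virtual ~InsertContext() = default;
  // Begin caching a response; the filter does not wait on the result.
  virtual void insertHeaders(const Http::ResponseHeaderMap& headers, bool end_stream) = 0;
  // Cache one chunk of the response body; a plugin may refuse and abort.
  virtual void insertBody(const Buffer::Instance& chunk, InsertCallback ready,
                          bool end_stream) = 0;
};

class HttpCache {
public:
  virtual ~HttpCache() = default;
  virtual LookupContextPtr makeLookupContext(LookupRequest&& request) = 0;
  virtual InsertContextPtr makeInsertContext(LookupContextPtr&& lookup) = 0;
};

// Registered so the filter can find your cache from its typed config.
class HttpCacheFactory : public Config::TypedFactory {
public:
  virtual HttpCache& getCache(
      const envoy::extensions::filters::http::cache::v3::CacheConfig& config) = 0;
};
```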
Now, I mentioned before that there is no one-size-fits-all approach, and one consequence of that is that some caches are synchronous and return a response immediately, while others might issue an RPC and then come back on some unpredictable thread. The way the cache filter deals with that is via its callbacks: all of the callbacks it provides can be called on any thread. You can call them before you return control back to the cache filter or after; it doesn't matter, and the cache filter takes care of moving things to the right thread when necessary, so you don't need to worry about it. (A short sketch of what that looks like from a plugin's side appears at the end of this section.) Now, with that, I'll hand it back to Josiah to talk about the current state of development on this project. Josiah?

So, is the cache filter production ready from a cache-semantics standpoint? Is it RFC compliant? In many cases, yes. Basic cache request handling, including Cache-Control and Vary, and the validation request flows with ETags and Last-Modified, is all implemented and ready to go. Some of the more advanced validation logic, like If-None-Match and the others listed there, is not yet implemented, and we'll actually just skip caching if those are present. The Cache-Control extensions like immutable and these others are also not yet implemented, but they're not as commonly used.

If you're asking, will it work with the cache that I have in my infrastructure today? The answer is no. We do not have any production-ready implementations of HttpCache. The only cache implementation that exists today is the example one, the simple HTTP cache. If you want the Envoy cache filter to work with Ignite or with Memcached or whichever, you would have to write an implementation of HttpCache so that the cache filter could use it and serve content from that remote cache.

There's a whole list of issues on GitHub that we know we need to resolve before we can declare this thing production ready. One of the most important is that the in-memory cache I mentioned, the simple HTTP cache, is not scalable. It currently doesn't do any memory management: you can spin up Envoy and have it cache your content, and it will very quickly run out of memory, because it doesn't do any sort of management on the back end. There's also some other basic functionality, like serving HEAD requests, and important things like gathering stats on cache requests, and a whole list of other things that need to be done, all filed under the area/cache label on GitHub.

If all that sounds great and you're ready to dive in and help, one of the most important things we need people to contribute is plugins for the various caches. So if you have expertise in any of these caches and want Envoy to work with them, please write an implementation of the HttpCache interface so that Envoy can talk to them. The interface is ready to go, and it would be great to have these implementations to test the cache filter itself against, so we would happily support that effort. If you need to get in touch with either Todd or me, you can find us on the Envoy Slack. We are almost always logged in there, because this is part of our day job. The list of issues that we know need to be done is filed under that label I mentioned, and if any of those catch your interest, you can post comments in the issues or tag us on Slack and we'll get you started.
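Here is that threading sketch: a minimal, hypothetical asynchronous lookup context that completes on its own thread, which the contract described above permits. The LookupResult and callback types are stand-ins invented for this example, not the upstream types.

```cpp
#include <functional>
#include <thread>
#include <utility>

// Hypothetical stand-ins for the plugin types, for illustration only.
struct LookupResult { bool hit = false; };
using LookupHeadersCallback = std::function<void(LookupResult&&)>;

// A cache backed by a remote RPC might complete lookups on an arbitrary
// thread. That's allowed: the cache filter reposts the callback onto the
// correct Envoy worker thread itself, so the plugin doesn't have to.
class AsyncLookupContext {
public:
  void getHeaders(LookupHeadersCallback&& cb) {
    // Simulate an RPC completing later, on a different thread.
    std::thread([cb = std::move(cb)]() mutable {
      LookupResult result;
      result.hit = false;        // pretend the remote cache missed
      cb(std::move(result));     // safe to invoke from this thread
    }).detach();
  }
};
```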
So that does it for our presentation. Thanks for following along, and thank you even more if you're looking to get started contributing to the cache filter. Our contact information is right there, and we will take questions from here.

Hello, can you hear me? Okay. I think you may have said something, but I didn't hear anything. Okay. I wanted to mention something about an earlier question on cache purge. One of the things we need to figure out is the approach to use for cache purge, because different styles of cache have different needs. For some of them you can do what is literally a cache purge: you go and delete the entries that you want gone. For others, you take an invalidation approach, where you record entries that say, hey, if you find this thing in the cache, don't serve it. That invalidation approach makes more sense for widely distributed caches, and maybe we can combine the two in some way, so we've got to figure that out.

Just another mic check: can you hear me now? Yes. Excellent. One of the other questions raised in the chat is: is it possible to cache just one route match from the list? I'm assuming that means cache key configuration, like deciding what parts of the path contribute to the cache key, whether or not to include query params, whether to include the protocol, and those sorts of things. I have a different idea of what that question means, but go ahead. Okay, so if that's the question, then that is a planned feature. It's not currently supported, and it would be one of those things I mentioned on the slide about options we would add to the config, like how the cache decides whether or not to split entries. Some of that is already in the config; it just doesn't have any effect yet. Right, it doesn't go anywhere yet.

The other reading of the question: the filter config is per listener, and that tells you which filters are in the stack, but you might want different routes to have different config. Per-route configuration is something we definitely need to add; that's just a matter of getting it done.

Next question: does the interface to the HttpCache plugin allow for coalescing? I believe it does, but Todd, you probably have more insight on that. Yes, it absolutely does. All you need to do for coalescing is, when multiple lookups come in and they're misses, just don't respond to the second, third, whichever one, telling it that it's a miss; let it sit and wait. That works today. We would probably need some configuration around maximum delays and things like that, but fundamentally, yes, you could do it in a plugin today.

The next question is: how do items get pushed out of the cache now? The short answer is that's up to the plugin, and the plugins currently implemented, like the simple HTTP cache, just don't do it. Depending on how the cache we're talking to works, whether it's a remote cache like Redis or something else, or an in-memory cache written completely within Envoy, how that's managed is going to be plugin specific. But the simple HTTP cache just doesn't do it. And just to be clear, that's only because we haven't gotten around to it.
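For the curious, here's one way a plugin could implement the coalescing Todd describes: park the callbacks for duplicate in-flight misses and answer them all when the first fill completes. All of the types here are hypothetical stand-ins, not the upstream interface.

```cpp
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration.
struct LookupResult { bool hit = false; std::string body; };
using LookupCallback = std::function<void(LookupResult&&)>;

// The first miss for a key goes upstream; later lookups for the same key
// are parked until the fill lands, instead of being reported as misses.
class CoalescingCache {
public:
  // Returns true if this caller should fetch from upstream; otherwise the
  // callback is parked and will be answered when insert() is called.
  bool lookup(const std::string& key, LookupCallback cb) {
    std::string body;
    {
      std::lock_guard<std::mutex> lock(mu_);
      auto it = entries_.find(key);
      if (it == entries_.end()) {
        auto& waiters = pending_[key];
        waiters.push_back(std::move(cb));
        return waiters.size() == 1;  // only the first miss goes upstream
      }
      body = it->second;
    }
    cb(LookupResult{true, std::move(body)});  // hit: answer outside the lock
    return false;
  }

  // Called when the upstream response arrives: store it and wake all waiters.
  void insert(const std::string& key, std::string body) {
    std::vector<LookupCallback> waiters;
    {
      std::lock_guard<std::mutex> lock(mu_);
      entries_[key] = body;
      waiters = std::move(pending_[key]);
      pending_.erase(key);
    }
    for (auto& cb : waiters) cb(LookupResult{true, body});
  }

private:
  std::mutex mu_;
  std::unordered_map<std::string, std::string> entries_;
  std::unordered_map<std::string, std::vector<LookupCallback>> pending_;
};
```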
I mean, we are not going to go into production without a feature like that. The simple HTTP cache is good for development, and it's good for saying, hey, my caching semantics work, but it is not good to put in front of live traffic. I do think there are more configuration options that will need to be added. I assume any cache plugin is going to need a max-space option. And a max time, probably, as well. There are probably other things that are universal too. We also, in standard Envoy fashion, let you specify configuration that's opaque to the cache filter and is just handed to the plugin, for whatever configuration you need. Right, whether that's custom headers or other metadata that gets passed along.

Let's see, have we missed any questions? One thing I wanted to explicitly mention, because I don't think we've mentioned it before, is cache admission policy. We might expand the cache filter to support different policies there, but another option is that just because we call your plugin and say, here's something, please insert it, you don't have to actually insert it. You can say, gee, thanks, nope, I'm going to pass and not insert it. So you can do whatever you want there.

To answer Shakti's question: the plan is yes, you will be able to use Redis as a remote cache with the cache filter, but there is not currently a plugin implementation that talks to Redis. Once somebody gets inspired and says, hey, I'd really like to use Redis with Envoy, and writes the plugin for it, Envoy will support talking to it, because the interfaces are all there. It's designed to do exactly that sort of thing; we just don't have the plugin for Redis.

The root concept behind this was: hey, at Google we have these really kind of weird requirements that most people don't have. How do we do caching in a way that handles our requirements and also other people's requirements? And the answer was: let's make it a plugin, so whatever's special lives in the plugin. So I don't think we're going to be contributing the Redis cache ourselves, just because that doesn't happen to be relevant to Google's business needs, but we are absolutely going to do anything we can to hold your hand while you add it.

Right, and to build on that: we are not Redis experts. We don't use Redis, so it's probably not good for us to be writing that plugin anyway. We wouldn't be very good at keeping up with releases and that sort of thing. It's not good for us to own things we don't use, but it is in our best interest to have somebody else contribute those, so that we have users adding requirements to both the cache filter itself and the HttpCache interface. If we're missing a piece of the interface that something like Redis or Memcached needs, we want to extend the generic portions of the code to support it. So if somebody comes along with the specific needs that Redis has, we're happy to support those needs; we just don't want to own the Redis HttpCache implementation itself. So yeah, please file bugs, PRs, questions. As I mentioned earlier, we're routinely on Slack, and we answer email and all that.
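As a hedged illustration of the kind of memory management a production plugin would need (the max-space option mentioned above), here's a minimal byte-bounded LRU sketch. It's hypothetical and single-threaded, not code from the simple HTTP cache.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Minimal byte-bounded LRU store: the kind of memory management the example
// simple HTTP cache currently lacks. Hypothetical, single-threaded sketch.
class LruStore {
public:
  explicit LruStore(size_t max_bytes) : max_bytes_(max_bytes) {}

  void insert(const std::string& key, std::string body) {
    erase(key);  // replace any existing entry
    used_bytes_ += key.size() + body.size();
    order_.push_front(key);
    entries_[key] = {std::move(body), order_.begin()};
    // Evict least-recently-used entries until we fit under the budget.
    while (used_bytes_ > max_bytes_ && !order_.empty()) {
      erase(order_.back());
    }
  }

  const std::string* lookup(const std::string& key) {
    auto it = entries_.find(key);
    if (it == entries_.end()) return nullptr;
    // Mark as most recently used.
    order_.splice(order_.begin(), order_, it->second.position);
    return &it->second.body;
  }

private:
  struct Entry {
    std::string body;
    std::list<std::string>::iterator position;
  };

  void erase(const std::string& key) {
    auto it = entries_.find(key);
    if (it == entries_.end()) return;
    used_bytes_ -= key.size() + it->second.body.size();
    order_.erase(it->second.position);
    entries_.erase(it);
  }

  size_t max_bytes_;
  size_t used_bytes_ = 0;
  std::list<std::string> order_;  // front = most recently used
  std::unordered_map<std::string, Entry> entries_;
};
```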
So we are motivated to help any efforts on this. And somebody can ask a live question if you feel so inclined; we don't have to be the only ones talking. We have four minutes left.

Yes, this is in the main repo. In fact, if you just add the cache config that I mentioned earlier in the presentation, Envoy will load it into your filter chain, because it's merged into mainline right now. Now, it is still considered alpha. We haven't done fuzzing on it, which is really important. Right, it is definitely not hardened yet; we don't think it should be used in production. But if you have the time and ability to bulletproof it, then by all means. We're not staking a claim on any of this; if you want to help out anywhere in the code, we are happy about that. We plan to get it to production ready, but it's not there yet.

What is missing to run it in production? An HttpCache implementation that is scalable. The only implementation we have right now is not production ready; that's the primary thing. The basic cache semantics are ready: it supports Cache-Control and all of the basic TTL-type headers. There's some more advanced stuff it doesn't support yet, like some of the more unusual validation flows, but for basic caching it'll work. It won't handle HEAD requests yet. And where it doesn't work, it's still RFC compliant: if it doesn't understand something, like If-Range, it just says, okay, never mind, I'm not caching.

For sure. To describe "scalable" in this context: the clearest way to see that the simple HTTP cache, the only HttpCache implementation, is not ready for production is that it does absolutely no memory management. It will keep adding entries to the cache until you run out of memory, and it probably crashes. That's the most obvious flaw, but it also doesn't do sharding or other things that impact performance; you're probably going to get lock contention. It's written as an example implementation to prove the interface, not to serve live traffic. And we think we can turn it into a production-quality thing while still keeping it a good example. If that proves wrong, maybe we'll split it, but that's the plan.

How many methods are in the interface? We actually had a slide on that; it's a relatively simple interface. I wonder if we have an easy way to show that. Less than a dozen, much less than a dozen. Where is this? If you check the slides, it'll be in there. We don't have time; we've got about 30 seconds left. But if you look up the HttpCache class in GitHub search, you should be able to see it; it's pretty straightforward. That, plus LookupContext and InsertContext. Right, the two context objects and then the HttpCache interface itself. We're talking less than 20 methods, probably, if you add all the classes together; it's not obnoxious.

And with that, we're at one o'clock, and the next presentation is about to begin.