Okay, should we get going? So, going to the agenda. First up, Otto, he's going to talk a little bit about Nighthawk, which is an open source load testing framework that he's been developing with us for Envoy load testing.

Yeah, so I've been working on a benchmarking project, we called it Nighthawk, and it's based upon Envoy's libraries. Currently, in terms of functionality, it's similar to what wrk has to offer. It's still pretty early stage and simple, but it already performs fairly well. We've been exploring various approaches to how to best implement it, and I think we're narrowing down towards the right way to go with it. Currently it supports HTTP/1 and HTTP/2 over plain HTTP and TLS. It has HdrHistogram support. Today I've been looking into how HdrHistogram is able to apply corrections to measurements to compensate for an effect (I think the term is coordinated omission) where the benchmarking tool has to wait for the server before it can initiate a new request, so it sits idle and misses its timings; if you're not careful, you won't be measuring at the time that latency spikes, and HdrHistogram is able to apply corrections for that (there's a rough sketch of this below). But in any case.

We had an open loop design there anyway, so the idea was we wouldn't be doing that kind of closed loop thing.

That's right. Yeah. So, well, that's about where we are right now with this.

Yeah, I'd also point out that this work is being upstreamed right now into the envoy-perf repository. We're trying to get it to follow the best practices for development that we have in the main Envoy repo, which include things like clang-tidy and coverage. Coverage we don't have yet, though, because basically we have a ginormous hack to make that work in the main repo. We want to use native Bazel coverage support, and the person who's responsible for that has actually been out; they'll be back this week and hopefully we can get them to help us out to make that happen. Ideally, we make this a coverage-driven effort and get the same kind of close to 100% coverage that we have for the main repo applied here. One interesting thing to think about is how we structure some of these ancillary scripts and tools which exist today in the main Envoy repo. We want to use these in other repositories that we host in the envoyproxy organization, so it would be nice to perhaps even have them as a submodule which we share across them, because right now we have a lot of copy-and-paste and boilerplate stuff.

Hey, so with the perf thing, is there a roadmap doc that people can look at? I'm just curious whether there's something that can be shared with people just to see what the different milestones are and what our plans are.

Yeah, we have a statement of work. We could turn that into a roadmap, essentially.

I mean, whatever's easiest; it could be a little roadmap. It could even be in envoy-perf; we could convert it into issues with some checkboxes. I just think it would be nice for people to understand where the project is going, so that people can comment if there are things they would actually like to see.

Yeah, maybe we could take all those features that we have in the statement of work, convert them into issues, and tag them at least with some notion of priority, or likelihood of landing in the next milestone or two, or whatever we want to call them.

Yeah, that would be great.
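For illustration, here is a minimal sketch of the kind of correction being described above: backfilling the samples a closed-loop tool misses while it is blocked waiting on a slow response. This is not Nighthawk or HdrHistogram code; the function name and the fixed expected interval are assumptions made for the example.

```python
# Illustrative sketch of coordinated-omission correction (not Nighthawk's actual code).
# A closed-loop load generator that blocks on a slow response never records the samples
# it "would have" sent during the stall, which biases the latency distribution downward.
# The correction backfills synthetic samples for those missed requests.

def record_corrected(samples, measured_latency_us, expected_interval_us):
    """Record a latency plus synthetic samples for requests missed while blocked."""
    samples.append(measured_latency_us)
    # Each request that should have been issued during the stall would have observed
    # progressively less of the delay, so backfill decreasing values.
    missed = measured_latency_us - expected_interval_us
    while missed >= expected_interval_us:
        samples.append(missed)
        missed -= expected_interval_us

# Example: target rate of one request per 1000us; one response took 5000us.
samples = []
record_corrected(samples, 5000, 1000)
print(samples)  # [5000, 4000, 3000, 2000, 1000]
```

An open-loop design like the one mentioned above largely sidesteps this, because requests are issued on schedule regardless of whether earlier responses have arrived.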
And then also, in GitHub now you can do project boards, you can have milestones, you can have labels. So that might just be a good way to generally track what people are working on in different milestones, because as we get further (you know, we've talked about this offline), I think this project has the potential to become very widely used outside of Envoy; it's a pretty cool thing. So it would just be nice for people to understand what's going on.

Yeah, for sure. I don't know, I already have some promising numbers, I think, around the latency measurements, and we're actually better than many of the other tools out there today. So I think this could be, yeah.

Better, and supporting H2.

Yeah. Well, H2, and then we'll eventually get QUIC support and a whole bunch of other things. And actually there are some nice synergies, because if we want to load test QUIC we'll need QUIC client support, and we also need QUIC client support for running Envoy on the client. So there's a bunch of work that I think comes together pretty nicely here.

Yeah. And then one other thing, just from a roadmap perspective: I'm sure it's not in the first version, but longer term, figuring out how we eventually get this into our CI system so that there's some way to run it on stable resources. Even if it's not every commit, just looking for a weekly trend of performance so that we can see if we've had any major regressions on CPU or memory usage. The benefit there would be tremendous.

Yeah, I would be excited to maybe use dedicated VMs for that so we can get some really good numbers.

Yeah. One thing that we can talk about, and this will come later, is that CNCF does actually have a bare metal cluster somewhere. We don't actually know how one gets access to it. And of course, if you're trying to provision stuff on bare metal, there's a lot of complexity there. But I think that even if we're talking just a couple of VMs or bare metal machines, we can set up an environment which gives us stable results.

Yeah, I think that's all going to come a bit later; right now the scope of the work that Otto and his team are involved in is largely just about the tool itself. But it's basically the enabler for building the rest of this, and to the extent that others are interested in contributing to this infrastructure wiring work, we would welcome that.

Well, and it's something that I suspect, once we get a little bit further along, we could have CNCF pay a contractor to help with some of that plumbing work, because that's less systems work; it's more just tying it together into provisioning and CI. Not that it's not hard, but it's a different skill set, and we could probably get someone else paid to do that.

Sounds good. Yeah, I wanted to add one thing, just for your information: I have this latency-tuned bare metal machine over here, and on a purely synthetic benchmark against Envoy serving a static lorem ipsum file (the one that was already in the Envoy repository), I get super tight standard deviations, within 12 microseconds or something like that. And that's interesting because I think that if we can run bare metal in CI, we should be able to set a pretty tight benchmark there for us to maintain.
Yeah, no, this is super exciting. There are always a lot of public conversations about how projects should publish perf numbers, and I'm always saying that's really hard; it takes literally months and months of effort to do this correctly. But we're actually doing all of the months and months of effort to do it correctly. So it'll be really amazing if in a couple of months we can get this working so that we have it in CI and have published results. Awesome. Well, thank you.

Thank you.

So what do we have next? Certification program. Sounds exciting. Should Envoy have a certification program? See GitHub. Is there a test suite, analogous to this, to certify a management server? I have no idea. Are you on the call, Chris? We should probably just table this until Chris can come and talk about it, because I don't exactly know what this means, unless there's someone else on the call who knows.

Yeah, I was hoping we'd have an Envoy certified developer program and I could put that on my resume, but apparently not.

Okay, so CI update. For people out there, just a quick update: you may have noticed that we had a lot of queuing; we shouldn't have any queuing anymore, but we had a lot of queuing before yesterday. To make a very long story short, Circle CI was graciously giving us free CI for many months, and we reached a resource level with Circle where they weren't willing to give us any more free CI resources. That means we need to pay for our CI. So behind the scenes we've obviously been looking at a whole bunch of different things, from how we pay for our existing resources, to how we make our builds faster, to how we develop some tooling so that we don't run all of the CI jobs on each PR or each commit, so maybe have a final test pass for tests that typically don't fail if the main tests pass, things like macOS or compile-time options. So there are a couple of things happening in parallel. We're working on getting direct funding for our CI bill. I don't want to share anything publicly about that right now, but I feel confident that we will have the funding that we need. If you are listening to this and you are a company that appreciates Envoy and you would like to help contribute to our CI bill, please reach out to me or the maintainers. CI is probably one of the most valuable things that we do. It's not cheap, but it keeps our project at super high velocity. So if you would like to contribute X dollars per month, please contact us; that would be great. And then we're also investigating some things to make builds faster and stuff like that. I think Lizan is on the call. Do you want to briefly talk about your thoughts there?

Yeah, so there are a couple of items to make the build faster. We turned on the GCS cache backend a couple of days ago, and that's been showing very good performance in recent runs; it turns out some builds finished in 20 minutes, which was near two hours before, so that was great. I also had a meeting about the remote build executor and am going to give it a try. So those are the things we're currently exploring to make the build faster. And there's some other tech that can potentially make the build faster as well; I've filed issues for those.
So dynamic linking, right, which we never do. Dynamic linking is also one option, but it's probably not going to speed up the tests very much. I'm not sure; I will give it a try to see how that goes. And using lld can probably make the release build linking faster.

Yeah, I mean that one seems like a no-brainer; it seems like even if we stick with GCC for now we could link with the other linker. Per other discussions, I think we're reaching a point where enough people are using clang now that I would be fine switching our official build over to clang. The only thing, per our private discussion (and for people out there), is that I don't think we can stop doing CI with GCC. I just think there are too many people that are still using it. We don't necessarily have to do all the tests with GCC; we could just compile the server and make sure that all the prod code compiles. It's not optimal, but these are just things that I think we could think through.

Yeah, I would hesitate on that one, because we've seen historically issues around things like different implementations of the STL and that kind of thing, which have manifested themselves in tests, actual real test failures, as we've switched between the different compilers. If it were usually the case that they were producing effectively identical binaries and we didn't really ever see any behavioral differences, I would totally agree that we shouldn't bother running those tests, but I feel that there have been enough differences in the past that we should continue to run them.

Yeah, well, and that actually comes back to the point that once we've developed some of that additional Repokitteh tooling, there are a bunch of tests that we're running now which don't really have to run on every single commit in every PR. Some of them we may decide to only run on master, and some of them we could trigger on demand for risky PRs. So I think the tooling will give us a bunch of flexibility to really cut our bill and also just make people faster, because they're waiting for fewer tests.

Okay, like for some docs PRs we don't need to do a full release build and things like that.

Well, and those honestly we could fix right now, because at the beginning of the scripts, for docs changes, you could just look and see if the diff only affects things in the docs folder and then not do certain jobs (there's a rough sketch of that below). So I feel like there's a lot of low-hanging fruit here; we just need to resource someone to actually look at it.

Yeah, I definitely like the idea of running most of the heavy tests only at the end, when somebody says it's really good to go and ready for review. Because at least in my workflow I tend to check stuff in and make a PR so that I can see how it looks in that context, but I don't really need to start running the tests yet.

We have noticed that, because you are one of the heaviest CI users by dollar amount. I'm just kidding. And, you know, there are just so many wins here; it'll make people faster.

Yeah. Cool.
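As a rough sketch of the docs-only idea mentioned above: this assumes documentation lives under a docs/ directory and that the CI step can consult git. The path, base ref, and messages are illustrative assumptions, not Envoy's actual CI scripts.

```python
# Hypothetical pre-flight check: skip heavy build/test jobs when a change only touches docs.
# The docs/ path and the base ref are assumptions made for illustration.
import subprocess

def docs_only_change(base_ref: str = "origin/master") -> bool:
    """Return True if every file changed relative to base_ref lives under docs/."""
    diff = subprocess.run(
        ["git", "diff", "--name-only", base_ref, "HEAD"],
        capture_output=True, text=True, check=True,
    )
    changed = [path for path in diff.stdout.splitlines() if path]
    return bool(changed) and all(path.startswith("docs/") for path in changed)

if __name__ == "__main__":
    if docs_only_change():
        print("Docs-only change: skipping compile/test jobs, running docs build only.")
    else:
        print("Source change detected: running the full CI matrix.")
```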
Okay, should we move on to caching? Yep.

Sure. So, unfortunately, Todd is not here this time. He did talk about it two weeks ago, so you got kind of the intro, but now it's a doc everybody can get to. I think the easiest way to get to it, if you didn't see it on the Slack channel, is that it's issue number 868, and Todd linked the doc at the bottom of that one, so you can open it up. But I'll go over the high points of it really quickly and see if there are questions.

So the idea here is that this is a plugin-based architecture, where the proposal (and we don't have code yet to share) is that we would supply an HTTP filter that performs caching in Envoy, but it wouldn't have a cache in itself; you have to plug in a real cache that you want to use. So mostly what this doc specifies is the interface between this caching filter that we will eventually supply and the cache backend, which might be something proprietary in different networks, or it might be something we do based on Redis or ATS or something in memory. The idea is that you might have multiples of these and have multi-level caches, but that's kind of left as an exercise for the reader.

There are a few important points that I want to cover in this doc. One is that this is designed to be performant with very large objects and is streaming at every level of the hierarchy. So if you have a huge object that takes a lot of time to stream out through the cache, this works in a similar way to the way we fetch data from upstreams: it doesn't block the Envoy thread and it should work reasonably well. It's also designed with a lot of wisdom from HTTP caching at Google, handling things like variants, which are actually very customizable, and range requests.

Variants, if you don't know much about them, are when you say, I'd like to send this response to, for example, clients that specify a specific HTTP request header in a certain way; that header becomes part of the key, so you can pick an HTTP header that you want to become part of the key. A good example of this would be the Accept header or the Accept-Encoding header. That way you can have responses that vary based on those, and you have to specify in your installation which of those you care about for your server. User-Agent might be one that people would consider, although that's generally a bad idea for front ends because there's a huge amount of entropy in User-Agent.
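To make the variant idea above concrete, here is a minimal sketch of a vary-aware cache key. The allowed-header set and the key shape are assumptions made for illustration, not the interface defined in the doc.

```python
# Illustrative only: build a cache lookup key that includes the request headers the
# operator has configured as variant-creating (e.g. Accept-Encoding).
ALLOWED_VARY_HEADERS = {"accept", "accept-encoding"}

def cache_key(method, host, path, request_headers):
    """Combine the request line with any configured variant headers."""
    variant_parts = tuple(
        (name, request_headers.get(name, "")) for name in sorted(ALLOWED_VARY_HEADERS)
    )
    return (method, host, path, variant_parts)

# Two clients asking for the same path with different Accept-Encoding values get
# distinct keys, so a gzip body is never served to a client that cannot decode it.
print(cache_key("GET", "example.com", "/index.html", {"accept-encoding": "gzip"}))
print(cache_key("GET", "example.com", "/index.html", {"accept-encoding": "br"}))
```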
Range requests are super interesting for large content, especially video, or for doing downloads. That basically says I can download, you know, bytes 7,000 through 992,000 of this response, and then you can piece those together on the client, which is kind of how resumable downloads work in Chrome, for example.

We discussed invalidation quite a bit, but do not have anything about it in the spec. That would be if the operator of the cache wants to say, let's toss out a particular result so it won't get served any more. That's going to be up to the implementer of the backend; the spec is agnostic about how that ought to work.

The spec actually used to be much bigger, and we reduced it down. There used to be a layering design where you could start with a very simple key-value store with no knowledge of HTTP, and all the HTTP-specific semantics would be layered on top of it. That is currently not in this spec. We may do another spec which talks about how you could layer all the semantics that are required for this HTTP cache plugin on top of a key-value store. We'll be iterating on that, and also iterating on getting the code in. If it looks like we've got alignment around this doc as the general direction, and we evolve toward thinking this is the right way to go, then we'll start sending in the code, which has a sample filter and a very simple cache that implements it. So I think that's about the right amount of background on this for this meeting, but I'm happy to answer any other questions that you have.

So for the general caching infrastructure, I guess you have some particular use case in mind. There'll be the main filter, of course, but do you foresee having, whether it's Redis or something else, some reference, full and complete implementation in the public repo? Or do you think that people are going to have to go off and build some backend, basically?

I think we should evolve to that; yes, we should provide all that, whether or not that gets done by us, by the community, or by somebody else who wants to pick it up. We'll have kind of a toy implementation that we'll definitely open source; it's just going to be a map. Then we'll want to have something based on Redis or memcached.

Yep. Okay. Yeah, I mean, given the existing Redis support that's already in Envoy, I think doing Redis there would be pretty interesting, because you get Redis stats and Redis clustering and a bunch of things basically for free. So it may actually not even end up being that much work, quote unquote, but it's something to think about.

Yep, I agree. Redis is kind of the right first thing to do.

I also have one question about the doc. Say that I would want to implement a cache that is actually internal to Envoy in that filter. Currently the threading model, I think, is that all outbound connections always run on the same dispatcher and same thread as the inbound I/O, right, so the client and the server connections are both aligned. And I was wondering if it's possible to keep that when I plug this cache underneath, or will there be thread switching going on? What would the fast path look like?

Yeah, we've talked quite a bit about this. I think you could imagine one scenario where you would have an in-memory cache per thread, but that would be a little insane because you would get really bad hit rates. Most of our thinking is that you would probably suffer some lock overhead when you used an internal cache, but what you could do is control how much of that you were willing to accept by sharding the cache into as many shards as you wanted. So if you have, like, a 100-core machine, maybe you would make a 19-way sharded cache and you would get fairly infrequent lock contention on that. I definitely would make those caches not too huge, because you want to do most of the heavy lifting in some kind of shared backend like Redis, and save the in-memory cache for the super hot requests.
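A minimal sketch of the sharding idea just described: one lock per shard, so threads only contend when they hash to the same shard. The class name, shard count, and hash choice are arbitrary illustration, not the proposed interface.

```python
# Illustrative sharded in-memory cache: lock contention is limited to a single shard.
import threading

class ShardedCache:
    def __init__(self, num_shards=16):
        self._shards = [dict() for _ in range(num_shards)]
        self._locks = [threading.Lock() for _ in range(num_shards)]

    def _index(self, key):
        return hash(key) % len(self._shards)

    def get(self, key):
        i = self._index(key)
        with self._locks[i]:
            return self._shards[i].get(key)

    def put(self, key, value):
        i = self._index(key)
        with self._locks[i]:
            self._shards[i][key] = value

# Usage: many worker threads can share the cache with modest contention.
cache = ShardedCache(num_shards=32)
cache.put(("GET", "example.com", "/index.html"), b"<html>...</html>")
print(cache.get(("GET", "example.com", "/index.html")))
```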
And another question that floated to the top of my mind when I read this is, would this complicate life for plugin builders if there are different threading models, or is the cache transparent to the filters running upstream and downstream of it?

I would hope that it would be completely transparent to the other filters running in Envoy.

Yeah, I ask because in another server I've seen that complicate life a lot. And I think that, the speed aspect aside, that's an important one.

Yeah, it would be great to see some comments in the doc about the problems you've seen in the past with those kinds of interactions between a cache and a server and other filters. We should make sure of that; it's definitely not too late to try to design around it to prevent that from occurring.

Okay, I'll add a comment on it; see if I can contribute.

Awesome. Yeah, I think that's it, unless there's anything else that folks want to discuss. See you all later.

All right. Bye bye. Bye.