Okay, so I have two topics that I wanted to cover. The first is discussing the recent CVEs. As I'm sure most folks here are aware, we had a security release on Friday, and this covered two vulnerabilities: one related to path normalization and the other to header matching. They were discovered at roughly the same time, impacted roughly the same area, and had the same sort of attack vector and threat model. They were fixed by the Envoy security team under embargo, over a period of about two weeks, together with various Googlers, and this was really our first run through the Envoy security release process. So first of all, I'm happy to answer any questions about these vulnerabilities if anyone has any. And second, I'd like to discuss how we move forward from here, and specifically the process improvements we'd like to make. We're going to do a post-mortem. I've got a long flight back to Boston on Friday, which I'm basically going to use to write up as much as possible, and we have a doc for jotting down initial thoughts if anyone has any to share based on their experiences during the handling of these vulnerabilities.

I think we should probably set up a longer meeting next week, because we're always kind of cramped for time in this meeting, to discuss some of the fallout. Some of the things we're going to want to do: first, make various process tweaks and improvements based on our experience. This includes looking at what our notification windows are going to be for our private distributor list, from when we have a candidate set of patches for the fix until the end of the embargo period. Second, we'd like to deal with some Envoy-community-specific things, including who belongs on the distributor list. We're not regular shrink-wrap software, or even a distribution like Red Hat; we're our own thing, and we have both distributors and service providers. There's a tension between wanting to make the list larger, to provide advance notification to as many people as possible, and keeping it small, to try to ensure the embargo isn't broken, and we need to have an open discussion about that. And finally, we need to think about how we can do things like canarying and staged rollouts of Docker images, where the updated Envoy images are actually visible to other applications and potentially to users of distributions, in a way which doesn't break embargo and leak information. These are all deeper topics, so I don't think we'll talk about them now. I don't know if anyone wants to say anything briefly about them, but otherwise I'll schedule a meeting for probably next week and share it with everyone. Does that sound good?

Yeah, that sounds great.

Okay. Any other questions on security stuff, or should we move on? Okay. The other thing I wanted to do is just put in a plug, or an advertisement, for a design doc which I'll probably share in the next day or two. I have an internal version, and I'd like to share it externally. It's on what we call ORCA: open request cost aggregation.
The idea is that Envoy is already capable of providing load reports to a control-plane load balancer via the load reporting service (LRS), and LRS has always had fields there to support backend-specific information: not just Envoy's load, but the ability to channel things like the CPU utilization of backends, or application-specific metrics provided by the backends. The idea is that these flow either in-band, via response headers from the backends, or out-of-band, via a gRPC service (there's a rough sketch of an in-band report below, after the deprecation discussion). This allows the global load balancer, or the control-plane load balancer, to make decisions based not just on Envoy's own idea of what the backends look like, but on the actual state the backends report. We're hoping this can become a standard which many different application and microservice frameworks will eventually adopt and make available to Envoy. As a first step, we know the gRPC-LB folks, who are busy working to adopt xDS, are interested in using this, so they're going to be driving the initial implementation, but I'll try to share a design doc and we can see what changes need to be made. I don't think it'll be very controversial, because there are only so many ways you can really do this. But yeah, that's coming. So, okay. Matt?

Yeah, so I just wanted to talk about the deprecation stuff again, specifically the hosts deprecation for the cluster load assignment thing. And then there's the other one, the TCP proxy deprecated v1 config. I think that one is not controversial; I just think we need to have a policy that unless there is an explicit replacement, we can't have something be deprecated.

Yeah, I rolled that one back with notes. I thought you also wanted hosts un-deprecated as well?

Well, that's the one I want to talk about here, and that one is more subjective. My personal feeling, and I'd love to hear what other folks think, is that I don't think deprecating it is worth the pain, given how simple the transform is (there's a sketch of that transform below as well). This is, I guess, just a general policy question. Technically, cluster load assignment is a superset of hosts, so if we keep with our idea that everything is machine-generated and there should be a simple transform, then we can probably deprecate hosts. It's just one of those things that seems so pervasively used that I know it's going to irritate people, and it just doesn't seem worth it. So I would be curious to hear what other people think.

I guess my thought is that I would prefer we not have redundancy in the API long term, but I'm 100% fine leaving it flagged as deprecated and not making it fatal yet. So: not cause the pain now, and see what happens with the conversations we've been having regarding our general API refactors, which we'll have a better sense of in a couple of months and will bring to the community to get all the approvals for. I think by next release we'll have a better idea of what we want to do, and delaying that pain until then, and leaving it in, sounds good.
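On the ORCA discussion above: as a rough illustration of the in-band flavor, here's a minimal, self-contained C++ sketch of a proxy-side parser for a backend load report carried in a response header. The header name and the key=value,key=value format here are assumptions invented for this example, not the proposed wire format; the actual encoding is what the design doc will specify.

```cpp
// Hypothetical in-band ORCA-style report, e.g. a backend responding with:
//   x-backend-load-report: cpu_utilization=0.83,mem_utilization=0.41,rps=1220
// The proxy parses the header into named metrics it could then aggregate
// and forward to the control plane via the load reporting service.
#include <exception>
#include <map>
#include <sstream>
#include <string>

std::map<std::string, double> parseLoadReportHeader(const std::string& header_value) {
  std::map<std::string, double> metrics;
  std::istringstream stream(header_value);
  std::string entry;
  while (std::getline(stream, entry, ',')) {
    const auto eq = entry.find('=');
    if (eq == std::string::npos) {
      continue; // skip malformed entries rather than failing the response
    }
    try {
      metrics[entry.substr(0, eq)] = std::stod(entry.substr(eq + 1));
    } catch (const std::exception&) {
      // ignore unparsable values; a real implementation would count these
    }
  }
  return metrics;
}
```

The out-of-band flavor would carry the same metrics over a gRPC stream from the backend instead of piggybacking on responses; either way, the point is that the control plane sees the backends' own view of their load, not just Envoy's.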
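And on the hosts deprecation just discussed, here's a minimal sketch of why the transform is simple. These are plain stand-in structs, not the real generated proto classes; the shapes just mirror the v2 API, where ClusterLoadAssignment wraps the same addresses in a locality grouping.

```cpp
// Stand-ins for the real protos (core.Address, endpoint.LbEndpoint, etc.).
#include <string>
#include <utility>
#include <vector>

struct Address { std::string ip; unsigned port; };
struct LbEndpoint { Address address; };
struct LocalityLbEndpoints { std::vector<LbEndpoint> lb_endpoints; };
struct ClusterLoadAssignment {
  std::string cluster_name;
  std::vector<LocalityLbEndpoints> endpoints;
};

// The deprecated Cluster.hosts field is just a flat list of addresses;
// mechanically wrapping it in a single anonymous locality yields an
// equivalent ClusterLoadAssignment, which is why the field is redundant.
ClusterLoadAssignment fromDeprecatedHosts(std::string cluster_name,
                                          const std::vector<Address>& hosts) {
  ClusterLoadAssignment cla;
  cla.cluster_name = std::move(cluster_name);
  LocalityLbEndpoints locality;
  for (const auto& host : hosts) {
    locality.lb_endpoints.push_back(LbEndpoint{host});
  }
  cla.endpoints.push_back(std::move(locality));
  return cla;
}
```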
Yeah, now's probably actually a good time to point out that we're starting to think about how to deal with the long-term evolution of Envoy's APIs, and how to better deal with major breaking changes and structural changes. We'll probably have some sort of strawman proposal coming shortly, and this would probably best be binned with one of those major structural changes.

Yeah, and that sounds fine to me. I just know that if we make it fatal by default now, we're going to get a bunch of people complaining. And it feels fine to me that this would be the type of thing, and again, this is up for conversation, that we could throw into our yearly cleanup, or whatever cadence we pick for doing major cleanups. Okay, so I guess we're agreed. So my question, Alyssa, is, I haven't been fully tracking how the scripts work: is it the kind of thing where we run the script but then we can hand-edit the result, or is it going to keep breaking because we're not doing what the script said?

We'd need to hand-edit it every time, but again, since we're just punting it till next release, and by next release we'll have an idea of what we're doing long-term, I don't think it's terrible to one-off it.

Okay, but sorry, I just want to make sure that I fully understand: when we hand-edit it, we're going to commit what we hand-edit. So when we run the script next time, we would see in the diff that it's getting moved back, and we could discuss it then, right?

Exactly, yeah. It'll come back next quarter with the exact same question, and hopefully we'll have an answer then.

Okay, great. That sounds good to me. So it sounds like we're in agreement to not make it fatal by default. We'll leave the warning, and we can defer the discussion until next time, basically. Yeah, can we just add to the script, you know, a whitelist of things that we don't want to deprecate?

We can; we already have one for part of it, so yes, I can totally do that. But I don't want to, because I don't think this is something we want to do, so I'd rather figure out what our plan is than hack the script to work around it.

If we're done with this, I did have one more comment on the security thing. Or is there any other stuff that anyone wanted to chat about on the deprecation topic? Okay. One thing related to what Harvey was talking about, and this is not necessarily even for discussion now, I just want to put it in people's minds, and for people who are listening to the recording later: one of the things that we're trying to figure out is who gets to be part of the pre-announce list, which is an interesting discussion. There's a bunch that's already written in our existing security policy, but there's been some interesting discussion about how Envoy is not quite the same as other software. So we're trying to figure out who gets to be on the pre-announce list, and how to keep that list useful but also not so big that it makes the embargo stuff kind of pointless. It's probably not something that we're going to discuss now; I just want to throw it out there as food for thought. If you have comments or thoughts on what the criteria should be to get on the pre-announce list, we would love to hear your feedback.
So you can give public feedback, or you can email the Envoy security list, whatever works. Yeah, and in particular, if anyone's aware of other open source projects which have to deal with this kind of thing, where they have use, for example, in edge networking, or on cloud service providers, or by end users like, say, Pinterest or Square, just a few random examples.

What does Apache do for the HTTP server?

Right, that's probably a good one to look at. So, nginx and so on; yeah, we should definitely take a look there. I still think that we're going to be in a slightly different situation just because of the whole sidecar deployment model and the vertical product situation, which is typically a little different from nginx and Apache, but I agree it's definitely worth looking at those.

Yeah, I think the sidecar model takes us closer to a regular distribution like Red Hat or something like that, whereas the problem is we have distributors who themselves have partners, and so on, who then may want to provide early notification to their customers, and that's a much tougher dynamic to manage.

Yeah. Can you quickly review what you mean by the Red Hat kind of bucket?

It takes us closer to being a component of a regular distribution when we have folks like, for example, Istio building their own products based on us, because you can think of Istio essentially as a distribution of Envoy.

Okay, that makes sense. Thanks.

Yeah, and the stance that we've taken so far is basically that it's anyone building a product or service clearly known to be based on Envoy, who is not just serving themselves. So Lyft would not be on the list, Pinterest wouldn't be on the list, eBay, etc., which is typical for how most of these lists work, but again, we're just trying to gather feedback.

Yeah, there are all kinds of ways this can be structured. We've had the point made that, for example, in the Linux kernel community, to some extent your amount of contribution and influence in the community can impact how likely you are to see patches early. Is this a good model? I don't know, but it is a model, and an alternative to, for example, the one that we've discussed. So anyway, we're just looking for feedback: comments, thoughts, and other projects to look at would be super useful. We also have a channel, envoy-cve, if anyone wants to join that to discuss; that's a pretty good place to have public discussion. Cool. Anything else?

All right, we're having a lot of CI flakes around coverage. Harvey, what's the status of the Bazel native coverage?

It was blocked on me figuring out how to deal with plumbing various flags down to external dependencies now that we switched to rules_foreign_cc. There's a new version of rules_foreign_cc which apparently fixes that situation, so I need to go in there and see if that magically works. If it does, that unblocks one thing, and then we can go back to the original PR I had out, so I'm hoping to get to that this week.

But as far as I know, the native coverage fails on the coverage report merging part.

I'm not sure that's right; that's what my local run from Harvey's pending PR showed a couple of days ago. A lot of the failures I've seen were with one specific test.
So the coverage flake should be fixed: the gRPC test should be fixed by 6229, the remove-CI-workspace change. That one triggered the coverage failure consistently, and the cause is that one of the timeouts was set to one second; it's been increased to 10 seconds, which should be good for now for the coverage failure. I think it depends on the order the coverage tests run in: it used to work, but stopped working when some people added new tests, and then it failed.

How does increasing the timeout help with the order?

No, I mean changing the order seems to affect whether that test fails consistently, or how often it's failing.

Okay, we should follow up offline, because any time I see a timeout like that, it looks a little shady. We should probably make that use either the simulated time thing or something else (there's a sketch of the simulated-time idea below).

Yeah, but that's an integration test, talking to a real gRPC server over a real network to localhost.

Okay. The only reason that I brought up Bazel native coverage is mostly that no one can ever figure out how to run the coverage stuff locally. It would just make it easier for people to actually debug.

Right, it was hard to debug; that cost me like a couple of hours.

Yeah. I used to be able to run coverage locally, and that seems to have degraded for reasons I haven't figured out yet. It seems to be giving me errors which are just wrong, like it's telling me things are wrong with my build file that seem obviously right.

I don't think how you run coverage locally has changed; you just run the script.

Yeah, that's the script I used, for sure. I haven't figured out if this is something in my environment or something that I've done to my object files. And the timeout that I saw was not a timeout of a particular test, but a timeout of the whole process, which I think might be related to how much code you have in the PR.

Yeah, I've seen that issue many, many times now. Is that the one where it's killing the Bazel process but it times out doing that, or did the build fail?

That one I'm not very sure about; I'll keep an eye on the runs.

Okay. Mine timed out after like two thousand seconds, not for an individual test, but for the entire run. All right, we just need to keep an eye on it, because at least in the last few days I've seen a ton of people where the coverage is failing constantly.

Yeah, and one of those is the gRPC client integration test, which should be fixed now.

Okay, all right. So I guess let's just make sure that people merge master. One idea that actually comes to mind: we should talk to Ite. I'm wondering, if we know of a case where everyone has to merge master, I bet we could do something with the bot and actually get the bot to go through all open PRs and just say, please merge master.

Yeah, that would be great.
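For reference, the simulated-time idea mentioned above boils down to tests advancing a fake clock explicitly instead of sleeping, so a 10-second timeout costs no wall time and can't flake under load. Envoy has its own test time-system utilities for this; the stand-alone sketch below is just meant to show the shape of the technique, not Envoy's actual API.

```cpp
// A toy simulated clock: timers fire when the test advances time past
// their deadline, never because real time elapsed.
#include <chrono>
#include <functional>
#include <map>
#include <utility>

class SimulatedClock {
public:
  using Duration = std::chrono::milliseconds;

  // Schedule a callback to fire once simulated time reaches now + delay.
  void scheduleTimer(Duration delay, std::function<void()> callback) {
    timers_.emplace(now_ + delay, std::move(callback));
  }

  // Advance simulated time, firing any timers that come due along the way.
  void advance(Duration step) {
    now_ += step;
    while (!timers_.empty() && timers_.begin()->first <= now_) {
      auto callback = std::move(timers_.begin()->second);
      timers_.erase(timers_.begin());
      callback(); // a test would assert on the timeout behavior here
    }
  }

private:
  Duration now_{0};
  std::multimap<Duration, std::function<void()>> timers_;
};
```

As noted in the discussion, this works cleanly for unit tests; integration tests that talk to a real gRPC server over a real socket still see real time, which is exactly the tension being worked through.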
Okay, I'm making a note now, and I'll email Ite, because it's such a pain when this happens: everyone just keeps running retest and it doesn't work. So I will email him. Okay, got it, made a note.

While I still have it, I wanted to say one other thing: simulated time is probably not good for all integration tests yet. I've been kind of working on it in the background, but I shelved it because I was too busy with other stuff. There's still some semantic confusion about what it means the way that it's used, and I have some in-flight code to try to clean that up; I just need to get it to work. But I think it's good for some integration tests, and it should be good for any unit test, if that helps.

It's been great for me; it's super fantastic, so I definitely recommend everyone use it.

Okay, thanks. That motivates me to try and finish up the stuff that I'm doing to make it easier to reason about.

Yeah, I think the stuff we have to figure out is what you and I have talked about offline, which is just that in the integration tests we wind up in a situation where multiple threads are incrementing the time, and then it gets super complicated.

Yeah, exactly. Cool. All right, did anyone have anything else? We're basically at time, but if anyone else is around and wants to chat, happy to.

I was thinking, could I have a minute to introduce myself?

Yeah, okay.

So, I'm Ismo Puustinen; I work at Intel here in Finland. I'm part of a bigger CNCF team here which is trying to contribute to CNCF projects, Kubernetes and Envoy. Our focus is sort of in how to get everything running the best on Intel hardware, like accelerators and CPU features and whatnot. I'm myself particularly looking right now at QAT support and so on, but the other task for our whole team is to try to make these projects work as well as possible and help out everywhere we can. So I just want to thank everybody who has been reviewing the PRs and talking on the issues which I've been filing, and I hope to interact even more with you folks going forward.

Yeah, thank you. So right now you have a PR around the private key offload capability in BoringSSL, right?

Yeah, that's right.

And I think I owe you a review for that; I'll try to do that today.

Hey, thank you. Great.

Hi, sorry, just to jump in here. My name is Eric; I work at Instructure. We're adopting Envoy internally, and we have an issue open about improving error messages, or making them machine-parsable. As we're starting to develop with Envoy locally, we're also discovering that we have a lot of exception messages that are causing some of our developers pain. We're wondering whether it would be easier to just send in PRs to fix those error messages first, and sort of push off the machine-parsable issue, which will probably take longer since we have to change a whole lot of exceptions, or whether we should focus more on getting the error messages machine-parsable first and worry about fixing the actual content of the error messages later, if that makes sense.

I think it's going to be hard to give guidance without having more detail on what type of things you're talking about, so I would recommend opening an issue with some of the changes that you're thinking about, and then we can discuss there.

Okay, yeah. We already have that issue open, so we'll just continue commenting on that.

Sure, yeah.
I mean, better error messages sound good, but it's hard to know exactly what you're proposing without a bit more detail.

Yeah, the real issue is just that the messages aren't quite clear, and so the question is whether it's worthwhile to spend time rewriting the actual message of the exception, like the string that gets shown to the user, or whether it's worth spending more time on making them, you know, protobufs, so that they can be parsed themselves (a rough sketch of that structured-error idea is at the end of these notes).

I mean, generally, just changing the wording can't hurt too much; it's going to clarify things, I think. I would say better documentation and better error messages are probably always going to be approved. I guess the only reason I was asking to maybe discuss further is that I don't quite know what you're proposing. If the text is going to be controversial, then we might have to discuss; if it's just obviously more clear, then it's probably fine. But without knowing the details, it's a little hard to give guidance.

Yeah, and I'm sorry to be vague. We're still very early into this, and we're still very much deciding whether or not this is worth pushing upstream or keeping to ourselves, so I apologize for that vagueness. But that actually helps a lot, thanks.

No, in general, we really, really appreciate better error messages and better documentation, so feel free.

Okay, thank you.

Okay, cool. See you folks next time. Bye.
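On the machine-parsable error idea discussed above: one minimal sketch, assuming nothing about Envoy's actual exception hierarchy, is an exception type that carries structured key/value context alongside the human-readable string, so tooling can inspect errors without scraping text. The class and field names here are hypothetical; the protobuf-based version Eric mentioned would replace the map with a message type.

```cpp
// Hypothetical structured exception: the what() string stays human-readable
// while fields() exposes machine-readable context for tooling.
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

class StructuredConfigError : public std::runtime_error {
public:
  StructuredConfigError(const std::string& message,
                        std::map<std::string, std::string> fields)
      : std::runtime_error(message), fields_(std::move(fields)) {}

  // e.g. {{"field", "cluster.name"}, {"reason", "duplicate"}}
  const std::map<std::string, std::string>& fields() const { return fields_; }

private:
  std::map<std::string, std::string> fields_;
};

// Usage: throw StructuredConfigError("duplicate cluster 'foo'",
//                                    {{"field", "cluster.name"},
//                                     {"reason", "duplicate"}});
```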