started. Let's get started, since we have a long agenda. Okay, is Alyssa here? I think you're first on the agenda list. Oh, I am. Am I? I'm still pulling up the agenda. Are we talking about release stuff? Any versus Struct deprecation. Oh, I didn't know that we added it to the agenda. I didn't add it there. Who added it? I think I suggested that, Michael Payne. Oh, okay. Well, then why don't you lead our discussion? Well, I've got a PR in that Harvey's reviewing. I've got another one that converts all of the test YAMLs. It's very painstaking to convert the configs because they're all pretty much out of date. And then I plan on starting on the test code changes. So there's a fair bit that needs to be done there. And lastly, the learn site, which Chris posted the source for in the chat, is all out of date. And I think that's driving a lot of our sort of chatter in the Slack about all the configs. Well, yeah, there's that. And then this will come up in the next thing, and maybe I'll move it and we can talk about it in parallel. But I think those Docker image changes that I made are also confusing people. It's like you can't win here. But I think we're actually moving towards a better place. But I guess on the Any versus Struct thing, I have one fundamental question. And maybe this is for Harvey. I'm not sure. I thought there was some problem with using Any? Yeah, so the problem comes down to this: what we do is we're hashing protobufs. And I see the gRPC team maybe getting excited there. Yeah, hashing protobufs is probably never a good idea, because they don't have any canonical, stable hash value. And we're trying to actually make it such that, I think there's a bunch of issues filed so that at least the individual languages have stable hashes, so that when you have the same protobuf, you get the same hash, which is a desirable property.
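As a rough illustration of the hashing concern raised here, the following is a toy sketch, not protobuf's actual wire format or any Envoy code. It assumes only that two logically equal messages can be serialized with their entries in a different order, and shows that hashing the raw bytes then disagrees while hashing a canonical (sorted) form agrees. The helper names `encode`, `naive_hash`, and `canonical_hash` are made up for this example.

```python
import hashlib


def encode(fields):
    """Toy serialization: join key=value pairs in the order they are given."""
    return b";".join(f"{key}={value}".encode() for key, value in fields)


def naive_hash(fields):
    """Hash the serialized bytes as-is, so the result depends on entry order."""
    return hashlib.sha256(encode(fields)).hexdigest()


def canonical_hash(fields):
    """Sort entries by key before hashing, so the result ignores entry order."""
    return hashlib.sha256(encode(sorted(fields))).hexdigest()


# Two logically identical configs whose entries happen to be emitted in a different order.
config_a = [("cluster", "backend"), ("timeout_ms", 250)]
config_b = [("timeout_ms", 250), ("cluster", "backend")]

print(naive_hash(config_a) == naive_hash(config_b))          # False
print(canonical_hash(config_a) == canonical_hash(config_b))  # True
```

In this model, a consumer that compares naive hashes would treat `config_b` as a changed config even though nothing changed, which is the kind of spurious update the speakers associate with draining and CPU spikes.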
And this manifests itself as a problem when we switch to Any, because it would cause those config updates with embedded Any fields to have unstable hashes, essentially, and that will force draining and high CPU utilization in Istio. So I think some of this has been fixed in different ways. Like I know, I think they've actually fixed this on the Go side. I think they've, on the Istio side, actually switched back to Struct for their last release because of this issue, and they didn't want to see the CPU spike. And I think it still needs to be fixed at the protobuf level. So, yeah, I guess my high-level question is, given that it seems that Any is not fully production ready, should we table these documentation PRs until it's ready? Or do we think that we should have all the documentation point to Any? I think that's my concern. I don't think adding documentation for how to use it is bad. I think as long as there's a warning somewhere saying like, hey, by the way, there's still this hashing thing; if you have concerns about that, be aware of it. Well, but what Michael's doing is he's going through and he's fixing all the configs to translate it over. Yeah, but Matt, I think it's fine for all the configs. I think the only place that Any is a problem is when we have a dynamic control plane and people are concerned about draining behavior specifically. So those are the ones we shouldn't convert over, and we should slap a big warning label up there. The rest I think are perfectly ready for primetime use. Okay. Sounds good. So I guess, Michael, the other question that I had is for one of the configs. Don't we run all the configs in the repo through config tests? Like we run config gen and then we run those configs through the config test. So shouldn't they be broken if they're out of date? Or is it just that you're trying to get rid of all the deprecations? Yeah, just trying to get rid of the deprecation warning.
So I've got some test stuff coming through as well. Yeah, I'm running everything through the config and stuff as well. So that's all configs that are not hard deprecated yet. Yeah, one thought there, and this is a question for Alyssa, is, I think for config test, anything that runs through config_test.cc, we should make it fail by default right away. And that would catch this at PR time. So is that possible, Alyssa? Sorry, you're wanting to basically catch anyone using fields that we haven't marked as deprecated, or once we mark them? Sorry. So basically, in the repo, we have a bunch of example configs, like we have our Docker containers, we have our config gen examples. And what I'm suggesting is that those configs today will continue to run even using deprecated fields. And what I'm suggesting is that we somehow make it so that when we run those configs through config tests, we make them fail by default. And then when people do the integration test... Sorry, what's that? Is config test a unit test or an integration test? It's unit. It's its own thing. It's a unit test. So I think we could do similar things to what you did. And I would guess 80% likely it's gonna fail. But let me actually take a look. I could take that on. The reason I caught the integration test thing is I flipped the flags and like a shit ton of stuff broke. No, right. Yeah. And I had to actively override integration tests, but I'll take a look. Okay, let me point you at it, or if you just search in the repo, it's config_test.cc. Okay, one other thing though, on this: if Harvey thinks that fixing the stable hashing problem requires upstream proto changes, has anyone taken on reaching out to the proto team and seeing that the bugs are filed? The question is, who's going to actually take action on them? That's what I'm asking. Yes. So have you filed a bug with proto?
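The "fail by default on deprecated fields" idea for the example configs could look roughly like the sketch below. This is not the logic in config_test.cc; it is a hypothetical Python model in which a config is a nested dict, `DEPRECATED_FIELDS` is an assumed hand-written list, and `find_deprecated` and `check_config` are names invented for the example.

```python
# Assumed list of deprecated field names, for illustration only.
DEPRECATED_FIELDS = {"config"}


def find_deprecated(node, path=""):
    """Return the dotted paths of every deprecated field used in a nested config."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if key in DEPRECATED_FIELDS:
                hits.append(child)
            hits.extend(find_deprecated(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_deprecated(item, f"{path}[{index}]"))
    return hits


def check_config(config, allow_deprecated=False):
    """Fail by default when deprecated fields are present; callers must opt out."""
    hits = find_deprecated(config)
    if hits and not allow_deprecated:
        raise ValueError(f"deprecated fields in use: {hits}")
    return hits


old_style = {"filters": [{"name": "example", "config": {"stat_prefix": "x"}}]}
new_style = {"filters": [{"name": "example", "typed_config": {"stat_prefix": "x"}}]}

print(find_deprecated(old_style))  # ['filters[0].config']
print(check_config(new_style))     # []
try:
    check_config(old_style)
except ValueError as err:
    print("rejected:", err)
```

The design point is the default: the check rejects deprecated usage unless a caller passes `allow_deprecated=True`, so stale example configs would be caught at PR time rather than discovered by users.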
I know Istio has a bug. Yes. I would hope that Istio, I mean, Istio are the people that cared about this. I mean, I've always felt honestly that this seems like a lot of churn for not a huge reason. But if Istio is pushing on this, right, yeah, if there's a contact person, I can lean on them to actually do it, and say, look, you know, we need to make progress on this, or, you know, see my blog post on the Struct versus Any, it's wrong. And the conclusions are, basically, it doesn't give you that much. I mean, it's nice. It's more efficient. But that's all. Okay. Just in the interest of time, we should probably move along. I did want to mention that for everyone out there, I did it. So there's been a constant source of confusion, which is that we have one Docker image repo, and we mix in the tagged images and the master build images. We constantly get questions about, you know, where's the versioned release or where are the daily releases. So what I did is I split them so that we're going to put the tagged releases in the envoy Docker Hub repo and master releases in envoy-dev. This has exposed a bunch of similar issues where there are people out there who were just running latest, which is probably not great (if you're out there, please don't do that), and who I think have gotten confused because now latest snapped back to version 1.9.0. So I have a PR up right now which actually fixes the docs. It tries to snap stable docs and stable images so that when you're looking at the 1.9.0 docs, they'll actually point you at the 1.9.0 image, or point you at the SHA or the image that the docs are actually snapped to. So that PR is up. There's still some work, which I was hoping Michael might be willing to help with at some point, which is we need to do the same thing for our Dockerfiles. And again, in the interest of time, I'll stop. I think we can do some stuff with environment variable injection in the Dockerfiles to make it run from the right place.
But you could track that work on GitHub if you're interested. But the goal, from my perspective, is we should get to a place where people that are running master and are looking at master docs see docs that are consistent with all of the configs and the image that actually runs those docs. And for people that are running tagged releases, if they look at the 1.9.0 or the 1.10.0 docs, they will actually see docs that point to the correct image. And I think if we can achieve that, it's going to fix a bunch of our support issues. So any questions or comments about that? Yeah, I'll definitely help. I'm on vacation this week, but I'll help when I can. This sounds like a great vacation activity. That's what I've been doing. My daughter has been, can't snow ski, so I've been doing config updates. This is really painful. Some people like to take vacation on their vacation. No, I agree. I just, yeah. Okay. Anyway. All right. So are we on to... Yeah. API versioning? So how do we want to approach this? Do we want to give an overview? Do folks on the gRPC team want to sort of give the pain points that they're experiencing? I'm not sure. It would be, I think, really helpful for me and everyone else if either you or the gRPC team could, in a couple of minutes, just describe to everyone, you know, what you're trying to achieve, why the current thing is broken, and then we can start brainstorming or discussing different solutions. Yeah, I think Doug can do that. Yeah, I can take a stab at that. So let me see if I can get my screen share working. But while I'm doing that, just a quick introduction. Yeah. So I'm Doug. I work on gRPC in Go. And where's the thing here? Sorry. And yeah, so I'm joined in the room here with a bunch of other gRPC folks. And let's see how this is going to work. Can you guys see my screen? Cool. Okay, cool. So I put together a quick one-pager to sort of cover this.
But yeah, so gRPC has been looking at moving to the Envoy XDS protocol as our built-in load balancing and name resolving solution, to replace something that we designed and developed in house. And so just load balancing. Oh, sorry. Okay, just for load balancing. Okay. Great. And right. So while we were in the process of developing this, at least for Go, we got broken a couple of times by some changes that were made in the proto files. These were both, I believe, instances where a field got moved from outside of a oneof to inside of a oneof, and that breaks the Go API but not other languages, or at least not C. So yeah, we started sort of digging into the breaking change policy in Envoy. And we were a little bit concerned, I guess, by what we read, in the form of discovering that features could actually be removed during a major version of the XDS protocol. And this is going to cause some problems for us. So I don't know, do you guys have any questions yet? Or should I go into sort of a diagram that shows why this breaks, that I can run through in a minute? No, thanks. That's super helpful. I think that makes sense to me. But yeah, I guess if other people have questions or would love to see those diagrams, that would be useful. I was actually going to say, if this is designed for actual consumption, is it shareable? If you can share it to me, I can stick it on my Chromium account and then actually stick it in the meeting notes. Okay, sure. I can do that. And I can do that right now. Yeah, I mean, I think that's good for posterity. This is only a one-pager. But as long as it's okay for me to link externally, this one. Yeah, I had another version that had some more Google-specific concerns in it. So this is just removing, it's actually just boring stuff. But yeah, so let me go through these pictures then.
So this sort of just shows like a, you know, happy deployment here with several versions of gRPC running in, you know, a couple of cloud environments; we've got a Kubernetes deployment and GCP. And in GCP, we have Traffic Director, which is the Envoy management server for that environment. And in Kubernetes, you know, a customer is probably running some random version of Istio Pilot for that purpose. And so here we'll show, like, these are future versions of gRPC that, you know, hypothetically support XDS. So everything's working happily. So if in the future, let's say, Envoy looks at the protocol and decides, hey, you know, this feature, sorry. Okay, so, you know, feature foo here is, you know, not really what we needed. What we needed is feature bar, which has, you know, the same capabilities as foo, but it also lets us do some other things. So for instance, a boolean being changed into like a float or an int to represent a percentage instead of on/off. So Envoy decides we're going to deprecate this old feature. It's superseded by a new feature that has more functionality. And so based on our reading of the breaking change policy, this is allowed. But this is fine for now, because even though Traffic Director here has updated to the latest version of the XDS protocol and Envoy, everything still works, because foo is still declared and it's still supported by Traffic Director. So everything keeps working. Everything's good. And now gRPC has a decision to make: either we keep using foo or we switch to using bar, because we either want to work with the old versions or the new versions. And so here's where things go wrong. Step one. So gRPC in this case, we decide, hey, we want to keep up with the latest and greatest Envoy version. So we're going to switch over to using bar. Foo is going away, so we can't keep using that, obviously.
But here now, if a customer has an application that's using gRPC and they update the gRPC version in that to use this bar version, then the customer's Kubernetes deployment now has a problem where gRPC can't talk to Pilot, because they haven't upgraded their Pilot version yet. Well, yeah, I mean, you know, some of this can be solved by version negotiation, which takes place out of band like you know, and the go Envoy versions proposal. But anyway, so far maybe, but let's go to the next picture. So now when Envoy releases version 1.11, actually, the foo field will be deleted entirely. And it would be impossible to even do version negotiation, except maybe to negotiate that you can't work together, because that field isn't even present. So you can't support that old feature anymore. So essentially Traffic Director, and sorry, the colors here, I forgot to mention: yellow was things that work with foo and purple are things that work with bar. And before, we had a gradient on Traffic Director because it was in that intermediate version. But now Traffic Director is updated to the latest version that eliminated foo entirely. And now you see all of the existing gRPC deployments that were working using foo can no longer talk to this new version of Traffic Director. Meanwhile, you know, the new version that uses bar still can't talk to the Istio deployment in the customer's Kubernetes environment, because that hasn't been updated either. So I think the issue here is, when foo gets deleted from the proto, we have problems because we can't set it anymore in gRPC and no Envoy management server can even read it. But okay, I'm a little lost though here, because I definitely understand some of the concerns around versioning. I mean, that makes sense. And I thought this conversation, and maybe I was incorrect,
I thought this conversation was mostly around the fact that we don't want like a SHA- or version-based API, we want something that's tagged. But if I understand it now, I mean, you're suggesting that we can't deprecate fields. Like, I mean, that doesn't make a lot of sense to me. Right. So deprecate, I guess, has different meanings to different people. So we're okay with deprecation, it just needs to remain available and supported, at least by the protocol, maybe not by the latest versions of Envoy, if that's, you know, how Envoy works. But it needs to remain in the protocol because, you know, it can be marked deprecated, but it needs to stay there because otherwise you can't even read the fields. And so you can't support legacy applications once it's been removed. And so there's actually like best practices for managing protos that have widely been agreed upon: if you break backward compatibility, then you really should create a new proto package that contains all the same messages, but with the things removed that you were intending to remove. So once you've added something to a namespace, you can never remove it from that namespace. So it's essentially semantic versioning for protos. And you do a major release by changing the proto package. So that part makes total sense to me. And we can talk about a variety of different proposals by which we can make that better. I mean, I think we'll end up splitting hairs, because I think you can do a lot of this from a SHA-based perspective or a namespace. But yes, I totally agree that we can do better there. I still want to come back to requirements though, because that's where I think I'm getting a little confused, which is, to me, there's two separate things here, right? Previously, we've been talking about, you know, essentially XDS as an API for Envoy and management servers, where there's some deprecation window in which we can move things around.
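The foo/bar scenario and the "new package for breaking changes" practice described above can be modeled in a few lines. This is only a sketch under simplifying assumptions: a schema is a set of field names, a client works only if every field it sets exists in the schema the server speaks, and the package names `v2` and `v3` and the helper names are hypothetical, not the actual xDS layout.

```python
def in_place_ok(client_fields, server_fields):
    """One evolving schema: the client works only if the server still has its fields."""
    return client_fields <= server_fields


def packaged_ok(client_pkg, client_fields, server_pkgs):
    """Versioned packages: the server must serve the client's package with its fields."""
    return client_pkg in server_pkgs and client_fields <= server_pkgs[client_pkg]


grpc_using_foo = {"foo"}
grpc_using_bar = {"bar"}

# In-place evolution: the old Pilot only knows foo, the latest Traffic Director deleted it.
old_pilot = {"foo"}
latest_traffic_director = {"bar"}
print(in_place_ok(grpc_using_foo, latest_traffic_director))  # False: foo was deleted
print(in_place_ok(grpc_using_bar, old_pilot))                # False: bar is unknown

# Package versioning: nothing is ever removed from v2; the removal happens only in v3.
traffic_director_pkgs = {"v2": {"foo", "bar"}, "v3": {"bar"}}
old_pilot_pkgs = {"v2": {"foo"}}  # an older snapshot of v2, from before bar was added
print(packaged_ok("v2", grpc_using_foo, traffic_director_pkgs))  # True: v2 still has foo
print(packaged_ok("v2", grpc_using_bar, old_pilot_pkgs))         # False: still needs negotiation
```

In this model, freezing removals inside a package fixes the "old client, new server" direction. The "new client, old server" direction is a separate problem that the speakers assign to version negotiation.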
You're obviously going to use the XDS API for a different purpose. But I guess what I'm still trying to understand is that, like, let me just suggest something. If we kept the same deprecation policy, but we basically just kept incrementing the namespaces so that you could go back in time to a particular SHA or version that you support, but we're still removing things every three or six months, does that work, or do you actually need the fields to live for a longer period of time? Like, that's what I'm trying to understand here. I guess I didn't follow what you're saying. So you're saying you will still remove the fields, but... What I'm saying is, let's just say for the sake of discussion that every three months I'm going to go from V2 to V3 to V4 to V5. I'm going to change. I'm just, yeah, just for the sake of discussion, like I'm going to change that every three months. Right. So I think this would help, right? As long as V2 kept all of the features that V2 started with plus whatever was added in the middle. No, but it would be deleted from the repo, right? So, like, you would have to go back to it. We still need some long-term stability here, right? There needs to be... But that's what I'm asking. We need to keep working for a while, and we need some policy that sort of says, okay, this version will live, once it's created, for such and such a time before it stops. So, but that will solve one of our problems, right? I mean, so the problem is backward compatibility and being able to tell gRPC users which versions of XDS it works with, right? So we can say we'll work with you soon. That'll solve a tactical problem. It doesn't solve... Right. So folks, there's two, hey, let's just time out for a second. There's two separate things here. And that's what I just want to clarify, so I understand what the problems are. One of them is how we version things.
And again, I was saying before, we can split hairs about this, because technically you could say that the protos are versioned under a particular SHA and that's what they are. I'm not saying that's what we should do, but that would technically work. But then there's the other concern, which I think I'm hearing, and that's what I really want to understand: that to you, our deprecation policy is too short and it needs to be longer. Like, is that accurate? It might be a combination of both. I mean, realistically, it depends on how often you're deprecating features that we need, right? So, I mean, I think the time window there, you know, it's somewhat of a problem, but on the other hand, if we go to our users and we say, oh, we work with, you know, Envoy XDS at this particular SHA, they're going to look at us and have no idea what to do. Right. Sure. So again, I'm just doing this for discussion, I'm not saying that we should do that. I'm just trying to understand the different problems here. I understand. But that's sort of on one end of the spectrum, right? And then if we move to, okay, we work with V2, and then meanwhile the Envoy project has moved on to V17 by then and it's only, you know, a couple of months later, then they're also going to look at us really confused and upset, because we're not able to give them something that works with, you know, what they might be running in their environment. Plus, it makes our job a lot harder because we have to support, you know, version two, version three, version four, version five, or, you know, we decide which ones we support. But if we're constantly having to change, then that becomes a problem too, right? Sure. I think both are an issue here, is what I'm trying to say, right? And then also the third issue of, essentially, what does the API repository look like?
Do we actually have, let's say, 70 versions in the repository or the latest version in the repository? So, I would say, again, given that we're changing this deprecation policy, where we have a six-month window now, right? We plan at some point, TBD, to go to a year. Envoy cuts releases quarterly, so basically, you know, I think you could say we support every fourth version, and you would be able to say, okay, now you migrate from the last version to this version, and you're upping your versions once a year. Well, and with what Harvey was saying before, I do think that version negotiation actually helps us a lot. And again, I realize, you know, we have four minutes, like we're not going to figure out the final proposal here, but I think just for framing the discussion, I still think we need to be a little bit crisper about what the actual requirements are before we even start talking about solutions, because again, I think what I'm hearing is that the length of our deprecation policy is not long enough, but even there, I'm still a little confused because Envoy, the proxy, as I understand it, may actually not be involved in this problem at all, right? Because you have a gRPC client, you have Pilot or you have, you know, Traffic Director or something like that, and Envoy, the proxy, may not be involved, it's just the protocol, right? Right, that's absolutely right. And I think one of the key points here, that isn't sort of the problem we're talking about but is the underlying philosophical issue, is: if we intend the XDS protocol to really be an industry-standard thing that is not Envoy specific, then tying its versioning to Envoy doesn't really make sense, right? There are all of us here, and it needs to be an independent thing. Sure, and again, I'm confident that we can find a proposal that will work for everyone, like I think the solutions are pretty well understood, and I think with the right tooling, we can figure something out here.
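The "every fourth version" remark follows from the numbers mentioned in the discussion: a quarterly release cadence and a deprecation window of six months today, possibly one year later. The small sketch below just does that arithmetic; the function name `releases_per_window` is made up for illustration.

```python
def releases_per_window(window_months, cadence_months=3):
    """Number of releases cut during one deprecation window at a fixed cadence."""
    if window_months % cadence_months != 0:
        raise ValueError("window is assumed to be a whole number of release cycles")
    return window_months // cadence_months


print(releases_per_window(6))   # 2: a six-month window spans two quarterly releases
print(releases_per_window(12))  # 4: a one-year window spans four quarterly releases
```

With a one-year window, a consumer that pins to every fourth release would migrate once a year, which matches the cadence suggested by the speaker.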
So like, I'm not concerned about that. Mostly per my comment on the issue, I just want to make sure that we make all of you happy, but we don't decrease velocity, and I'm actually pretty confident that we can do that. But with that said, I think that actually laying out what the different scenarios are, like you did in the doc and what the requirements are, would be very helpful for helping us build the final solution. And I think that would be just a framing of the different scenarios of using the protocol without Envoy, using it with Envoy. These are the different deprecation dances. These are the timelines that you would be looking at. Is it possible to put something like that together, like not even prescribing solutions, but just talking about the world that you would want to live in? Is that doable? Yeah, I can take a stab at that. Sure. And yeah, I mean just... We'll put it in my trap. Sorry, yeah, we'll need to... Yeah, we'll need to sort of, I guess, work together to put together a doc for that. Yeah, so we can take a stab at that. I guess this meeting's only half an hour, right? Yeah, I mean, we will definitely need to meet again on this and talk more. And again, I want, in the two minutes left, I'll stop talking. I want other people to ask questions. Yeah, I think basically what we do is we get that doc, we stick it to the bug, we discuss back and forth with the doc, and then we schedule a one-off meeting where people can attend that one to discuss it. And that'll be at least an hour long, probably. Yeah, yeah, because again, like I am actually quite confident that with the right tooling, there's, I can think of four different ways that we could solve this problem and they're probably all fine. But I'm still concerned that if we still deprecate things too fast or do things in the wrong way, we're still going to end up making you angry. So I feel like we have to look at this a little more holistically. Okay, sounds good, yeah. 
And then just a couple of points that I'm going to make: you know, the proto fields really have to stick around in order for even version negotiation to work. And that's where sort of the technical aspects of backward compatibility kick in. And then secondarily, you know, we're certainly open to, you know, being able to mark new features as experimental when they're introduced. And those would be available to be removed just like, you know, you do today with anything, except, you know, that could even be at a moment's notice, because we just wouldn't use those features. And then that would be a clear contract where you say, these are the things that we pledge to support, and these are the things that, well, we're still messing around with them. No, that makes sense. Yeah. And we do want to get better at that, because we do have support for that today, but it feels like there's some communication gap between that and the users. So we should definitely fix that. The other thing that I think we could do, that will likely come out of this, sorry, just super quick, is I think that we can enhance the existing deprecated attributes on the protos to say things like deprecated for Envoy, but not deprecated for the protocol, for example, and we can do things like that. So anyway, I'm confident that we can figure something out here, but I think that if you could put that initial doc together, that would be really helpful. Okay, we'll take a look at that. Thanks. We have to go. Okay.
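One way to picture the closing ideas (experimental features that may vanish at short notice, and a deprecation marker that distinguishes "deprecated for Envoy" from "deprecated for the protocol") is the sketch below. The `Status` values, the `FIELDS` table, and both helper functions are hypothetical; no such annotations are claimed to exist in the actual protos.

```python
from enum import Enum


class Status(Enum):
    STABLE = "stable"
    EXPERIMENTAL = "experimental"                        # may be removed at any time
    DEPRECATED_FOR_ENVOY = "deprecated_for_envoy"        # Envoy stops using it; protocol keeps it
    DEPRECATED_FOR_PROTOCOL = "deprecated_for_protocol"  # scheduled to leave the protocol


# Hypothetical field table reusing the foo/bar names from the discussion.
FIELDS = {
    "foo": Status.DEPRECATED_FOR_ENVOY,
    "bar": Status.STABLE,
    "baz": Status.EXPERIMENTAL,
}


def removable_without_notice(name):
    """Only experimental fields carry no support pledge."""
    return FIELDS[name] is Status.EXPERIMENTAL


def safe_for_protocol_clients(name):
    """Non-Envoy clients can rely on fields that remain part of the protocol."""
    return FIELDS[name] in (Status.STABLE, Status.DEPRECATED_FOR_ENVOY)


for field in FIELDS:
    print(field, removable_without_notice(field), safe_for_protocol_clients(field))
```

Under these assumptions, a gRPC client could keep setting `foo` even after Envoy itself stops using it, and would simply avoid `baz` until it is promoted out of experimental status.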