Morning. Let's give people a couple of minutes to join.
Let's see what we've got on the agenda today: xDS, incremental xDS. As long as Chris is here, I'm going to add the DCO bot to the agenda. That's my favorite topic to complain about.
Could we add, I think, the conversational PR on the VPP integration? It touches on some of the modularization that I think is needed. This might be a good place to kick off that discussion as well.
Sure. Thank you.
Should we just get going? Yes? No? Sure. All right. You want to go first, Harvey, with the incremental proposal?
Yeah, so Nicholas and myself have a proposal out for sort of evolving xDS. I think I've mentioned this in previous community meetings: we have some pain points around scaling xDS up to very large configuration sizes, and also around dealing with use cases such as serverless, where you need to do essentially late binding. You need to load additional cluster resources and so on from Envoy on demand, possibly with, say, the request halting in the data pipeline. So there is a very concrete proposal out there, currently under review, with a whole bunch of feedback threads. I think we basically have consensus on where we want to go with that. But this is an opportunity to telegraph this to folks: please speak up if you have any thoughts or opinions here, because we would like to start turning this into concrete protos and implementing parts of it over the coming months. It would be really awesome to make sure this moves in the right direction, particularly if you have shared concerns or you're working on similar kinds of, say, serverless use cases. Would you add anything else, Nicholas?
Yeah, I'm just curious: when will we have gotten enough feedback, and when should we decide this is the plan?
There's a bunch of comments on the doc. I would encourage everyone to go and read the doc, because I don't consider it a huge change. The implementation within Envoy is actually really simple, but it has large implications for long-term use. So if people out there care about this, I would really encourage them to look. I'd still like to resolve some of the version comments, so if you want to talk about that now for like five minutes, that's fine. I don't think we're actually very far off. My main concern was really that as I started to look at implementing, on the Envoy side, version tracking and stats and admin output, I basically realized that in the incremental case... Sorry, what? Oh. In the incremental case, having one version per resource but not a transactional version is pretty problematic from a debugging perspective. So we can do it in the doc, but I would like to figure out a way... I get that you might want to have a per-resource version. It would be awesome if we could simplify it such that there's a transactional kind of version, and then any resource that is applied in that transaction just gets that version. I think that's a lot simpler. But if we want a per-resource version, I would suggest that we keep the top-level version. So there's basically the concept of a transaction version, and then we optionally allow a per-resource version. If no per-resource version is supplied, Envoy will take the transaction version and apply that to each resource that was applied. And if there is a per-resource version, Envoy will keep track of the transactional version for debugging reasons, and then on a per-resource basis it'll apply the per-resource version.
Yeah, I mean, the thing is, I do have a worry about there being some complexity here, in particular around understanding how you're supposed to use these different versions with different management servers
and they'll be implemented somewhat inconsistently. For example, in some situations you're only going to have per-resource versions and you won't have this CDS-wide versioning, right?
Yeah. Yeah, no, I totally hear that. I just feel pretty strongly that it's a non-starter to not have this transaction version, just because it's a super common debugging situation in which your example holds: you'd go from zero to two to zero, and you don't know what happened, right? I mean, we have to allow people to debug those cases.
I think I would definitely want to know, in debugging, what is the last transaction version that arrived on the wire.
Exactly. Yeah.
That's different. That is just telling me essentially what was the last local exchange I made with my management server. It tells me nothing about the semantic content of the resources I've loaded. Use the nonce for that.
Sorry, what?
The nonce, the nonce for debugging.
I mean, we could, but... Yeah, I mean, these are things that we probably shouldn't rathole on here; we can take it back to the doc. I don't feel super strongly about the semantics of whether we use the nonce or use the version. I guess my point is, I suspect that most people do not need per-resource version complexity, and if you gave them that one version field, that is actually going to be enough for most people. Because the way that I would likely implement it at Lyft is, even if we were doing incremental, I would be incremental, but within a particular back-end config SHA, effectively. So it's like, as I ask for the resources, right?
It's like resources might come back at a particular version, and the version field actually might be the same. So it might still be version 4, right? And then as I ask for more resources, I just internally track them at version 4 when they were applied. And then let's say in the back-end config system someone does something and it switches to version 5. So then if I incrementally ask for something, the next message would come back with version 5. And that's independent of the nonce. So that's why I actually think that you kind of need them to be separate. I totally get that there's extra complexity here, but I think that what I've proposed is the most flexible and it'll make everyone happy. However you want to design your system, you can do it, basically.
I mean, what you're describing there, using this version to describe the current state of resources that you're serving up from the management server, is not how the version is used today, right? It's used essentially just to acknowledge each individual exchange on the wire, which is different.
No, no. I'm not sure what exactly is happening today, but that's how we're using it at Lyft today. We have a back-end SHA, basically, that is the version, and that version stays constant even through different fetches, right? So basically we have a SHA of config, and then as the config SHA changes, right...
Okay. Yeah, let's continue this in the doc.
Another issue that popped up when I looked through the document is that this appears to be the first place where a new feature will be supported in the gRPC data plane API but not in the REST data plane API.
And so I've got concerns about breaking with precedent there.
Yeah, I mean, the main reason for this is just that doing this with REST is a lot more complicated, because with gRPC you have bidirectional streaming semantics, so it's very easy to imagine delivering some partial resources and then later on asking for more and so on, having this two-way exchange. With REST we would have to design a new way to retrofit that on top of REST. I mean, is there actually a strong need for this level of scalability and on-demandness in the REST world?
Yes.
Can you elaborate on what the use case there is?
Our control plane at Pinterest currently uses the REST implementation of the Envoy data plane API, and we're looking at using on-demand config loading because we've got a similar number of clusters to support as other folks who are also interested in this feature.
But I mean, couldn't you switch to gRPC?
I think a case could be made to rewrite one's data plane, excuse me, to rewrite one's control plane / data plane interface. It is something I would prefer not to do gratuitously, I mean, given the existing investment in REST and the existing precedent within Envoy.
Sure. So here's, I think, probably our stance: I don't think we're opposed to supporting this functionality in REST, but I don't think that we can assume that the people who are doing the work have to backfill it, because it's totally non-trivial. So if you or someone else wants to come in and figure out how to do it with REST, I don't think there's going to be any opposition to that. But I think you probably have to do that heavy lifting.
Okay.
Isn't that fair, Harvey?
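
The versioning fallback proposed earlier in this discussion (a top-level transaction version, plus an optional per-resource version) can be sketched roughly as follows. This is a minimal illustration in Python and not Envoy's implementation; the names `ResourceUpdate`, `VersionTracker`, and `apply` are hypothetical and were not mentioned in the meeting.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ResourceUpdate:
    """One resource delivered in an incremental update (hypothetical type)."""
    name: str
    version: Optional[str] = None  # optional per-resource version


@dataclass
class VersionTracker:
    """Tracks versions the way the proposal describes (illustrative only)."""
    # Last transaction version seen, kept for debugging.
    last_transaction_version: Optional[str] = None
    # Effective version recorded for each applied resource.
    resource_versions: Dict[str, str] = field(default_factory=dict)

    def apply(self, transaction_version: str, updates: List[ResourceUpdate]) -> None:
        # Always remember the transaction version, even when resources
        # carry their own versions.
        self.last_transaction_version = transaction_version
        for update in updates:
            # Fall back to the transaction version when no per-resource
            # version was supplied.
            self.resource_versions[update.name] = update.version or transaction_version


tracker = VersionTracker()
# Two fetches against the same back-end config version "4".
tracker.apply("4", [ResourceUpdate("cluster_a")])
tracker.apply("4", [ResourceUpdate("cluster_b", version="b-7")])
# The back-end config changes, so the next fetch comes back at version "5".
tracker.apply("5", [ResourceUpdate("cluster_c")])
print(tracker.last_transaction_version, tracker.resource_versions)
```

In this sketch a management server that only ever sets the transaction version still gets a version recorded for every resource, which is the debugging property argued for above.
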
I mean... Okay, so in the interest of time, why don't we go back to the doc, because I just want to make sure that we really think through all this versioning stuff. It's the kind of thing where if we don't think through it now, we're going to have a problem later, so it's worth investing some time into that now.
Maybe I could schedule some time again this week so we can have a half hour, just...
Yeah, sure. What I would do is... is there a GitHub issue tracking the doc? I actually can't remember.
I think there was at some point on the Envoy API repo.
Okay. Why don't we do this: why don't you make a new issue in Envoy tracking implementing incremental xDS, put a link to the doc in there, and then maybe just say that we're going to have a meeting later this week and see if anyone else wants to join. And then we could schedule a dedicated meeting towards the end of this week.
Okay. Great.
All right, let's see. Okay, that's GSoC. So yeah, it's actually fantastic: we have a Google Summer of Code student who's going to be working with Envoy, with Matt and myself; I think Constance is listed as a mentor as well. This is [name unclear]; I may not have pronounced it correctly. He'll be working on fuzzing. We've actually kicked off the fuzzing efforts already, and we're making our way through a whole backlog of server config fuzzing stuff. I plan on looking at protocol fuzzing shortly, but I think he'll be looking at a lot.
There's a lot of work to do there. We actually opened an issue, or actually added to the issue yesterday, a list of potential projects to work on. Please do contribute to that issue if you have additional things you'd like to see fuzzed in Envoy. This is actually a really useful way to find bugs. We have continuous fuzzing using Chromium's ClusterFuzz running, and yeah, you should expect that your adversaries are also doing this.
Yeah, my thinking is, let's have him start on the server validation, which, per our conversation, will I'm sure expose like 60 bugs. Because that's so similar to what you already did, it should be pretty straightforward to actually make that happen. So my thinking is to have him do that, have him fix like the 50 bugs that occur in the validation path, and then we can maybe move him on to more complicated stuff.
Yeah, I think maybe EDS after that.
Yeah. Yep. All right. Do you know when he starts?
I think it's in about a month's time.
Okay, great.
Sorry, is the idea there that the fuzzing would eventually be run as part of the continuous integration suite?
Probably not. You need significantly more resources than CI has available. But we do essentially have CI for fuzzing with this ClusterFuzz thing I described. This is infrastructure that the Chrome project, or Chromium, is operating for a whole bunch of open source projects. On every commit that you make, it'll just check it out, spin up a bunch of VMs in GCP, and throw a bunch of resources at fuzzing that, and it actually files issues automatically with the Envoy security team when they come up.
Thanks.
Yeah, it's super awesome. The bugs that it's already uncovered just from config loading, it's fantastic. Obviously those aren't that scary, but we have ideas of how to... The problem with fuzzing, or the quote-unquote problem, is that it can't use any network.
So how we do normal integration tests basically won't work. But Harvey and I have an idea of how we can make a custom transport socket to use for fuzzing, and then we should be able to basically fuzz the full flow, all the way back to the router. I suspect that will uncover some more interesting and scary issues. So that's really exciting.
Okay, let's just talk really briefly about the DCO bot. I sent off a really, completely nasty email to Chris last week. I'm kind of at my wit's end. This is by far the most painful thing that we deal with. It's an endless stream of people that don't know what to do, or the bot is broken. So, as I put in that email, I think there are some really basic usability things that can be done to help guide people from the bot on what to do and what went wrong. So I guess my main question to Chris is: since most CNCF projects are moving towards DCO, can the CNCF invest some resources in making the bot less terrible?
Yeah, I mean, if you give us explicit issues, we're happy to fund some work to send PRs. I mean, it's all open source, so we're happy to improve it. Just let us know in detail what you want. In the nastygram you sent, you listed a couple of things, so we'll take a look at those. But for other folks in the Envoy community, if you have specific issues you'd like to see improved...
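
The kind of guidance being asked for here (say what was checked, what went wrong, and how to fix it) might look something like the sketch below. It is written in Python purely for illustration and is not the actual DCO bot's code; `check_dco` and `HELP_URL` are hypothetical names, and the URL is a placeholder. The only real-world details assumed are the `Signed-off-by:` trailer convention and the `git commit --amend --signoff` command.

```python
import re
from typing import Optional

# Matches trailer lines such as "Signed-off-by: Jane Doe <jane@example.com>".
SIGNOFF_RE = re.compile(r"^Signed-off-by: .+ <([^<>]+)>\s*$", re.MULTILINE)

# Hypothetical placeholder; the meeting did not name an actual help page.
HELP_URL = "https://example.com/dco-help"


def check_dco(author_email: str, commit_message: str) -> Optional[str]:
    """Return None if the commit is signed off by its author, otherwise an
    explanation of what was checked, what was wrong, and how to fix it."""
    emails = [email.lower() for email in SIGNOFF_RE.findall(commit_message)]
    if not emails:
        return (
            "DCO check failed: no Signed-off-by line was found in the commit "
            "message. To fix the most recent commit, run "
            "'git commit --amend --signoff' and force-push. Details: " + HELP_URL
        )
    if author_email.lower() not in emails:
        return (
            "DCO check failed: the Signed-off-by email does not match the "
            "commit author email (" + author_email + "). Set the matching "
            "address with 'git config user.email', then re-sign with "
            "'git commit --amend --signoff'. Details: " + HELP_URL
        )
    return None


print(check_dco("jane@example.com", "Fix bug\n\nSigned-off-by: Jane Doe <other@example.com>"))
```

Returning a message for the mismatched-email case, rather than staying silent, addresses the hang-without-response edge case raised in this discussion.
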
Yeah, I mean, people should just reach out and say what their issues are. My main things, which I think would fix it and which are already in the email, are just that the bot basically needs to be super clear about what it was checking and what was wrong, and actually have a link to some page with detailed information about how to fix what you did wrong, possibly with git commands and the entire thing. It just needs to be a more hand-holding process to help people understand what went wrong. And there are edge cases in the bot. There was something that happened last week where, if your email doesn't match your GitHub email, the bot doesn't even respond. It just hangs. We have to fix those bugs.
Yep. All right. Got it.
Great. Okay, do we want to talk about the VPP stuff?
Sure. So last night I pushed a pull request that is entirely informational. It is a proof of concept in the classic piece-of-crap implementation style, meaning I basically just did whatever I had to do in order to make the thing work. So please take the amount of replication and just plain hacking with a grain of salt. But I think it provides some value in setting some discussion points on how we move forward to make it work in a deployable fashion. So I guess to summarize, the biggest highlights, the points that need to be resolved, are these. Envoy currently has two socket classes: there's transport socket and then there's connection socket. They're different classes. The integration of the transport socket stuff through the extension stuff that was just recently pushed worked fantastically, right? So that's great.
The problem is that the listen and connect side of things is a separate implementation. And so we need to figure out what the plan is to unify that, or to put in a parallel effort to allow the specification of an alternate transport for the connection side. I took a stab at naively just trying to go refactor things and got entirely, way over my head. So at this point what I'm looking for is some direct feedback on where I hacked things, and where things are just done ignorantly because it was what I had to do to get it to work, and how we can help. There are really kind of two phases that need to happen. One is we need to make the Envoy stuff so that the extensions are clean, and then I can go add the VPP implementation under that. I'm perfectly willing to work on any or all of that. So the question is, how do you want to move forward, and what's the right thing to do? The other caution I have is that in my attempts to go do some naive refactoring, it became clear to me that this is going to be high risk if we do it the way I would have done it were it a clean-slate implementation, meaning I would just have a single socket class and then derive, you know, inherit from that where we had separate entities that needed to use different characteristics of it. I think that's going to be way, way too risky to lop off in one chunk. And so I really need you guys to let me know how else we could cut this such that we can
take more baby steps that are lower risk, because I don't think this is a trivial implementation.
Yeah, I think what I'd like to do is... I don't have any comments right now, and I don't think anyone else will, mostly just because...
I pushed this late last night, so that's fine.
Yeah. But what I'd like to do is, I will tag like four or five people on the PR to just take a first pass and look at it. And then I think from there we can either... talk about it again in two weeks' time at the next community call, just because I kind of have a feeling it's going to be complicated enough that... you can do it in the PR, we can start there, but I have a feeling we're...
I also wrote up a Google doc that's linked in the PR that describes the overall scenario. I'll continue to extend that. I realized I didn't include the complete test configuration, so I can do that, so that somebody else could stand up what I ran and test it. And there are sections in that...
Okay. My one question is: do you feel like this is enough that we should loop in the Covalent people, or do you feel like we should get a little further along
before asking them to actually look at it?
I would tend to think we want to get a little further along, because my concern here is less about the stuff that I'm adding from the VPP world and more about the structure of the refactoring that's going to happen in Envoy itself.
So yeah, it's more that, and this is my comment which we've had in a couple of different emails with Ed, I just don't want to wind up in a situation where we discuss a whole bunch of extension points and refactors, and then we go to the Covalent people and the way that they're using eBPF just doesn't work for some reason. So I think it's totally reasonable to start the conversation, but once we think that there's enough to look at, maybe after we do an initial round of reviews, I would like to get them involved, just so that we don't wind up doing a whole bunch of stuff and agreeing on it after several weeks, and then they come in and say, "Oh, but what about this? This is not going to work," and then we're back to the drawing board.
I'll say my strong tendency personally is that more eyeballs earlier is better.
Yeah.
As long as, and this is a great community so this is not a problem, as long as everybody is constructive, the more people you have interacting early, the better.
Yeah, they're certainly constructive.
Yep. So I'd love to see them pulled in. I know that we had some stuff around the QUIC work where there was an interest in things close to this as well.
Yep.
If we generalize out from what we're doing right now with sockets, I think that would be a very valuable set of input on how to...
Yeah. More eyeballs. Yeah. And you'll be in Europe next week, right? You will be? Okay. Yeah, so we can potentially talk about this in person also.
No, I think that would be awesome.
Okay, great. Did anyone else have any comments or questions on this stuff?
So I guess the other comment I would make is, given that we've got multiple projects that are going to work on this, it would help if we were to solidify the requirements. I don't know where you want to do that. Within the current Google doc I have, we could expand that, or we could run a separate doc that codifies at the very least the use cases that we need to pass.
Yeah, and that's where I think that, considering QUIC, VPP, and the Covalent Cilium stuff, if we look at those three cases, as long as we have people at the table who can speak to those three cases, I think we will end up designing something very solid.
So, a requirements doc of the things that you need, that we could append onto for QUIC? Or do we think we're going to meet and then discuss and create that coming out?
I think we need to meet and discuss it. Mostly what I have are a bunch of questions, when you look at the write-up I have, just because I don't have enough experience with the Envoy code base to make any meaningful contribution at this point from the VPP part. The other thing to note is that what I did here was just take one of the test cases for the VPP host stack and add Envoy as a TCP proxy into it. Given that I'm also part of the test team on VPP, the reason why I'd like the use cases defined earlier is so that we can stand up test cases much earlier in the development phase.
Cool. I've just looped in the people from our side who've done similar work for QUIC, and I will take a look at it as well.
So all comments are welcome. Throw stones, that's fine.
Do you have a particular time frame in which you're really itching to get this done? Is there a particular milestone that you're trying to meet, Ed?
Yeah, so there's a matter of aspirations and there's a matter of realities. So let me throw this out purely as aspirational: it would be lovely if we could get something up and working that we could plug into at some point this summer. I think that would be desirable.
I just want to make sure that it wasn't, you know, four weeks from now or something, or like last week.
If it were four weeks from now I would certainly not try and slow things down, but I would be shocked. You would be seeing my shocked face. Particularly when trying to get something right that's usable by multiple parties, it will take a little bit of time to sort it out. And like I said, sorting things out this summer I think is a reasonable aspiration, and we'll just see how the chips fall.
Yeah, I mean, I'm pretty aggressive in terms of getting stuff done, so I think we can do early summer. Let's try to iterate on this. But I would also have my shocked face on if we can figure out a design in less than four weeks. It's just going to take a bunch of time. So let's make sure that we loop people in and give people time to actually comment. I just want to make sure that we get this right so that we don't have to do this again. That's my...
Yeah, I like early summer very much as an aspiration. Let's try and drive to that.
Okay, sounds good. Cool. Did anyone have any other quick questions or comments or stuff? Cool. Well, have a good week.
Bye. Thanks. Have a great week. Talk in a couple of weeks.