Welcome, folks. I've gone ahead and stuck a link into the chat for the meeting minutes. So while we're waiting to get started, if you could go add yourselves there, that would be fantastic. For those just joining, how are you? Good? Hey. So just so folks remember, we do record this call and it does get posted to YouTube, so keep that in mind as well. Also, a bunch more folks just joined, so let me go ahead and stick the link for the meeting minutes back in, so you can go add yourselves there as well. See, peer pressure. Also, as always, if folks have things they would like to see added to the agenda, then please feel free to go ahead and add them. In particular, we're trying to track and report on the things that landed the past week and the things that are in progress. So if there are things that are not already on the agenda that match those criteria, please add them there. I want to make sure we give everybody call-outs for the good work they do, and that the community gets some sense of what's coming down the line. Are we good to get going, Rick? Yep, let's get started. So, welcome to the next Network Service Mesh meeting. This particular meeting occurs every week at 8 a.m. Pacific. We are also involved in the Telecom User Group, which occurs every first Monday at 8 a.m. and every third Monday at 4 a.m. We are also involved with the CNCF Networking Working Group, which is currently being rebooted. So we don't have a time for that yet, but we'll know what time it lands pretty soon. Also, is anyone able to share the meeting minutes so we can all follow along as we go? Yeah, we'll continue on while we wait for somebody to generously share the meeting minutes. So, major events that are coming up. We had a CNCF webinar recently. I believe we're waiting for that to be posted. So... I think everything's posted already? Yeah, I think they posted it the same day. Oh, cool. Yeah, let me go find the link.
We actually have a playlist of NSM videos that they've put together. I didn't even realize this till last week. Nice. Yeah, it is here. We should probably ask if they'd be willing to split the meetings from the events, but that's not too important at this point. So we also have... sorry. We also have Open Source Summit coming up in Lyon, with a talk accepted by Ivana and Radoslav, which is an introduction to NSM. So if you are going to be there, definitely feel free to show up. Whoever is sharing, we're currently seeing your code, not the meeting minutes. Strange. Okay. The good news is it's all open source code. No, no, the code is open source. That's all fine. It's just... It looks like a smash, actually. So kudos for actively working during the meeting, Radoslav, that's appreciated. But there we go. So if you ever want to know what we're doing. So, cool. Sorry, my audio is doing a couple of weird things. Can you still hear me? Yeah. Perfect. We also have KubeCon and CloudNativeCon North America coming up, where we have NSMCon announced; the call for proposals has already closed and the talks have already been posted for both KubeCon and NSMCon. So feel free to take a look at the schedule, and definitely let us know what excites you as well. We have KubeCon and CloudNativeCon Europe coming up in March and April, which is going to be in Amsterdam. The call for proposals is already open, and so that's why we're calling it out at this point. And I believe the call for proposals closes... I think in January... Who deleted? No, at the end of November. Oh, in November? Okay. Who did the call for proposals? Who deleted this? It was there just a minute ago. Yeah, it was definitely there. There it is. Okay. Sorry, I was trying to tidy up and I cleaned up the wrong set of things. I was trying to clean up the KubeCon North America piece and instead made a complete mess of it.
We should probably get links to the CFPs as well. Yeah. Oh, I made a mistake. The notifications are in January, but make sure you have your proposal in by November 22nd. Yeah, this means that before you go to KubeCon North America, you should already send your proposals for the next KubeCon. Exactly. I would expect to at least go to North America, see how things are going, maybe, I don't know, figure out what the next things are that we want to talk about, but I don't know. Okay. Yeah, I think it's the size. The larger the event gets, the further out ahead they want to plan, and KubeCon's getting huge. Yes, but they also moved the dates, like two months ahead. Instead of the end of May, it's the end of March. So, yeah. Cool. If anyone knows of any other events, let us know. So we also have the social media community team, and I believe I saw Lucina on the call. So you have the floor. Hello. Yes. Great. Okay, thank you. So last week we gained a few more followers, 14, followed about 15 more, and posted 21 tweets and retweets. And there are links to all the new posts. So I posted the pre-registration for NSMCon — be sure to add that to your KubeCon registration, and the $50 fee will go to the CNCF diversity scholarship fund. I've also posted about the webinar on October 2nd, thanked our lunch sponsor Juniper Networks, and also sent out a thank-you to the four different NSMCon sponsors. So this week I plan to promote the individual sessions and speakers that were selected in the lineup for NSMCon. I'll promote the sessions at KubeCon and send more reminders to pre-register for NSMCon, and also promote the sessions at Open Source Summit in Europe later this month. When available, I'd like to share a link to the TelecomTV interview from ONS. If there's a video of the 5G panel from Linux Foundation Networking, I'd like to share that as well.
I did retweet the ONS keynotes link, and when the contributors podcast is available, I'd like to share that as well. This is all fantastic. You just do such an amazing job helping us. It's very much appreciated. Also, in addition to all of this, you pointed out a nasty bug in the website in terms of being able to copy and paste text, which I think is now fixed. Oh, wonderful. Thank you so much. Oh yeah, it was one of those things. Almost everything is easy once you know where you screwed up. Yeah, we share everything except for the ability to copy and paste on our website. Yes. So that should be fixed now. That's awesome. That'll make everything easier. Thank you so much. Great. So the other thing that may help you as well: I noticed you've been taking screenshots, but the anchors used to over-scroll, and that's also been fixed. So links will show the things that you want them to show. Perfect. Cool. So we have announcements. Do we still have fuzzing bugs available? I believe we do. These are great bugs for beginners because they pretty much tell you exactly where to look, and they tend to be things like: there's some sort of unusual nil check that needs to happen, or something like that. So these should be very, very good beginner bugs for people to get started on. We've been sort of intentionally holding them open so that beginners can grab them and work on them. Obviously we'll go and fix them before the release if it comes to that, but they make really good things to pick up if you just want to try some stuff. By the way, I noticed we had a lot of additional folks showing up, so I'll re-paste the link to the meeting minutes so you can add yourselves there. So please do add yourselves. Well, so we have the status of the project. So, Ed, you have the floor. Cool. So this is one where, again, folks, please feel free to add your stuff.
I'm not sure I caught everything going by, but the first one I wanted to mention is, you've probably noticed we've been having some interdomain test stability issues that came out of the blue, that didn't appear to be related to any code changes that we've made, and they were all happening, I believe, in AWS. And I think Artem has finally chased them down. We've got a candidate for what may be the root cause, having to do with some wonkiness in an update that AWS made to its CNI. Do you want to say a few words about that, Artem? Or not? As far as I know, Artem today is only listening. Oh, that's fine. That's fine. Some issues. Yeah, so effectively, Artem figured this out. It's a very obscure kind of problem. He's really good at those. We think this is the root cause, and he's currently attempting a fix that would downgrade to the previous release's CNI. It's a pretty good indicator that it was the root cause that the upgrade to the AWS CNI happened at exactly the same time that our interdomain tests started having issues. But this is why we test on a bunch of different platforms, so that we catch things like this. And currently the tests have been disabled while we work on fixing the issue, because there's no point in failing code changes that have nothing to do with this CNI failure. Cool. Any questions on that? All right. So, things that I noticed landed this week. So do you want to say a little bit, Radoslav, about the kernel forwarder support for metrics? Yeah. So basically it's gotten merged. But it sounds like it was very complicated. Yeah. I think we slowed that down a bit. But yeah, again, it got merged. So it introduces metrics support for the kernel forwarder. I used the model that VPP was using, with this "source dash something" and "destination dash something" naming, that model. So it's the same and it should be interchangeable. Fabulous. Have you talked to Matthew Rohan? I don't know whether or not he's planning on doing any updates to the Skydive support.
No, I haven't talked to him, no. It might be worth reaching out to him and seeing if he'd be interested in showing some of the metrics on those links. It's the kind of thing that he often likes. He's been very kind to us on the Skydive front. Plus he knows the Skydive community people. They may even do it. The Skydive community has been incredibly generous with us. Okay. Yeah. I will reach out to him. Excellent. If I can add just a couple of sentences here. So first, this is actually — I mean, after Ivana figured out that there are some issues with the way that the VPP-agent-based forwarder exposes metrics, it actually doesn't expose metrics there. Yeah. This is the first time that we have some actually working metrics, based on the infrastructure that did exist essentially. So I really just used what was already there. And based on this, Ivana has prepared a set of patches in a PR, I believe, that actually adds exposing these metrics in a Prometheus database. Oh, that's fantastic. Could we add that to the in-progress section? It was finished, but I'm currently trying to build against the latest master. I'm having some issues with that, but yeah. Oh, okay. It's up to date, but I have some Go module and other issues hitting today. Awesome. Could you please add it to the in-progress section so that we can follow up on it in the next meetings? Just the link there is useful, and then we'll track it. Adding it to in-progress is super helpful, because that's the first place I go looking for things that have actually landed this week. And then of course, if we can help Matthew base his work on top of this and show some link statistics, that would be awesome for everyone, I believe. And I would love to also see — I presume that your patch has an optional way for us to install Prometheus, so we can go look when we're playing on our own and see the Prometheus metrics land. Yeah, that would be fantastic.
Yeah, Ivana, I don't know about Prometheus — I mean, we talked about Helm charts and stuff. Yeah, I did them and I'm trying to test them, yeah. Okay, looking forward to seeing the PR and reviewing it, actually. They're upstream, but I have some things to tidy up. It's the CI that's failing. Okay, cool, cool. That's awesome. So no, metrics are super important, observability is super important. So I'm delighted to see this stuff finally landing. And the good news is we should hopefully get a fix for the VPP agent that will unlock those metrics as well, so that we can get that working too. But I'm glad that you guys continued moving all this forward, because it's important to get it done. I think that VPP agent metrics are coming soon. Hopefully they'll share with us from the Ligato community. Cool, excellent. So the next one in landed-this-week is the vastly improved build times. Do you want to say a few words about this, Ilya? Oh, yes, I can. Now we build binaries on the host machine and then just copy the binary files into the Docker image, which helped us improve the build speed. And also in the PR I removed many old targets that we had in the k8s Makefile. And I think that's all for it. I think that "improved" is a very weak word here. It's like, I don't know, maybe jetpacking the build or something like that. So I did run some local numbers myself, before and after, because I was curious, and I have almost the perfectly optimal build environment: I've got a very fast laptop, I have a very fast internet connection. And if I cleared out my Docker cache beforehand and started from a cold Docker cache, it was taking me 15 minutes to build everything — make k8s-save. With this new patch that vastly improves the build times, it's down to a minute or less from a cold build. It's just incredibly faster. Have you tried doing -j4 or something like that? Because that actually works now. Oh, no I haven't. That's awesome. Now I'm really excited.
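The host-build approach Ilya describes generally looks something like the following: compile the Go binary once on the host, where the Go module and build caches persist across runs, then have the Dockerfile only copy the finished artifact instead of compiling inside the container. The paths, target names, and image tag below are hypothetical, not the actual repo layout:

```shell
# Build a static Linux binary on the host so the host's Go build cache is reused
CGO_ENABLED=0 GOOS=linux go build -o build/nsmd ./cmd/nsmd   # hypothetical path

# The Dockerfile then only needs to copy the artifact, which is nearly instant:
#   FROM alpine:3.10
#   COPY build/nsmd /bin/nsmd
#   ENTRYPOINT ["/bin/nsmd"]
docker build -t networkservicemesh/nsmd:dev .
```

This is also why `make -j4` helps now: independent host-side `go build` invocations can run in parallel, whereas the old in-container builds each re-downloaded and re-compiled everything from scratch.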
Oh, no, that's even more exciting. Great job. Thank you guys. Thank you, that's really, yeah. Okay, -j16. My laptop isn't that cool. Very cool. Cool, so next up, we had some OpenTracing improvements. Do you wanna say some things about this, Andrei? Yeah, a bunch of OpenTracing improvements to both the SDK and NSM. And actually I'm almost finished with the SDK-like approach for the NSM request and close. It's in a pull request at the moment, and I will be looking for review from you guys. So it does the same thing as the SDK we have for the endpoints, but for the entire chain of requesting and closing network service requests. So for now, OpenTracing shows pretty cool traces, actually. I can give a short demo if I can share my screen. So, Radoslav, could I please have the screen for a moment? I will show how the open traces look at the moment. Okay. Done, yeah. Okay, do you see my screen? Yep. Okay, nice. So this trace is for a pretty complex scenario, where the data plane dies on the remote side. So firstly, we request a connection. We can figure out at any place what's happening by looking at the attached logs and so on. Then, for example, we see the data-plane-down stuff. We know what it is doing: sending a remote connection update saying the data plane is down. And on the Kubernetes master node, we receive events, and when we receive the event that the data plane died, we know we need healing, and we do the healing itself with all its steps, and you can figure out what's happening, mostly in real time. And if any error occurs, it will be marked here in OpenTracing as an error. So it's more obvious what happened along the whole chain. Yeah, so I mean, I think what's really amazing about this is that rather than having to fish through logs to figure out what's going on when you have any kind of a problem, you can literally get down to the individual subpart of the code and what it was doing where the error originated through the system.
And you can also see what goes in and goes out at every step, plus any logs that are happening at any step. Yeah, one more interesting point here is we just need the span identifier from our init container. So for example, the client does the request the first time, and any healing — like data plane down and so on — happening inside the NSM manager will be automatically added to the first trace, from when we requested the connection. So the whole lifecycle with healing will be automatically attached here. Oh, fantastic. That is really, really awesome. This should make it much easier to track down what's going on inside the system. So one quick question. Have we looked at the possibility of being able to export and attach the OpenTracing data when we have CI failures? Yeah, yeah, yeah. Actually, it could just be added. The trace can be downloaded, which is the cool stuff. So I wanted to do that next: just download all the spans. I need just the JSON, and it can be imported into the Jaeger UI just by putting the JSON file there, to track what is happening. So the idea — Yeah, it will be the next step. I'm not sure it would be a good idea to include it for all possible tests, but I'm thinking about improving the cloud testing stuff so that if a test fails, we enable tracing and run it with traces. Yeah, I think you're right. I mean, basically, if we have a test failure, that's probably when you would want to do it. Run it with traces, yeah. Yeah, because then we can literally say, okay, well, we had a test failure, what happened? And you can drill straight into the trace and figure out what happened, which should make it easier to track down test failures. Cool. I find that very exciting, but then I'm highly excited by things that make it quicker and easier to develop. And to operate — I mean, don't underestimate that part. That's true.
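The point Andrei makes about reusing the span identifier from the init container is essentially trace-context propagation: as long as every later operation (heal, data-plane-down handling) starts its span with the trace ID of the original request, Jaeger groups the whole connection lifecycle under one trace. A stdlib-only sketch of just that idea — the types, fields, and ID scheme here are illustrative, while the real implementation uses opentracing-go with Jaeger:

```go
package main

import "fmt"

// SpanContext carries the identifiers that must survive across processes.
type SpanContext struct {
	TraceID string // fixed for the whole connection lifecycle
	SpanID  string // unique per operation
}

// Span is one operation (request, heal, close) within a trace.
type Span struct {
	Ctx       SpanContext
	Operation string
}

var nextID int

func newID() string {
	nextID++
	return fmt.Sprintf("%04x", nextID)
}

// StartSpan begins a root span with a fresh trace ID — this is what the
// client's first request does.
func StartSpan(op string) *Span {
	id := newID()
	return &Span{Ctx: SpanContext{TraceID: id, SpanID: id}, Operation: op}
}

// ChildOf begins a span that inherits the parent's trace ID, so healing
// events inside the manager attach to the trace of the original request.
func ChildOf(parent *Span, op string) *Span {
	return &Span{
		Ctx:       SpanContext{TraceID: parent.Ctx.TraceID, SpanID: newID()},
		Operation: op,
	}
}

func main() {
	req := StartSpan("request-connection")
	heal := ChildOf(req, "heal: dataplane down")
	// Both operations share one trace ID, so Jaeger shows one lifecycle.
	fmt.Println(req.Ctx.TraceID == heal.Ctx.TraceID)
}
```

In the real system the `SpanContext` equivalent is serialized and handed from the init container to the manager, which is why later heal spans land in the original trace without any extra bookkeeping.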
I mean, there's definitely operability considerations here as well, because now if you've got some problem that goes wrong in the system, you can see exactly where it went wrong. It should hopefully lead to much more interesting bug reports as we start getting those in from the field. Yeah. Yeah. Yep. Okay, that's the kind of work we're looking for. Yeah, indeed. Cool. And then we had the SDK-style refactoring of the VPP agent data plane land this week. Do you want to say a few words about that, Denis? I think Denis has some trouble with his audio today. Yeah, the SDK refactoring was merged. The next one will be the SDK-like refactoring — my stuff — for the entire NSM. Yep. And probably Radoslav will have to inspect this refactoring and figure out how the kernel forwarder could be and should be aligned with this thing. I think that that patch — trying to remember — I want to say that that patch actually did at least make sure that nothing broke in the kernel forwarder. Yeah, of course. That's not the point. The point is to also refactor it in the same way so that they all look alike, you know. No, that would be much appreciated. I also want to say I suspect that Radoslav was actually part of the conversation on that PR, but I'm not sure. Yeah, he was. Okay, good. So that's good. Cool. So that's the stuff that landed this week that came to my attention. Is there anything else that landed this week that folks wanted to talk about? I don't have anything that landed this week, but I have a question, because I'll probably have to drop in a couple of minutes. So I just wanted to ask — I put it at the bottom here, but I just want to bring it up with the community in general. Of course, I think it would be a longer discussion, probably, which we should have next time. So essentially I was talking to some guys at a conference, et cetera.
And I got a pretty complicated question that essentially boils down to: how do we deal with noisy neighbors? Like, we don't have any quality-of-service notion, like SLA guarantees. I know that we have been saying, okay, this is a network function, this is not our thing, et cetera. But essentially providing some guarantees about bandwidth or something is probably not that far from what we want to do with NSM. So I don't know, maybe it's a longer discussion. Maybe we need to create a PR, a spec, whatever. But I'd just been wanting some kind of initial thoughts from the community here, what everyone is thinking about. It's very initial — it's definitely an important question, and it's a very interesting question. And I think you've partitioned it correctly, right? Because there are two places you might stick QoS-related stuff. One of them is in the NSM forwarders, and then the other one would be in the particular NSCs. And in fact, I think we have a talk at NSMCon where someone has actually stuck some interesting QoS discovery stuff into an NSC, and they'll be talking about that there. So though — I mean, how is the NSC going to prevent me from flooding with just random broadcast packets? That's a question for the forwarder side. The comment I was making about the NSC side — I think there's potentially something there for both of them. The other comment I'll make, having dealt with a lot of QoS over the years as a really deep networking guy, and having dealt with a lot of really low-level forwarders, is that QoS is almost never the answer. Sometimes it is, but it's almost never the answer, because it turns out simple QoS things like policers are very useful, right? Basically saying: I won't let you flood me out. Complicated QoS things like RSVP, where you're trying to reserve bandwidth across a complicated system — those are almost universally a bad idea.
So, you know, because the interesting thing about QoS is that in a really high-performance forwarder, doing whatever you're doing for QoS actually consumes way more resources than it would take to just service the packets appropriately anyway — at least in software forwarders. But policers are potentially very interesting. You know, policers and shapers are potentially very interesting in the system. So I think there's definitely a really interesting conversation to be had here around what we might want to do. So yeah, that is interesting. But the reason I sort of made the initial comment of "QoS is almost never the answer" is that at least half the time when someone wants to talk to me about QoS, they want to talk about the complicated RSVP version of QoS that's been sort of a universal fail. We can also look at some of the building blocks as well. So if we take a look at how NSM is built, we know which streams come from where, not only from an individual endpoint, but also across a whole flow of packets. And we also have all of the monitoring primitives there as well. So it may be possible to have something that can monitor the overall quality of a connection at a high level, from end to end. And if we see that there are issues with noise or something similar, it should be possible to get enough context on any given connection and any given node that we can then have something try to remediate in a variety of different ways. But the first part is getting all the primitives there so that we can measure it without destroying the overall performance of the system. And so — this is almost certainly going to be more interesting coming from an NFV perspective than an enterprise perspective. NFV guys potentially could produce an amount of traffic that would produce these kinds of noisy-neighbor problems as it's flowing through a CNF. There's just no way that an enterprise app is going to produce enough traffic to matter.
They just can't drive enough traffic compared to the speed at which the forwarder can forward. But this gets to be even more interesting when we start getting the hardware NIC stuff in, because right now, if you really want to be driving enough traffic through the box for this to be an issue, you can't have the kernel in the way, right? It just doesn't work. But I think this is a very good question. And, you know, I'm trying to think of the places you might stick it in. My initial thought would be that maybe a QoS context might be the way to handle it, as part of the connection context. Just kind of requesting bandwidth, or kind of "I want a guarantee to this standard." Well, I mean, you have to be careful about what people believe when you say things like "requesting bandwidth." Because again, we can give them shapers and policers, but I don't think we can actually guarantee bandwidth. Okay, best effort. I don't know. Yeah, I mean, it's sort of the difference between the RSVP approach versus the policers and shapers that actually do kind of work. Because if you say "I can guarantee you bandwidth," that could mean a number of things. It could mean things like: I guarantee you bandwidth through your forwarder; I guarantee you bandwidth out of the box, from here to the actual physical network. Most people who want to guarantee bandwidth really want end-to-end guarantees, but then you're into the world of RSVP, and we all know how that story ends. So, you know, but it's an interesting problem to think about, if we can think of something smart there. I'm totally open to exploring possible solutions, as long as we don't go down that road. I mean, maybe we'll even come up with a smart solution that gives you the things you wanted from RSVP that isn't RSVP, and that would be glorious. So, at least my understanding is that this is not something that we want immediately. It's just kind of an open question that people are probably going to ask more and more.
If someone asked me, then probably there will be someone else who will ask as well. But more importantly, it'll take us a while to think about it. So getting the idea seeded and just thinking about it is really important. It's not going to be something that we figure out a fast solution to. I mean, I sort of cavalierly said we could add a QoS context, but that's actually not the hard part of the problem. Yeah, and I suspect some of these requirements come from some historic areas. Like, you look at something like ATM. ATM can give you a guarantee that a certain circuit will hit a certain level of bandwidth. And when you look at the line of thinking over time, the ATM crowd shifted towards MPLS and then eventually towards something like SRv6 and so on. And so some of those thoughts I think are still prevailing, from that concept of "I want a dedicated circuit that provides me a thing." Exactly, but it's also very rigid and doesn't allow the system to basically refactor itself over time as conditions change. Okay, I don't know, it's a balance. Yeah, I see where you're going, but it's a slightly more telco-like kind of point of view. And it's absolutely valid, whatever you're saying. I think it's that specific context where I had these conversations, mostly about: okay, I live in the cloud and you give me a network link. I want to be sure that I'll be able to send my five packets when I want them — not have all the bandwidth taken by my neighbor who is actually doing some weird stuff there. Yeah, it's interesting. And the thing is, it's not just the ATM guys. Anybody who's ever dug into DOCSIS and how cable modems work — those guys do QoS like you would not believe. If the DOCSIS guys tell you that you can have 10 bytes every 10 milliseconds, plus or minus one millisecond, that is exactly what you will get. They are not fucking around. I have no idea why.
I mean, I'm sure they have applications where it matters, but man, are they really serious about that. Okay, that's a conversation to be continued, I believe. Yeah, good to raise. I think the short answer is, if they need DOCSIS, we can point them to it. Well, but the interesting thing about DOCSIS is that even DOCSIS doesn't try to really give you end-to-end. It's giving you the moral equivalent of a single link. That QoS is the moral equivalent of the QoS you get on the last hop, which is part of why they were so amazingly successful at it. Yeah, that makes sense. Cool, that's a good discussion. And thank you for raising it before you had to drop. So looping back to in-progress: the API discussion stuff is continuing. I think what I'm probably going to do, just to start moving that ball forward, is break out just the API piece as a PR so that we get something we can merge, and then do one for the helpers, and then do one for the sort of converters and compatibility layer. And then we have the pieces in, so we can start looking at the strategy we discussed last week of wrapping things in the various adapters. So we did an extensive discussion of the API stuff last week, if folks wanna go back and look into that. And the slides are actually up there; you can go take a look at them as well, they're linked here. Next up is for you, Andrei: the SDK-style refactor of the network service manager. Yeah, I think the PR is mostly ready. Today I've been working on the last two tests failing inside it. So I hope it will pass the CI, and it will be looked at more carefully. I made a few attempts to break it into smaller pieces, but it actually took more and more time. So I would, I think, ask you guys to try to review it in its current shape. About 16 changed files at the moment. Most of them are pretty simple, since it just adjusts different chains of operations, like data plane programming, or assigning a connection, or creating a cross-connect, and so on.
So the most complicated changes are related to the monitors and how they're handled. But in general, it becomes easier to read and to understand than before. So I hope it will go in this way. Well, that's excellent. Awesome. So then, excellent. So we've got the Ethernet context stuff, and I think this is still in progress. I know Denis doesn't have his audio right now, so he probably can't talk about it, but he basically decided this was easier to do after the data plane refactor. And this was effectively put in to solve a problem we had on heal, where if we healed you over to a different NSE, you might have the Ethernet address of the previous NSE in your ARP cache. And so you might not immediately get service while the ARP cache figured out that there was a problem and got invalidated. And so the Ethernet context stuff will effectively allow the NSC to communicate back its Ethernet context, so that when we have to do a re-request for heal, we can get the same Ethernet context from the new NSE. So, SRv6 support — that's still in progress. I think Artem is working on that. It's currently blocked on a fix from VPP still, but they now have figured out what they think the problem is, and so there should be a PR shortly. I know there are certain people on this call who are very interested in the SRv6 stuff. So, cool. The kernel forwarding plane — Radoslav, I know we sort of had this held over. Is there anything in progress there? Nothing much, I think. No, nothing. Okay, cool, cool. Azure Pipelines. So Alex has been working on porting our CI from CircleCI to Azure Pipelines. Do you want to say a few words, Alex, about how that's going? Well, things are going not that fast, but I'm making progress, and today I think I got success with Helm charts. So we are continuing, and integration tests are running okay, and after the whole CI thing stabilizes, we can continue to run the Azure Pipelines CI in parallel beside CircleCI, and then switch to the new CI, something like that.
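The Ethernet-context mechanism described above can be sketched roughly as follows: the heal re-request carries the Ethernet context learned during the original connection, so the replacement endpoint can come up with the same MAC and the client's ARP cache stays valid. All type and field names here are hypothetical illustrations, not the actual NSM API:

```go
package main

import "fmt"

// EthernetContext is the piece of connection context communicated back by
// the client side; carrying it on a heal re-request lets the replacement
// endpoint reuse the same MAC (names are illustrative).
type EthernetContext struct {
	SrcMAC string
	DstMAC string
}

type Connection struct {
	ID  string
	Eth EthernetContext
}

// HealRequest builds the re-request for a replacement endpoint, preserving
// the Ethernet context so the client's ARP entry for DstMAC stays correct
// instead of pointing at the dead endpoint's MAC.
func HealRequest(old Connection) Connection {
	return Connection{ID: old.ID, Eth: old.Eth}
}

func main() {
	c := Connection{
		ID:  "conn-1",
		Eth: EthernetContext{SrcMAC: "0a:00:00:00:00:01", DstMAC: "0a:00:00:00:00:02"},
	}
	healed := HealRequest(c)
	// The MAC the client already has cached remains valid after heal.
	fmt.Println(healed.Eth.DstMAC == c.Eth.DstMAC)
}
```

Without this, the window where the stale ARP entry times out is exactly the service gap the change is meant to close.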
That would be great. I mean, one of the advantages Azure Pipelines has is that Microsoft has very kindly donated enough capacity for us to run our CI on them for free, which is quite kind of them. So hopefully we can get switched over, and it looks like quite a nice system. Cool. Ilya, do you want to say some things about security issue number four, the JWT stuff? Yes, the PR actually is ready, and it's even passed the whole CI. It's green, so please take a look. Oh, fantastic. That's very good. Cool. And then Ivana, do you want to say some things about the metrics observability stuff? I think I don't have anything more. Okay. That's fine. Cool. So I think we're at the end of the agenda. Is there anything else folks would like to discuss? Oh, if there's nothing else, then we can yield back 15 minutes of time. Cool. All right. Well, since nothing else has come up, thank you all for attending, and we will see you all at the same time next week. Take care. Thank you. Bye-bye. Thank you. Bye. Bye.