Okay, well, we're just about a minute out. I'm going to remind us of some of the housekeeping items. I'm also going to apologize for the lack of voice here. But a quick introduction to myself: I'm Lee Calcote. I'm a CNCF ambassador. I'm also the founder of Layer5, which is a service mesh community, so certainly today's topic has a vested interest for me. As we get started, I do want to note that the Q&A box should be at the bottom of your screen. Please feel free to drop in your questions there as we go. I did toss out the challenge earlier that we've got some seasoned intellects to present to you today. Please don't leave them bored. Bring your questions. This is an official webinar of the CNCF, and as such, it is governed by the CNCF Code of Conduct. So please don't add anything to the chat or questions that would be in violation of that code. Please be respectful of your fellow participants today, if you would.

So with that, I'll welcome some familiar faces, I think, out there in the crowd. Is that Juan? It's been a couple of years, but good to see you on the call today. I do want to introduce our distinguished speakers, some of whom actually literally carry that title. With us today to present an introduction to Network Service Mesh are Ed Warnicke, distinguished consulting engineer at Cisco; Frederick Kautz, head of edge infrastructure at doc.ai; and Nikolay Nikolaev, open source networking team lead at VMware. And with that, I'd like to hand it off to them to teach us about Network Service Mesh today. Thank you.

I think the first question we have to answer is who gets to speak first. There's a little bit of that. So, it often helps people to understand where you're going in the generic sense, and so we thought we'd do a little bit of an agenda here to start out. We've got a little bit of housekeeping up front of our own. Then we'll talk a bit about the vision for NSM. In other words, what problem are we trying to solve? What central insight have we had that differs from the way people have thought about that problem in the past, and therefore what does that allow us to do? In other words, what's different, what's new, what's innovative? Then we'll go on to the state of NSM. We'll tell you sort of where we stand right now. We do have running code as it stands, and a pretty vibrant community. We'll talk a little bit about some of the cool things that we're gonna be hopefully doing in the not too distant future. And then we'll do a deep dive into some of the more detailed aspects of Network Service Mesh as there is time.

In terms of housekeeping, for folks who've been at our talks before, we're a little bit QR code crazy. So if you're watching this, you can haul out your phone and point it at the QR codes, which will get you to our website. There's a QR code in the bottom corner of every slide that will take you to the slide deck directly. And then finally, we're extremely excited to be having NSMCon co-located with KubeCon on November 18th. So if you follow that link, you can get to that page. We've got a really exciting lineup of speakers there that we just announced, and registration is open. So we would love to see you guys register for NSMCon and turn out to learn even more about what's going on.

You wanna start off with the vision? Sure, so, the NSM vision. We'll start off with some of the problems that we have. We should be able to hit the next slide. So, part of what we look at is a runtime domain.
Think of Kubernetes as being a runtime domain where you run your applications. Next. The second concept is you also have this concept of a connectivity domain. So, connectivity domain: think of it like you use your CNI to connect to your connectivity domain. If you're in, let's say, the OpenStack world, you use Neutron to connect to your connectivity domain. And in Kubernetes, we're looking at L3. So you're looking at something that gives you an IP, and you use that IP to gain connectivity to other things. You have some service discovery; Kubernetes typically uses DNS and says, okay, well, where is the service at? What's the IP and TCP port for it? And you have some isolation that occurs through policy, or through service meshes, application service meshes, if you decide to add them in. And typically everything is within the same cluster, or intra-cluster, in terms of the connectivity domain itself, with the option to have controlled ways to get in and out to the internet or the outside environment. Next slide.

So one of the problems that ends up happening is when you start to try to drive inter-cluster workloads. I think we skipped over a slide there somewhere. Yeah, I think so. There we go. Okay, great. So, we started taking a look at East-West traffic, and what we mean by that is traffic between different clusters. This becomes a very difficult problem. So when you start to look at what's involved with that, move forward to the next slide again. Part of the problem is that you have all these various applications that need to talk to other applications. And typically what ends up happening is you end up having these perimeters, and each of them has its own subnets; they have their own bits of information that need to be shared in order to properly connect. And you have to work out how you get policy across all of these different runtime and connectivity environments. And the pattern that we're seeing is that the runtime and connectivity environments are one and the same.

And what are the things you tend to see there? This is sort of where people are trying to solve the problem. If you go back one slide, one of the things that people try to do to solve this problem is they'll try to build a gateway that connects their different clusters to each other at the level of cluster-ness. And the problem with this is that generally speaking, you care about workloads talking to workloads, not clusters talking to clusters. It's extremely rare that everything in one cluster is supposed to be able to talk to everything in another cluster. We've solved this isolation problem within a specific cluster, for the intra-cluster networking for Kubernetes. And it solves it well, with the level of dynamicity with pods coming and going and so forth. But when you start trying to get workloads talking to workloads in other clusters, there are not terribly great solutions for inter-cluster workload isolation for workloads that dynamically come and go very quickly.

Yeah, and to make matters even worse, one thing with Kubernetes is Kubernetes keeps track of its service IP range, but it does not keep track of the pod IP range. That's actually left as an exercise for the CNI plug-in that you're using.
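To make that bookkeeping gap concrete, here's a minimal client-go sketch. It's an illustration, not part of NSM, and it assumes a standard kubeconfig; it asks the API server for each node's pod CIDR, a field whose population depends entirely on the CNI plugin and its IPAM:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the default kubeconfig (~/.kube/config).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	nodes, err := clientset.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, node := range nodes.Items {
		// Spec.PodCIDR is only filled in when the control plane hands out
		// per-node pod CIDRs; CNI plugins that do their own IPAM can leave
		// it empty, which is exactly the bookkeeping gap described above.
		fmt.Printf("%s podCIDR=%q\n", node.Name, node.Spec.PodCIDR)
	}
}
```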
So when you're trying to work out what can I share, or what should I conflict with or not conflict with, and you start adding more and more clusters that need to communicate with each other, you end up with this combinatoric problem, with a lot of information that needs to be aligned. And in many scenarios, you're not gonna be able to resolve the conflicts between them. So this full mesh of connecting network to network ends up becoming very problematic, and it's not an easy problem to solve without a lot of careful planning. And hopefully nothing comes along in the future that causes you to have to redo all of this because of a compatibility issue with new clusters as you add them in. So what ends up happening is people who take this approach end up doing a lot of hands-on work. It becomes a lot of effort to keep this going, and it becomes very fragile, especially when you start looking at the firewall rules and how to prevent these clusters from communicating things that they're not supposed to communicate between each other.

Yeah, particularly when you look at public clouds, where you may have different levels of isolation for those clusters, and they may have different APIs for how you manage that isolation, and different kinds of activities that you do to set up direct linkage between things. It can get very, very messy extremely quickly. And when you add in the service discovery, so when you start adding in things like DNS, that just multiplies the problem as well. All of this just focusing on IP is complex. Throw in DNS on top of that and now you're asking for a lot of pain. Or even worse, the sort of VIPs that you get between different services deployed in different clusters: those Kubernetes services are only good within a specific cluster, or exposed out to the broader world. If you wanna, say, share services between these various clusters, that's a difficult problem.

And so what approaches are there to this difficult problem? Go ahead, Frederick. Hi, yeah, you go on for this part, or I can take it, either one. Okay. So people have then stepped forward and said, okay, well, we get all these problems, let's do federation, right? Federation is the thing. And it turns out federation's a good solution for certain problems, but it's actually not a terribly good solution for this one. It kind of hides, rather than fixes, the inter-cluster linkage problem that you have. It also doesn't scale particularly well. As you scale up to large clusters, you start running into all kinds of limitations in terms of propagating services, propagating your network policies, getting all the things that support them to scale up to handle them properly. And as you add more clusters and you're trying to, at fine grain, pass this information between different clusters that may be geographically isolated, those combinatorics and that complexity get much, much worse. Next slide.

And in addition, the semantics of Kubernetes networking could in principle be federated across multiple clusters. But again, what you care about is your applications and the ability of your workloads to talk to each other. And so if you've got workloads that are running as pods, they're used to getting pod semantics. But if you have workloads running as VMs, and most people do, or you even have things running on bare metal servers on-prem, then those are used to connecting to connectivity domains that have entirely different semantics.
So it's not even clear how you would extend this federated Kubernetes networking domain to them. And then finally, when you look at service mesh, there are lots of people who have basically taken the position that service mesh is going to save us: just deploy a bunch of Istio and a bunch of gateways. And those tools are fabulous at layer seven, but they don't actually do anything for the L3 nature of this problem, the IP stuff. So first of all, you wind up with the same full-mesh combinatorics problem, but you also still have to stand up these L3 links, because what everyone presumes out of the gate when they talk about service mesh is that when you're up at layer seven, there is some flat layer three living under you where you can reach everyone conveniently and you don't have various weird NAT games going on and so forth. And that's just simply not true in a very large number of cases, particularly when you're doing multi-cloud and hybrid-cloud situations.

Something that also adds to that problem as well: consider the cloud environments, and the number of times where I've needed a cluster to do something, so I just hop onto GKE, or AWS with EKS, and spin up another cluster to handle some set of workloads, and I want them to connect into some other service that's out there. If you're spinning up isolated environments, then it becomes easy. But if you want the system to interact with other Kubernetes clusters that are running other services on your behalf, and you wanna create that isolation as part of your operational model, then these types of problems end up occurring much more often, because of how people tend to use Kubernetes and how they tend to spin things up in the cloud.

Cool, so this gets us to sort of the central realization of Network Service Mesh, right? A lot of people have been looking at this problem, and it's been very painful for a long time, and the central realization for Network Service Mesh came down to: why are we welding our connectivity domain to our runtime domain? Why is it that every workload in a particular cluster only gets access to networking that makes sense within the context of that cluster? It's a very strange presumption when you think about it, because where a thing runs and the kinds of things it needs to talk to are not necessarily intrinsically related. It's an accident, a historical accident in the system, that put us in this place.

Next slide. And so when you really think about it, what you realize is that what you really care about is workload-to-workload connectivity, independent of where those workloads are running. So if I have a collection of things that need to talk to each other, then they should be able to talk to each other, with whatever the network is supposed to do for them: the base load balancing, the policy-based isolation, whatever those things are that it's supposed to do as it talks to its peer workloads. Those should happen no matter where the workload is running, not only if that workload is running in the same cluster, in the same virtual machine zone, in the same data center. All of those are things that developers don't care about. They care about their workloads communicating.

Next slide. So this is kind of where Network Service Mesh comes in. We rethought the problem entirely. We said, okay, leave the intra-cluster networking alone. That works, let it be.
So what Network Service Mesh does is completely orthogonal to CNI. We don't interfere with CNI, you don't have to run a special CNI plugin; we take great care not to mess with the intra-cluster networking you're used to. So it's harmless to your existing Kubernetes networking, but what it does is it allows the workloads that you have running to connect to new connectivity domains. And those connectivity domains can provide whatever connectivity, security, and observability features you need for that connectivity domain.

So we've got a couple of examples here. Say, for example, I've got pods that are running databases that do some kind of database replication. Say I decide that I wanna have read replicas per cluster, and the replication protocol is not running over HTTP; it's some weird thing your database vendor put together. So you need L3 reachability. The connectivity domain you logically want is a database-replication connectivity domain that gives you pure L3 connectivity between the database replicas, wherever they may be. And when new database replicas come up, you want them to also be plugged into that connectivity domain, even if it's the case that they're coming up in things that are not Kubernetes clusters, that are, say, legacy VM or bare metal environments.

Another example here would be Istio connectivity domains. Istio as a tool is brilliant at what it does at layer seven, but these layer seven service meshes tend to presume a flat L3 under them. So why not just give them one? You could run a single Istio instance over an L3 domain and have the different pods from different clusters connect into it and reach each other.

And to add to this, yeah. Please, go ahead. Okay, so to add to this, the beauty of it, and I don't know if we'll be able to get to how this whole thing operates within Kubernetes today, but the beauty of it is that essentially you can ask for these services at runtime. So you don't have to set up your pods in advance, before they get instantiated, with all these connections, et cetera, et cetera. At some point in the middle of their lifetime, they can just say, oh, I need to replicate my database somewhere, I just need this connectivity. They can request it, get their replication done, be done with it, and then continue doing whatever they want to do. So that's an additional bonus. Yep.

And then I think we've got one other cool thing about the solution in the next slide, if I remember right when you get to it. So this is the other cool thing. Most of the reference work that we're doing in the Network Service Mesh project right now is about building a reference implementation that runs well with Kubernetes. But the underlying architecture that's been put together for Network Service Mesh is actually not Kubernetes-specific. It's agnostic to runtime. And so as a result, you will be able, in the future, to have versions of that implementation that will allow VMs running in various VIMs (virtual infrastructure managers) to connect up to the same connectivity domain as pods, and likewise with on-prem servers. So if you have some giant Oracle database running on a piece of hardware out in the middle of God knows where, and you would like it to be able to feed read replicas running in your clusters in various public clouds, that's exactly the kind of use case that we're aspiring to with things like our database-replication domain. And as far as the workload running on the server knows, life is normal. It doesn't have to do anything weird or funky.
And the same thing is true with the pods. From their point of view, you add a single-line annotation, and they automatically get connected to any additional connectivity domains that they need to be participating in. Yeah.

Just to finish that concept up as well: one thing you don't see in this scenario is, you have on-prem, and let's say your Kubernetes is in a cloud environment. One thing that you don't see in this peach-colored line is what goes into building that connection. You have to go through a firewall, VPN, or other similar things to reach your cloud environment. So all of that still exists, but it gets abstracted into that connection. And that's an important part of NSM: from the point of view of the application developer, you ask for the thing that you want to connect to, like, please give me access to the database, or please give me access to this Istio environment. But from the point of view of the operator, the operator says, well, what does that really mean? That means I have to go through this intrusion detection system, this firewall, and drive it all through policy. So there's stuff that goes on here that you don't necessarily see from an application developer perspective. And it ends up simplifying your life in this scenario, because you get to abstract those away. They're still there, they're still important, but the complexity is controlled for you in this particular path by your operator.

So, one quick thing. We did comment at the beginning that you guys should ask questions as you go; we're perfectly comfortable taking questions. I know we had one attendee who raised their hand; if you could, add any question you have to the Q&A box. We also had a really interesting comment: think of NSM as an SDN controller. That's a super astute way to think about it, but with one really fascinating twist. Network Service Mesh is in the business of handling virtual wires, not virtual switches. We sort of made a very fundamental error when we went virtual. We had these switches we were used to pricing by the port, and we had these NICs that we were buying that were very expensive for our servers. And so we internalized the notion that wires were expensive, but once you've paid the per-port cost, well, you've already got the switch, it does all the switching features. But it turns out in the virtual world, virtual switching is extremely expensive and complicated to manage, and you don't want to make that the central element; virtual wires, though, are incredibly cheap. And so Network Service Mesh you can think of, in some ways, as an SDN controller for virtual wires, and we leave as an exercise to the people providing the network services what kind of network services they're providing and how they're going to manage them. So there's a huge amount of flexibility. Thank you, Mohamed, for the chat.

I often pitch it as a controller of controllers. So it'll ask your controller, saying: I have this context, can you please set something up that solves this particular thing? Someone asked for a thing; I will give you a wire for it. Whatever it is you have to do, please do it. Exactly. Cool.

So we have a question that came in: how does Network Service Mesh work in a hybrid cloud model, where the network appliance and the underlying connectivity domains are both in the cloud and on-premise?
So I think what it really comes down to is, again, if you think about these virtual wires, and I don't want to go too deeply because we've got some more stuff in depth later, but the way we've set up the mechanisms for stringing the virtual wires, and particularly stringing the virtual wires between on-prem and other public cloud domains, there are intermediate elements that we call proxy Network Service Managers. When they get a request for a virtual wire from a workload, they can tweak whatever knobs have to be tweaked for the particular environment you're in, to allow you to get that virtual wire strung between the workload in runtime environment A and the workload in runtime environment B. And so that architecturally gives us the flexibility such that for any environment you're in that literally has the capacity to connect to another environment, you can provide a proxy Network Service Manager that will get you in and out, and I expect we'll see a proliferation of them as we go. Hopefully that answered your question, Sujit. Thank you.

You want to grab this one, Nikolay? Yep. Yeah, so quickly, what is the state of NSM today? So back in April, I believe, NSM was accepted as a sandbox project within the Cloud Native Computing Foundation. We are receiving the needed support as such a project, and we are kind of reviewing and aspiring to become a proper... what was the name? I forgot the next level. Incubating, yeah. So that's it, we are really proud of this, and we're proud to participate, to partner with CNCF.

One other CNCF-initiated initiative is the CNF Testbed. This probably brings up a little bit of the topic of: is NSM positioned in the telco domain, or is it positioned in the enterprise domain? And I think that, at least from my point of view, it is so fundamental a concept that you can apply it to either. And I think the only thing that can happen is that both can benefit from mutually evaluating it, upgrading the project to new levels and new features and practices and whatever. So within the CNF Testbed: CNF is essentially cloud native network functions, for those that are not familiar with this term. This is considered to be the next level of the network function virtualization concept developed like six, seven years ago. We started with virtual machines; now a lot of telco operators and service providers are looking into this as the next step in the evolution of networking virtualization. The CNF Testbed is essentially something that was announced last week at the Open Networking Summit as an initiative that is going to make a lot more sense within the telco world, the telco domain. We're proud to be there and to be part of it. Actually, there are a couple of use cases in there already that involve NSM. We're working with them closely and continue to add more and more use cases.

We just had a question that came in on chat from Mohamed, asking: I'd be interested in the comparison between the NSM capabilities and Tungsten Fabric; both are hosted at the Linux Foundation. Why don't I catch that one, since I... That's a good plan. Cool. So one of the things with NSM is that NSM itself doesn't actually provide the data plane; rather, we interact with data planes. So for example, the initial reference architecture that we built used VPP, which is part of the FD.io project, which is under LFN.
So we are talking with people from Tungsten Fabric to do the same thing with them as well. So one option is to use them as a potential data plane. The second thing as well is that Tungsten Fabric has a set of network functions that we may be able to connect people to, or connect services or applications to. So part of the idea would be how to expose those network functions and help them become more cloud native in their design. And NSM can help connect things to it and help foster some good practices towards those goals. So in short, it's not really an apples-to-apples comparison. There's good synergy between both projects that together can solve some interesting problems.

So one thing that might help clear up some of the confusion: Network Service Mesh strings virtual wires between workloads and network services, connectivity domains, if you will. But it doesn't actually provide the network services itself. That's a space where lots of people want lots of different things.

And someone just asked: can you elaborate more on the virtual wire concept? So if you think about what a wire fundamentally is, a wire is literally something where if you shove packets in one side, they come out the other, and vice versa. And that's effectively what Network Service Mesh does. If you've got a workload like a pod, we will drop an interface-like object into your pod and take care of all the niceties, so that if you try and reach something that happens to be available from the network service that that interface goes to, your packets will go out that interface, travel over the virtual wire, and arrive at whatever is providing your network service. That might be another pod that's doing user-space packet processing. It might be some VM running somewhere out in your world. It might even be physical network gear that's being managed by an SDN controller. But all Network Service Mesh does is allow the workload to ask for a network service. Its request is routed to someone providing that network service, and the someone providing that network service gets to say, okay, here's how you connect a virtual wire to the network service I've prepared for that requester. Then NSM will see to it that we string up the virtual wire so that packets get where they need to go. (There's a little sketch of this request flow just below.) And that's why I think it's actually incredibly complementary to more traditional SDN: we don't do something like try and configure all of your network switches for you so that they do some complicated routing and switching behavior. What we do is allow a workload that needs to connect to that to have a standard, clear, simple way of asking for it, which you can provide if you're an SDN controller.

We had another question, from Juan Ramon: can you please describe how traffic policy can be implemented without going out of the L7 environment? I'm not entirely clear on that question. Could Juan provide a little bit of additional clarity? I think we can say that, in general, from the NSM point of view, this would be a network function which runs on top of NSM, and we just provide the connectivity for it. So if you want to do any traffic policy, it will be a matter of implementing it as a network function.
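Coming back to the virtual wire request flow for a second, here is a toy Go model of it. To be clear, this is not the real NSM SDK or its gRPC API; every type and field name here is made up for illustration. The point is only the division of labor: the client asks for a service by name, something routes the request to a provider, and the provider, not the router, decides what the wire looks like.

```go
package main

import (
	"errors"
	"fmt"
)

// Connection is a toy stand-in for the result of a virtual wire request:
// just enough information for both ends to shove packets in one side and
// have them come out the other. All field names are illustrative.
type Connection struct {
	NetworkService string
	Mechanism      string // e.g. a kernel interface vs. a memif-style interface
	SrcIP, DstIP   string
}

// Endpoint is whatever provides a network service: a pod doing user-space
// packet processing, a VM, or physical gear behind an SDN controller.
type Endpoint func(networkService string) (Connection, error)

// Manager models the NSM role: route a request for a named network
// service to a registered endpoint and hand back the resulting wire.
type Manager struct {
	registry map[string]Endpoint
}

func (m *Manager) Request(networkService string) (Connection, error) {
	endpoint, ok := m.registry[networkService]
	if !ok {
		return Connection{}, errors.New("no endpoint provides " + networkService)
	}
	// The endpoint, not the manager, decides what the wire looks like.
	return endpoint(networkService)
}

func main() {
	mgr := &Manager{registry: map[string]Endpoint{
		"database-replication": func(ns string) (Connection, error) {
			return Connection{NetworkService: ns, Mechanism: "kernel-interface",
				SrcIP: "172.16.1.2/30", DstIP: "172.16.1.1/30"}, nil
		},
	}}
	conn, err := mgr.Request("database-replication")
	if err != nil {
		panic(err)
	}
	fmt.Printf("wire up: %+v\n", conn)
}
```

In the real system, that conversation crosses process and node boundaries, the registry is a real service, and the connection carries enough mechanism detail for a forwarder to actually plumb the interfaces; but the shape of the exchange is the same.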
I just want to add something to what was being said on the previous question. So, probably one of my favorite things with NSM is the fact that we don't care about the type of the payload. If I were to invent my own type of payload, there's nothing binding me within NSM to any particular protocol, IP or Ethernet or whatever. If I want to invent my own protocol, I should be able to connect workloads sitting on both ends of the virtual wire, and they should be able to talk their own protocol, whatever it is. And I think that is one of the fundamental things that NSM is introducing. So when we say a service, or an endpoint that implements a service, we literally can say that this endpoint can expose whatever type of payload it wants to serve. Imagine RDMA, for example: you can have an endpoint talking RDMA as a service. Yeah.

So, cool. Getting back to things, and I know we're getting some very detailed questions: I'd encourage folks with more detailed questions to also drop by the Network Service Mesh channel on the CNCF Slack. We're delighted to point you at things; we have lots of collateral on some of the details.

So, getting back to the CI. This is actually something we're very proud of. In Network Service Mesh, for every PR that comes in, we run our CI on AKS, EKS, GKE, and vanilla Kubernetes running on Packet. And the reason we do that is we wanna make sure that we run ubiquitously, everywhere. There are some tiny little nits between these environments, and we just wanna make sure that we're not getting stuck on any of them. We currently run 449 tests across those four clusters to ensure that everything is actually still working all the time. And some of these tests include cross-cluster compatibility, like vanilla K8s versus AKS, for example. And yeah, we have caught some strange issues while Kubernetes was upgrading on various clusters. So, I mean, we put a lot of effort into bringing up this infrastructure, and we are really proud of it.

Okay, multi-cloud resiliency, auto healing. Ed, do you want to say something about auto healing? Yeah, let me go ahead and say a few things. So we've got this thing we call resiliency V1, and as you might imagine, that's because we've had smarter ideas since then that we would like to do. Effectively, what this comes down to is: if you can imagine stringing these virtual wires that connect your workload to some network service, you would like it to be the case that if anything involved in that process, including the thing providing the network service, goes down, or restarts, or any of the number of things that happen in the world, you want it to still be true that you're getting the network service for your workload: a blip in connectivity, not an outage. And our auto healing is actually constructed that way. Just like in normal Kubernetes with normal L7 workloads, if you were running an application service mesh up at layer seven, and some replica somewhere that's providing the microservice you want goes down, you just get routed to another one and you get what you're supposed to get. We do that kind of auto healing right now in Network Service Mesh. All the many elements can go away one by one. You could lose the local forwarder, your current Network Service Mesh forwarder, on your node. You could lose any of the network service managers in the process. You could even lose the thing that's providing your network service, and it becomes a blip, right?
As soon as that piece comes back, we will reestablish your connectivity to your network service and you're back in business. So you're looking at a few seconds of outage as opposed to an "oh wait, your workload is screwed." And that kind of resiliency is super important, because things fail, right? We know they do. And essentially, if one piece of your network service disappears for some reason, we will try to find you another one, if such a replica exists, or something that provides a similar service and announces it, right? Yep, exactly.

Inter-domain. This is, yeah, the initial implementation of what we already described. We have things to improve there, but still, as we said, this is already part of our CI. So it's verified across private and public Kubernetes deployments.

DNS. Yeah, so DNS is interesting, because obviously you need DNS, right? But effectively, what it comes down to is that we need to make sure that you get what you expect from your Kubernetes DNS in your pod, all the time. But it's also true that when you connect to a network service, it may also have DNS that it needs to provide you. So if you've got a database-replica domain, you might want to be able to look up by name whoever the master is for your database replication in that domain. And so that ends up being a super interesting problem, but the good news is the DNS protocol is, for the most part, super well done for this kind of problem. And so we literally have a little DNS sidecar that will fan out and multiplex your DNS requests. When you make a DNS request, we will send it to the Kubernetes DNS, and we will also, in parallel, send it out to any DNS that's being provided by the network services of the connectivity domain you're connected to. And the basic rule is: whoever gets across the finish line with a positive response first, that's what we forward on to your client. For things in your Kubernetes cluster, that's always going to be your Kubernetes DNS. But if you're, say, getting DNS from a network service, that will also come in as well. (There's a small sketch of this fanout rule below.) And so, effectively, what it comes down to is: if you want to do DNS-style discovery over a connectivity domain, over a network service, we've got that code now.

So, security. In terms of security, one of the things that people should be asking themselves is: how do I know that a workload should be allowed to connect to a connectivity domain? And so we're collaborating quite closely with the SPIFFE and SPIRE guys. They've done a brilliant job of issuing authenticatable identities, so that when your workload comes up and says, I want to connect to your database replication domain, you can actually know who that workload is at a semantic level, and know that authoritatively. And what that allows you to do, at a very fine grain, instead of going and configuring a bunch of firewall rules with IPs, which, given that pods come and go and IPs shift all the time, is a really hard problem, verging on impossible: you can use SPIFFE and SPIRE to issue identities to workloads, such that when those workloads come up, you know semantically who they are, and you can make policy decisions about whether they should be able to connect to that connectivity domain at a semantic level, instead of trying to sling a bunch of IPs and ports around.
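And before the questions, the promised sketch of that "first positive answer wins" DNS fanout rule. This is a minimal Go illustration using the github.com/miekg/dns library, not the actual sidecar's code, and the upstream addresses in it are made up:

```go
package main

import (
	"fmt"

	"github.com/miekg/dns"
)

// resolveFanout sends the same query to every upstream in parallel and
// returns the first positive (NOERROR, non-empty) answer to arrive,
// mirroring the "first across the finish line wins" rule described above.
func resolveFanout(name string, upstreams []string) (*dns.Msg, error) {
	query := new(dns.Msg)
	query.SetQuestion(dns.Fqdn(name), dns.TypeA)

	results := make(chan *dns.Msg, len(upstreams))
	for _, server := range upstreams {
		go func(server string) {
			reply, err := dns.Exchange(query.Copy(), server)
			if err == nil && reply.Rcode == dns.RcodeSuccess && len(reply.Answer) > 0 {
				results <- reply // positive answer
				return
			}
			results <- nil // error, NXDOMAIN, or empty answer
		}(server)
	}

	// Replies arrive in finish-line order; take the first positive one.
	for range upstreams {
		if reply := <-results; reply != nil {
			return reply, nil
		}
	}
	return nil, fmt.Errorf("no upstream returned a positive answer for %s", name)
}

func main() {
	// Hypothetical upstreams: the cluster DNS plus a DNS server offered
	// by the connectivity domain the workload is attached to.
	reply, err := resolveFanout("master.db-replication.example.", []string{
		"10.96.0.10:53",  // kube-dns / CoreDNS service IP (a typical default)
		"172.16.1.53:53", // DNS provided by the network service (assumed)
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(reply.Answer)
}
```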
So, real quickly, we do have some questions. Let's see, Juan Ramon came back and said: I want to know if in the fabric mesh there is an opportunity to do specific traffic treatment at that level, instead of having to send the traffic out to a VNF to verify network access control. Maybe. So I think what you're asking, Juan, is: could I use the identity of the workload to decide whether it can access a connectivity domain, rather than having to have something try and look at its traffic to figure it out? And if that's what you're asking, then absolutely, you could do that. In fact, that's part of the whole point of the SPIFFE and SPIRE stuff.

And then we had a question from Scott that said: as an IBM employee, I suppose I have to say, aw darn, it sounds like you don't test this stuff in IBM Cloud Kubernetes Service. So Scott, what I would say to you is: we would love to, if IBM wants to donate some time for us to do so and have someone come out and help us get it working. We're actually in favor of testing this any and everywhere we can. So if you have a Kubernetes environment that's not being covered yet, we would love to work with you; drop by the Slack channel. We would love to get that going. Cool, and it seems like Juan Ramon has indicated that his question was answered.

Cool, and there was a question on the chat: is it correct to imagine this as a double overlay, or does it complement existing CNI? So no, none of our traffic actually runs over CNI. We're completely orthogonal to CNI. We leave CNI alone. We do not mess with CNI. We take great care to make sure that CNI is unharmed in the course of all of this. So we are not running over CNI. In fact, given that we hook in with VMs and on-prem, CNI would actually become a limiting factor in that scenario. CNI also focuses primarily on L3 connectivity domains. So when you start to pull in L2 domains, or more esoteric systems like MPLS and so on, the actual CNI interface itself is limited in that scenario.

Cool, yeah, and awesome. So we have another one here, from Nomura: what about the case where K8s runs over VMs? And what I would basically say there is: I cannot save you from any sins you might be committing underneath the covers for your nodes. So if you're running K8s, and your K8s nodes are running in VMs, and whatever is connecting your VMs is doing all kinds of crazy encap stuff, I don't even have visibility into that. And so there's nothing we can do to save you from that. Now, if you wanted to do something that would punch past that, we do have mechanisms in the architecture for you to introduce, say, a network service manager that could signal it.

We also have something from Mohamed: what is the planning for the official release? We do have a v0.1 release that we put out a few months ago. It's definitely at a sort of pre-official stage. We're hoping to get a v0.2 release out before KubeCon and NSMCon in November. And so that's sort of where we stand, timeline-wise, on this. So, you guys have been fabulous with questions, you're great. Yeah, and any help that people are willing to give to help us progress, we would more than appreciate. That help could even be in regards to, like, the previous example with IBM: we'll find things through that integration that'll help us improve the overall stability. So, yeah, feel free to help us get there as well if this is something that'll be useful for you.

So, future stuff: resiliency V2.
So I mentioned resiliency V1 earlier, where you could lose any one aspect of the system and bring it back. We've got a smart idea, and I won't go into the details, with resiliency V2, where we think we can get to a place where you could lose basically everything except the workload, simultaneously, and we would still be able to restore your connectivity and just have it be a blip. That's gonna be super cool if we can get it to work the way we think we can, because it's sort of like, if you've heard of Chaos Monkey, where you break a few things here and there all the time, you could potentially bring in Chaos Gorilla and just have it smash all the things, and when they restart, your connectivity domains come back and your workloads continue to be able to talk.

We've also got some folks working on Istio on top of NSM. I mentioned this as an example of having an Istio domain on top of a Network Service Mesh network service, or connectivity domain. We do have people poking at that in the community to get that up and working. Next.

So, I mentioned SPIFFE and SPIRE earlier for being able to give you authenticatable identities. This is all super cool, but the next piece we're hoping to bring in is Open Policy Agent, to allow you to bring in very flexible policy for admission control into a network service. So when a workload connects, you can run a fairly involved policy about whether it should be allowed to connect to that network service.

There's another one that we've had some people mulling over, and this one's gonna be cool when it comes together. There's been some interest in packet capture observability. So: having a network service, a connectivity domain, where, when your workload is attached to it, you can bring a Wireshark instance in a pod, with VNC or something like that, and have it connect to a network service that would allow you to get packet capture from various links to various workloads. If you've ever actually really had to get down to the wire, the truth is always there on the wire, but in many cloud environments it's complicated to get down to the wire. We're hoping to be able to do that in the future as an observability feature, subject of course to policy about admission; you don't want just anyone to be able to sniff this stuff. But that sort of per-workload-granularity packet capture, of the workloads coming into a connectivity domain, wherever those workloads may be, we're hoping proves to be a very powerful observability tool for people. And just to not get confused: this should essentially be possible within a single cluster as well. It's not necessarily sitting in the middle of the clusters so that you're sniffing only there. You can do this with your workloads on the same cluster. Cool.

And then there's another concept that's floating around that people are talking about. We mentioned that we take great care to be orthogonal to CNI, but we occasionally get people who pop up and say: I actually do want to ask for a network service to be interposed between me and my actual intra-cluster networking. Now, this is clearly not something that should happen by default, but if you're really explicitly asking for it, there are a lot of interesting things you could do with this. And so we have people looking at what it would take to make this possible with Network Service Mesh.
We don't currently have that, but a good example of this would be: right now, most people insert an Envoy sidecar into the pod with the workload for Istio, which means that it lifecycles with the workload. But we've had people who have mused that it might be nice to be able to insert that Envoy sidecar as a network service, because then you can lifecycle it independently from the workloads. If you have a long-running workload and you discover there's an issue with your Envoy, you can upgrade your Envoy to something that doesn't have issues without having to disturb your long-running workload. So this is something else that folks are looking at in the community. Again, we have some of this coming. Join us. Yep, so we definitely have that coming.

And then I think we have a bit more, if folks are interested in how the magic works. So, you know, quick show of hands on the call: how many folks are interested in how the magic works? Yeah, with all of the questions that are being asked, I would say for sure people are interested in the magic here. A couple of questions that maybe we'll circle back to just to make sure that we're complete. But I think with the virtual wire concept, and how it was explained that NSM is the facilitator, the connector of different domains, establishing signaling for the right connectivity; Ed, as you were describing it before, I think it would help if you re-articulated how the actual connectivity happens. So we can talk a little bit about that on these slides, if that would help. Sounds good. You have something you wanted to say, Frederick?

Yeah, let me preface that a little bit as well. When you think of NSM, think of it as two primary things that NSM tries to do. The first one is it tries to discover where your services are; so there's a discovery component. The second thing it works on is negotiating the connection to that service, so that the result of that negotiation is the virtual wire that connects you to it. So think of those as concepts to help with understanding this particular space. Anyways, let's jump right into this, so that I can describe in more detail what's going on, and it'll be simpler.

Let's back up a little bit, and we're gonna give you sort of the abbreviated version. One thing that you'll find is there are a lot of good YouTube videos out there where we talk about this in much greater depth. But if you think about it, when you talk about any kind of a service mesh at any level: most of the time when people talk about service meshes, they're talking about application service meshes, which live up at layer seven. But we're talking about a network service mesh. It does many of the same things you're used to in a service mesh, only it does them for payloads that are layer two or layer three payloads, IP packets, et cetera. And anytime you have a service mesh, you have some kind of a registry for your services, and we're no different: we have a network service registry. We also have these things called network service managers that end up running per node. So, next slide, cool.

So if you think about this in a Kubernetes context: if you've got a node, what we're calling a network service manager domain here, when a pod comes up it says, hey, I need this network service, and it indicates that with an annotation.
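As a concrete sketch of that, here's a small Go program that renders the kind of client pod manifest involved. The ns.networkservicemesh.io annotation key is the one NSM's example deployments use; the service name and the rest of the pod are illustrative:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	pod := corev1.Pod{
		TypeMeta: metav1.TypeMeta{APIVersion: "v1", Kind: "Pod"},
		ObjectMeta: metav1.ObjectMeta{
			Name: "db-replica",
			Annotations: map[string]string{
				// The one-line ask: "connect me to this network service".
				// The service name here is illustrative.
				"ns.networkservicemesh.io": "database-replication",
			},
		},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{
				{Name: "db", Image: "postgres:12"},
			},
		},
	}
	manifest, err := yaml.Marshal(pod)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(manifest))
}
```

In the reference implementation, an admission webhook watches for that annotation and injects the NSM client plumbing, so from the application's point of view, that one line is the whole ask.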
The network service manager sends a message to the registry, essentially saying: find me a network service endpoint that provides that network service. It gets back a bunch of interesting information that tells it which network service endpoints will provide it, and also what policies it should apply in terms of selecting from the candidates. It then turns around and asks the peer network service manager for the network service endpoint it selects: hey, I really do want a virtual wire, if you will, to this network service endpoint. The network service manager there does its work at that point. There's a bunch of negotiation of what the tunneling mechanisms are here, and you wind up with a network service manager asking this NSM forwarder to build the pieces of the tunnel and drop an interface into the client on one side and the network service endpoint on the other. Now, this particular interface-like thing, if you're a normal application, is gonna be a kernel interface. But if you're actually a sophisticated packet-processing machine, you're gonna want something better than a kernel interface; you should be asking for something like memif or something of that nature, because kernel interfaces are very slow if you're a packet-processing machine. And you sort of get that virtual wire in place there. Again, this is a very abbreviated description of the process. If you go look on YouTube, or if you drop by the Slack channel, we can point you to specific videos there. There are a lot of really good, like, 30- or 40-minute-long talks that walk through this in great detail.

So we've got a question from Pierre-Louis: how do you ensure symmetry with redundancy with these virtual wires when you connect pods to a centralized stateful service? Ah, okay. So effectively, this is an interesting question, Pierre. You're sort of asking something like: do we connect up two virtual wires? And the answer is: not by default. By default, we will fall back to auto healing. But then your second question is about stateful services: what happens if you're connected to one instance of a stateful service, that goes away, and we need to reconnect you to another one? And the answer is: the problem of sharing state between different network service endpoints that are providing a network service is the responsibility of the person who writes those network service endpoints. We don't enforce any particular way of doing it on them. So that's really up to them to handle in the way that makes sense for the network service they're providing. And there are a bazillion kinds of network services people want to provide, so we really can't effectively pick one true winner there. Cool.

So, we've talked a little bit about this already: inter-domain. We've got a bunch of examples of domains here, and these are sort of network service registry domains; these are places where you might have network services. So you can imagine having two different public clouds, each of them having different clusters, and you might treat each of those clusters as a network service registry domain, and you'd give them different names. These intentionally look a lot like DNS, because we're used to thinking in DNS. You could also imagine, in an enterprise, having network services, or clients for network services, running in a VIM that's running a bunch of VMs. You could imagine them running in a physical network that might have a bunch of DC things happening with physical servers.
All of these are network service mesh domains. And so the question is, how do you connect them in Network Service Mesh? Again, this is going to be highly abbreviated, so I do apologize, but we have limited time. Effectively, what ends up happening is: the clients come up in the normal fashion, the network service manager goes and looks things up in the normal fashion, and it ends up making its request to what we call a proxy network service registry within its cluster. We've got a reference implementation of that which can fall back to DNS in order to find, via service record, whoever the network service registry is for that domain. Please note, this is not the only way you can do it; it's just the reference implementation we've done. That then proxies the request to the network service registry in the other domain and gets back its response. The network service manager can then go, and we're showing the simple version here, make its request to its peer and set up the connection.

So I think we're running up against time, so we probably won't start diving into the next topic. Are there any final questions before we hit the wall on time? We've got a question from Wojciech Dec: can one span a service across domains and have the endpoints be picked based on client proximity to the domain of origin? That is indeed our intention. The sort of topological selection, we're still working through some of the details on, but that is definitely on our list of intentions.

Well, this is great. We are at time, and we had no lack of people trying to take on the gauntlet that we'd laid down, which was to break one of the three of you with a question, and I'm not sure that we've been successful. But there are some unanswered questions happening already, and so there's already a request for a round two of today's topic. So today was the intro to NSM; sounds like we've got an encore request for, dare I say, a deep dive into NSM? Maybe we would settle for like a middle-of-the-road, kind-of-a-deep dive. Yeah, I mean, we didn't even get to some of it; you know, we didn't even touch on things like how we bring in hardware NICs and SR-IOV VFs, if you care about that stuff. If you don't care about that, you'll never know. If you do care, you probably care a lot, and we do have a direction we're going for solving that problem as well.

Sounds like folks should either join the Slack, watch some YouTube recordings, or attend NSMCon if they can. Any other ways that you guys recommend people engage? There's a couple of ways. Beyond the ones you described, we have a very active Slack; that's #nsm in the CNCF Slack group. We have a mailing list that you can send messages to; it gets a little less activity than the Slack. People can ask questions on Twitter. And every Tuesday at 8 a.m. Pacific, we run the NSM community call. That's the place, if you want to get involved with NSM in more detail, or want to help us build, or want to tell us what your use cases are; that's a fantastic place to jump in and let us know. And if you want to know whether the things that you're building will work with NSM, or how they work with NSM, feel free to connect with us, and we'll help you work out what path you should take.
And if it's something new that you want to build using NSM, then we have different approaches that we can use to help with that, including a specs board, where you can describe what problem you're trying to solve. We have people in the community who will look at those and help answer those questions. So get ahold of us, and we'll help you with any questions that you have. Oh, and yes, we have a YouTube channel with all of our meetings as well, which Taylor posted. Thank you for reminding us about that.

Awesome. Well, we are at time. Ed, Frederick, Nikolay, thank you for a great presentation today. And I do want to remind the attendees that the webinar recording and slides will be online later today. We look forward to seeing you all at a future CNCF webinar, and of course, NSMCon. All right, thanks everyone. Thank you.