Hi, and welcome to the Network Service Mesh introduction. My name is Nikolai Nikolayev; I work as an open source technology lead in the Open Source Technology Center at VMware. The session was announced with a co-presenter, but he didn't manage to make it, so I'm going to do it on my own. We are both maintainers of Network Service Mesh. Let's see how this goes.

What we're going to discuss today are some of the tools we have around networking, and cloud native networking in particular. We will discuss and demonstrate how Network Service Mesh works and how we are trying to solve the problems that we see. Network Service Mesh is a very core technology, at least the way we perceive it, so whatever we give as an example here is not the only application of this technology and this approach. I'll be around today and tomorrow, so if you have a specific use case that you find challenging to implement in a cloud native way, come and talk to me, talk to us as a community. We have a couple of friends here in China as well.

Okay, let's see what the journey is. Today we're going to share some thoughts about cloud native networking, and we're going to look at the Kubernetes networking API as a good example. If you have seen our talks before, you know we usually frame them as a story, and we usually follow our friend Sarah and her problems. Today we're going to introduce another of our friends, Marsha, and her multi-cloud application problems. We will also show how Network Service Mesh works at a high level, and of course, at the end, a couple of links. I hope we have time for Q&A as well.

This is the cloud native definition as written by the CNCF; you probably saw it shown at the keynote today. It's pretty long, and there's no point in reading all of it here, but it's very well written, so please go and check it if you haven't already. We have outlined a couple of points that we think are very important for networking done in a cloud native way: immutable infrastructure, which I guess everyone aspires to; loosely coupled systems, meaning loose coupling between all the components that implement your functionality; and minimal toil. Minimal toil matters because no one really wants to do more work than they should.

So let's start with minimal toil networking. What would that mean? Essentially, people don't want to think about all the details of networking; you just want your connection done. You don't want to think about subnets or routes; you're not requesting a specific subnet, you want to consume some form of networking. That's the conceptual level, and that's why we like to call these network services. The terminology is not new, but we like to think of all the requirements for connectivity, all the aspects of connectivity, as packaged into a single term: a network service. All your security, load balancing, whatever you need, you pack into a network service, and when you want to consume it, you refer to it by name: "I want my secure intranet connectivity." This is a typical example that we give in our talks and one of our core examples in the project.
Essentially, imagine you want to run your application in a public cloud and connect it to your office: you want to consume some form of network service that gets you proper secure access to your corporate intranet. There are other examples given here, but we can continue with this one.

Now, how does Kubernetes networking apply these concepts? From a conceptual point of view, you get your layer 3 connectivity out of the box: once the pod is deployed and created, you have layer 3 connectivity. You can do security with network policies, and load balancing is there through Services and their Endpoints. What one really notices when looking a little closer is that most of these concepts are about intra-cluster networking, all the networking that happens inside a single cluster. There have been attempts to overcome this, but if you look at the plain networking that Kubernetes describes today, it's about intra-cluster communication and connectivity. If you want to consume it, as we said, it's just there once your pod is created, and network policies and Services are really easy to consume and to implement.

When we talk about loose coupling, this is something very familiar to application developers from microservices: it's a way to abstract your building blocks behind API abstractions so that from these small blocks you can build more complex functionality and design patterns. It's a very flexible way to implement complex functionality. Networking, however, has historically been very strongly coupled to the infrastructure. What we typically see is that your networking is coupled to your cluster, to your data center or, if you're using a public cloud, to a VPC, and you don't really get the granularity and flexibility that people want once they start doing more advanced things.

How does this work out in Kubernetes networking? In general, yes, you are decoupled from the specific implementation through the CNI, the Container Network Interface. On the other hand, it is also strongly coupled, because you usually get one CNI plugin per cluster. There are ways to run multiple CNIs, but it's not really consistent or easy. You also typically see a single edge for the entire cluster, or even for multiple clusters: if you are running in a data center on virtual machines, you can very easily end up with a single edge shared by multiple clusters. And the granularity is not very fine: you either get your networking or you don't; you can't tweak it for particular workloads.

From the immutable infrastructure point of view, your pods run at a really high level, so you cannot change anything in particular about the networking functions, other than the fact that they are abstracted through the Kubernetes and CNI APIs. Your pods cannot request specific networking capabilities from the infrastructure, and you cannot change that easily. So that's how Kubernetes checks these boxes.
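To make the in-cluster primitives concrete, here is a minimal sketch of the "security with network policies" point, using only stock Kubernetes; the app labels are invented for the example:

```yaml
# Stock Kubernetes NetworkPolicy: only pods labeled app=frontend may
# open TCP connections to pods labeled app=backend on port 8080.
# The labels are invented for the example.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```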
So let's look at our example and the story that Marsha has to tell us. Marsha is trying to implement an application with multi-cloud and hybrid-cloud aspects. She has workloads running in a public cloud, in a private Kubernetes deployment, maybe on some legacy virtual machines, and on some bare metal deployments; for example, there could be a huge database already running in her data center that she doesn't want to touch. She wants all of these connected.

From the networking point of view, each of these domains has its own ideas about how to do networking. The public clouds have their own proprietary CNI implementations; an on-premises Kubernetes deployment probably uses one of the publicly available CNI implementations; virtual machines have their virtual networking; and bare metal, of course, has physical networks. The usual way to interconnect these is network-to-network direct connections, which is not very fine-grained: you draw a line, so to speak, from one network to the other, assume the two networks are connected, and then add filtering and extra work just to isolate particular workloads and pods. There's no real way for Marsha, being just a developer, to do this on her own; she has to work with everyone involved in setting up this complex networking. But in the end, what she really wants is to connect her workloads, and only her workloads. She wants to be able to declare: my workload in the public cloud connects to my workload in the private cloud, on the virtual machine, on bare metal, and so on.

That's one of the problems we're trying to solve with Network Service Mesh; this is where it comes in. We have a concept of virtual wires, an abstraction over connections, which helps her request a network service that implements this point-to-point connectivity for her. Under the hood there will still be all the inter-cloud networking going on, but the network service, being something that can be requested straight from the pod level, lets her implement her distributed application very easily. As we said, there is a "Marsha's app connectivity" network service that she can request.

From the minimal toil point of view (and I have an example later to explain how this works), all she has to do is annotate her deployment manifest with a special label requesting a specific network service, and Network Service Mesh takes over and provides all the connectivity from there; a sketch follows below. Of course, someone has to implement the network service first; it doesn't work magically out of the box. From the loose coupling point of view, as we already said, you don't bind to your inter-domain connectivity; you are decoupled from it and bind only to the connectivity between your workloads. And from the immutable infrastructure point of view, enabling Network Service Mesh is just an installation: we have Helm charts, so you simply install NSM.
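A minimal sketch of what that annotated manifest could look like, assuming the ns.networkservicemesh.io annotation key that the NSM admission webhook consumed in early releases; the service name and image are placeholders:

```yaml
# Sketch of Marsha's annotated deployment. The annotation key
# ns.networkservicemesh.io is the one the NSM admission webhook
# watched for in early releases; the service name and image are
# placeholders for the example.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: marsha-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: marsha-app
  template:
    metadata:
      labels:
        app: marsha-app
      annotations:
        ns.networkservicemesh.io: marsha-app-connectivity
    spec:
      containers:
        - name: app
          image: example/marsha-app:latest  # placeholder image
```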
NSM works on top of Kubernetes. It doesn't change any aspect of Kubernetes, it doesn't require any special version of Kubernetes, and it doesn't change or interfere with the CNI. We think CNI does its job great for what it was designed to do, and we don't want to mess with it. I know there are approaches where people try various forms of tweaked CNIs to solve some of these problems; our approach is to amend that, to provide extended capabilities based on the Network Service Mesh concept. And it works with any CNI. To prove that, our CI/CD pipeline runs continuously on a couple of public clouds, on virtual machines, and on bare metal, and we also use kind; I don't know if you're familiar with kind, but it's a really nice project to run your tests against that essentially runs Kubernetes in Docker.

With that, I can move on to how this thing works. Before we get there, I would like to walk you through our definitions. NSM is three things: a definition of a network service, a gRPC API, and a distributed control plane with minimal shared state. Starting from the last one, the distributed control plane resembles the approach Kubernetes takes with its kubelet: local agents, separated from each other, distributed, each working on its own. We do the same: we call them network service managers, and we run them as DaemonSets. The gRPC API is an abstract way to describe, publish, and consume network services. And the definition of a network service says that a network service is not a single entity; it's a composition of multiple functions and endpoints bound together under a single network service descriptor.

All of these concepts are implemented in our Git repo. It's a Kubernetes-based implementation, but the overall architecture isolates the Kubernetes specifics in a separate module that can be replaced. Today we don't use much of Kubernetes except as central storage, through etcd, and for scheduling the pods that actually make up Network Service Mesh. Other than that, we are not tightly bound to Kubernetes, so in theory it should be easy to add those other environments, virtual machines and bare metal, and there is interesting work and discussion going on in the community around exactly that.

Also, as the basic packet-passing component, the forwarding plane as we call it, we currently use VPP from FD.io, a Linux Foundation project. But there is work going on toward a pure kernel-only implementation that will not depend on VPP. It's not that we don't like VPP; we just want to demonstrate that the underlying forwarding plane is independent from NSM and that we are not bound to any specific implementation. We want a bare-minimum baseline that we can run anywhere without relying on third-party components.
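Going back to the network service managers for a moment: since they run as DaemonSets, here is a hedged sketch of that deployment shape. In practice the Helm chart renders this; the namespace, image tag, and host path below are placeholders rather than the chart's exact values:

```yaml
# Illustrative shape of the nsmgr DaemonSet that the Helm chart
# deploys, reduced to essentials; namespace, image tag, and host
# path are placeholders, not the chart's exact values.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nsmgr
  namespace: nsm-system
spec:
  selector:
    matchLabels:
      app: nsmgr
  template:
    metadata:
      labels:
        app: nsmgr
    spec:
      containers:
        - name: nsmgr
          image: networkservicemesh/nsmgr:latest  # placeholder tag
          volumeMounts:
            - name: nsm-socket
              mountPath: /var/lib/networkservicemesh
      volumes:
        - name: nsm-socket
          hostPath:
            path: /var/lib/networkservicemesh
            type: DirectoryOrCreate
```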
So, a couple of slides to show how this works in more detail. Say you have a Kubernetes cluster with three worker nodes. The first thing we do is apply our custom resource definitions. We define three essential CRDs: a Network Service Manager registry, a Network Service Endpoint registry, and a Network Service registry for the services themselves. When we deploy Network Service Mesh, with Helm for example, it deploys the network service managers, which are the mandatory components. Alongside them we also deploy the forwarding plane, which in this case is VPP, but that is completely orthogonal and can be deployed by other means. What's important here is that the network service managers register themselves in the central registry, so it is well defined which network service manager runs on which worker node.

The next step is to deploy the network service itself; I'll show later the YAML descriptor that defines how the network service is composed out of endpoints. Then we deploy the endpoints, which also get registered in the central registry. That's for the red network service, and we do the same for the blue one. When a client comes along and wants to consume a service, say the red client here, it talks to its local network service manager. The local network service manager consults the central registry about what this network service is, which endpoints implement it, and so on. There's a selection process based on label matching, and then the two network service managers involved establish a point-to-point connection.

That's maybe one of the crucial things to mention: with Network Service Mesh we don't do broadcast domains, bridge domains, virtual switches, or anything like that; it's only point-to-point connectivity between endpoints. On the other hand, a network service endpoint can itself connect onward to another network service endpoint; "service chaining" is the term that used to be used, but we prefer to call it composition. In this case, the network service is composed of two endpoints, and as you can see, the first endpoint gets two new interfaces at runtime. This is one of the tools Network Service Mesh uses to achieve its goals: we support multiple interfaces per pod, and they can be requested and disposed of at runtime. You don't have to restart your services to do anything fancy; the moment you request the service, your interface gets injected and you can start consuming it. The same, of course, goes for the blue service; a client can come along and consume both services.

One thing that may not be clearly visible: here at the bottom we have labels on the service request. When a connection request goes out of the pod, it can carry labels alongside it. For example, you can say, "I want to consume service version two," and Network Service Mesh should get you the endpoint carrying the proper labels. There's some label matching involved in the picture; a sketch follows below.
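A hedged sketch of a client passing a label with its request. The query-string suffix on the annotation is an illustrative rendering only; the labels actually travel inside the gRPC connection request that the local network service manager sends on the pod's behalf, and the service name is made up:

```yaml
# Hypothetical sketch: a client requesting the "blue" service and
# passing version=2 as a selection label. The query-string suffix is
# illustrative; the labels really travel in the gRPC connection
# request the local nsmgr sends on the pod's behalf.
apiVersion: v1
kind: Pod
metadata:
  name: blue-client
  annotations:
    ns.networkservicemesh.io: blue-service?version=2
spec:
  containers:
    - name: client
      image: busybox:1.31                    # placeholder workload
      command: ["tail", "-f", "/dev/null"]   # keep the pod running
```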
So how does consumption of a network service happen? If you have an application container and you want to run it in a Network Service Mesh enabled pod, as we showed before, all you have to do is use our annotation format. Then there's the NSM admission controller: it processes the deployment at pod creation time, figures out what the service is, and injects an init container that takes care of injecting the interface for you and ensuring smooth connectivity and consumption of the network service. That is one way of doing things; we have more advanced ways too. We have an SDK that you can incorporate directly into your application to actively request and manage the services and connections your application uses. But the annotation is the easiest way if you just want to consume something out of the box; you don't even have to change your application.

This is an example of the network service manifest, the descriptor of the network service. First, we have our custom resource of kind NetworkService. Then we have the service name, which in this case is secure-intranet-connectivity, or it would be Marsha's app connectivity if we were talking about Marsha's application. Then we have the payload type, which in this case is IP. Network Service Mesh is pretty much payload agnostic: if you have seen our site or some of our slides, you probably know we claim to do both layer 2 and layer 3 connectivity, and there is nothing bound or hard-coded in the architecture and design principles that ties us to any particular payload.

Here's one of the use cases: the speakers just before me were discussing Kubeflow, which is essentially a way to run TensorFlow on top of Kubernetes infrastructure. They are limited to whatever Kubernetes networking gives them, essentially using TCP to communicate between nodes. That's fine, but if any of you know TensorFlow, there's a further performance improvement available by using RDMA, which is a really fast way to communicate between nodes. One of the things we are discussing with some members of the community is how we could enable Kubeflow to talk RDMA between nodes; in that case, the payload of the service would be RDMA, or whatever else is needed.

To continue with the descriptor: we have label matching for selecting the endpoints. This is a match against the labels that come with the client's request; in this case the match refers to app: firewall, but there could be multiple labels, versions, and so on, whatever people need. Then we have simple routing, and again label matching for selecting the endpoint. Each endpoint labels itself, saying: I implement version one, I implement a firewall, I implement a gateway. Taken together, these are the basic rules for finding a path through the mesh, for deciding how to wire and connect things within the network service mesh. The last thing on the slide is a wildcard: if you don't specify labels, a wildcard match provides the default connectivity routing. A reconstruction of the descriptor follows below.
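Here is an approximate reconstruction of that descriptor; the apiVersion and selector field names follow the project's early v1alpha1 examples as I recall them and should be treated as indicative rather than exact:

```yaml
# Approximate reconstruction of the descriptor from the slide; the
# apiVersion and selector field names follow the project's early
# v1alpha1 examples and may differ in later releases.
apiVersion: networkservicemesh.io/v1alpha1
kind: NetworkService
metadata:
  name: secure-intranet-connectivity
spec:
  payload: IP
  matches:
    # Requests labeled app=firewall (i.e. traffic that has already
    # passed the firewall endpoint) are routed on to the gateway.
    - sourceSelector:
        app: firewall
      routes:
        - destinationSelector:
            app: vpn-gateway
    # Wildcard: requests with no matching labels enter at the firewall.
    - routes:
        - destinationSelector:
            app: firewall
```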
This is more or less what I had to share with you. We have our site, networkservicemesh.io, and our GitHub page, and we have been a CNCF Sandbox project since April. We actively participate in some of the initiatives there; one of them is related to telcos. That's another aspect of Network Service Mesh: there's interest in how telcos can be enabled to do cloud native networking. If you want to connect with us, our site has a community sub-page; we have a Slack channel in the CNCF Slack, a couple of calls during the week, and mailing lists, all listed there.

I think we have some time for questions. If you want to find me later: unfortunately, as I said, the other maintainer who was supposed to be here couldn't make it, so I'm the only representative of the project here. I'll be at our booth between 2 and 4 p.m. today and tomorrow, and at the CNCF answer bar near the end of the event. If someone wants to chat and discuss their specific problems, I'll be there. That's all from me; I guess we have some time for questions. Yeah, we do.

Question: Hi, Nikola, great proposal. I have three questions. The first one: in the secure intranet connectivity case, a client connects to the firewall service endpoint, and that endpoint connects on to the VPN gateway endpoint in the middle of the chain. For the firewall service endpoint, for example, can it be connected to one service only, or to more services and clients?

Answer: Yes, it can. There's nothing in the architecture that prevents that. It's up to the endpoint to accommodate more incoming connections and to decide how to handle them. It really depends on the application: if you want the application to scale well, you can put limits on the incoming connections and say, okay, I cannot accept more than five connections on my endpoint, or you can let it accept each and every request. Nothing in the architecture says you should run only one interface, or five, or ten.

Question: Okay, I see. The technology seems fine, but I guess it's hard to manage a growing number of service endpoints, for example when a service scales out to two or more endpoints; that seems hard for the network connections.

Answer: That's probably one of the things worth mentioning in these intro talks: we have a fairly elaborate auto-healing functionality implemented. There are still some rough edges to clean up, but it works pretty well. Because we manage the connections and we know what kind of function an endpoint implements, if the endpoint dies we can rewire you to another endpoint that implements the same function. Of course, if the endpoint is stateful, then it's up to you, up to the implementation of the network service on the endpoint, to replicate the state to the neighboring endpoints. But if it's stateless, say a simple packet filter that just drops packets according to some ACL rules, you are simply rewired and your service continues to work; there will be a small glitch while we detect the failure and rewire, but it will keep working.

Question: Okay, okay. Another question, yes.
Question continued: Why do we use an init container for the network service client to connect to the service? Could we use another mechanism, for example a controller that watches the labels on the network service client and wires up the connection, instead of an init container? What are the advantages and disadvantages of that?

Answer: I guess our approach was more or less inspired by the way other projects in this domain do similar things; for example, I'm sure you know how service meshes like Istio and Linkerd work, using not init containers but sidecar proxies. What you're saying actually makes a lot of sense, and it would be nice to chat about it later. The way we did it: we have an SDK, so we created the client, packaged it as a sidecar container, and found that this works really well with an admission controller. But yes, of course, we could go without it; we should be able to go without it.

Question: [Largely inaudible follow-up about clients connecting to a network service from any container.]

Answer: Okay, we are out of time. I didn't really get the question, but we can chat later. Okay, we can talk. Thank you. Thank you.