Hello, everyone, and welcome to today's episode of LF Networking's webinar series. My name is Jill and I am your host. Today's webinar is entitled Building CNFs with FD.io VPP and Network Service Mesh. Our speakers are Milan Lenčo and Pavel Kotúček, both with PANTHEON.tech. So just a couple of housekeeping items. The attendee microphones are going to be muted throughout the presentation. However, there is a Q&A tab at the bottom of your screen, so if you have any questions throughout the presentation, feel free to click on that tab and enter your question. And following the presentation, we will have some time for open Q&A as well. So without further ado, I'm going to turn it over to Milan. Okay. Thank you, Jill. Hello, everyone. So as has been explained, I will be presenting the first part of this webinar. First, I will introduce the concept of cloud-native network functions. Then I will talk about technologies that we use to build and wire CNFs, such as VPP, Ligato and NSM. The core of the first half of the presentation will be about our most recent work of integrating NSM with Ligato, and then I will show you how it works with a pre-recorded demonstration. And then my colleague Pavel will show you how you can use IO Visor eBPF traceability to get insights into your VPP-based CNF deployments. Okay. So first things first, what is a CNF? CNF is an abbreviation of cloud-native network function. It is basically a software implementation of a network function, like a router, switch, gateway and so on, that follows cloud-native principles. So for example, one principle is statelessness, which is the separation of data storage from application code; that then simplifies scalability, fault tolerance and recovery. The next one is microservice architecture, which says that each CNF should be minimalistic, performing only a single function.
And then by wiring multiple CNFs together, we get the complex functionality of a network appliance. Another important principle of cloud-native functions is declarative APIs, and these declarative APIs improve, for example, configuration validation, recovery and rollback. So CNFs are commonly packaged as containers, which makes them both portable and lightweight. As network appliances get decoupled into many CNFs, automatic orchestration becomes a necessity, and technologies such as Kubernetes or Red Hat OpenShift are used as the solution. And to build more complex network functions, CNFs need to be wired together. In the area of networking, there are typically separate connections for management of CNFs and for the data plane traffic. So that means that CNFs typically have more than one interface. And the simplest topology of CNF interconnection is called a chain; it's commonly referred to as service function chaining, or SFC for short. So CNFs are not some kind of revolution, but a natural evolution, basically. First we had physical appliances: standalone physical machines combining proprietary hardware with proprietary software. That's not very flexible and often very expensive. But then virtualization allowed us to view hardware not just as individual machines, but rather as a pool of computational resources. So we could basically use commodity hardware with a host operating system and a virtualization layer to build infrastructure where each network function is implemented and deployed as a single virtual machine, and these virtual machines are orchestrated by some SDN solution. And then, if we continue the decoupling and make network functions even lighter using containers, which as opposed to VMs can be instantiated in seconds since they share a common operating system, and as we apply those already mentioned cloud-native principles of statelessness, microservice architecture and declarative APIs,
we get to even more flexible cloud-native architectures, shown here in the picture on the right side. So that's where we are heading. Right, so the first technology that I would like to talk about in this context of cloud-native functions is VPP. For us, VPP is the data plane of primary choice, for a couple of reasons. For example, it is a vector packet processor, as opposed to a scalar packet processor. That means that multiple packets are processed at the same time, which improves instruction and data cache locality and therefore improves overall performance. Furthermore, VPP runs fully in user space. It bypasses the kernel, even for things like packet acquisition and injection from and to NICs, and that it accomplishes by using the DPDK framework. It supports multiple hardware architectures, provides implementations of many network protocols, it is programmable and it is easily extensible through plugins. And what is important is that recently it has gone through some improvements towards being more, let's say, cloud-native ready. For example, a so-called memif interface was added, which allows efficient packet exchange between VPPs and also other memif-enabled processes running on the same host by means of shared memory, therefore completely in user space. And that's very important for efficiency, because as we decompose network appliances into many interconnected CNFs, the packet exchange between CNFs becomes a considerable performance penalty. Okay, so now we have a data plane, or one option for a data plane, but we also need a control plane for our CNFs, and Ligato is a framework designed just for that. It's designed for building control plane agents for cloud-native network functions. It uses Protobuf from Google to create declarative configuration models, and then uses technologies such as Kubernetes CRDs, gRPC, or a key-value data store like etcd to store and then also to submit configuration into the CNF.
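The declarative flow Milan describes for Ligato, where a desired configuration lives in a key-value store and the agent reconciles actual state against it, can be pictured with a toy Python sketch. The keys, values and operations below are invented for illustration; this is not Ligato code.

```python
# Toy sketch of declarative reconciliation: the agent compares the
# desired configuration (as read from a key-value store like etcd)
# with the actual state and derives imperative operations.
# Keys and values are invented -- this is not Ligato code.

desired = {
    "config/vpp/interfaces/memif0": {"type": "memif", "up": True},
    "config/vpp/routes/10.0.0.0-24": {"via": "memif0"},
}
actual = {
    "config/vpp/interfaces/memif0": {"type": "memif", "up": False},
    "config/vpp/interfaces/old0":   {"type": "tap", "up": True},
}

def reconcile(desired, actual):
    """Return the imperative operations needed to reach `desired`."""
    ops = []
    for key, value in desired.items():
        if key not in actual:
            ops.append(("create", key))
        elif actual[key] != value:
            ops.append(("update", key))
    for key in actual:
        if key not in desired:
            ops.append(("delete", key))
    return sorted(ops)

plan = reconcile(desired, actual)
```

The point of this shape is that the agent is stateless: restart it, let it re-read the store, and the same plan falls out.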
It is easily extensible via plugins and already provides a production-ready control plane agent for VPP and also for Linux networking. So just to quickly show you the architecture: on the top you can see that cloud-native technologies like etcd, gRPC, CRDs and so on are used to submit declarative configuration into the agent, and things like Kafka or Prometheus are used to publish data out from the agent. Inside the agent, the configuration is broken down and processed by separate plugins. For example, there is a plugin for VPP interfaces, a plugin for VPP IPsec, one for Linux interfaces, and so on. And then, inside the Ligato framework core, there is a graph of dependencies maintained to ensure that configuration operations are executed in a valid order. So for example, if you have an interface and a route which references the interface, then you need to configure the interface first and then the route. For the VPP data plane, configuration changes are applied via the GoVPP package, and for Linux, changes are performed through a netlink library. And this is extensible even beyond VPP and Linux, as we will see with NSM. The VPP agent is packaged as a container and typically orchestrated by Kubernetes. This agent by itself is already a CNF, because it offers a declarative API and behind it many networking features which are already offered by VPP and Linux, so it can be used as-is for many applications. Right, so these are some of the technologies of our interest for building CNFs, but there is a challenge, and that is how to wire CNFs together, specifically for networking applications. There are some existing solutions working on the application layer, service meshes like Linkerd or Istio, but this is often not sufficient for networking, where L3 or even L2 connections are needed to support the wide variety of network protocols. And also the requirements for latency and bandwidth of such links are often higher.
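Coming back to the dependency graph mentioned a moment ago: the interface-before-route ordering can be sketched as a topological sort over configuration items. The item names are invented and this is not the Ligato scheduler, just the idea behind it.

```python
# Toy sketch of dependency-ordered configuration: every item is
# applied only after the items it depends on (interface before the
# route that references it). Names are invented -- not Ligato code.

config_items = {
    "route/10.0.0.0-24": {"depends_on": ["interface/memif0"]},
    "interface/memif0":  {"depends_on": []},
    "nat44/pool":        {"depends_on": ["interface/memif0"]},
}

def ordered(items):
    """Return item names so every dependency precedes its dependents."""
    done, order = set(), []
    def visit(name):
        if name in done:
            return
        for dep in items[name]["depends_on"]:
            visit(dep)
        done.add(name)
        order.append(name)
    for name in sorted(items):
        visit(name)
    return order

plan = ordered(config_items)
```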
And finally, another requirement is that separation of control plane from data plane traffic is often preferred. So in other words, multiple network interfaces plugged into containers, or pods in Kubernetes, are needed. There are CNF-focused CNIs such as Contiv-VPP, which we are also contributing to, or Multus, which is used by default in OpenShift and which combines multiple CNIs. And you also need efficient access to physical interfaces, for which you can use the SR-IOV technology, enabled in Kubernetes via a device plugin. Finally, last listed is a novel solution which this presentation is focused on, and it's called Network Service Mesh, or NSM for short. NSM is a CNCF, or Cloud Native Computing Foundation, sandbox project. It basically provides service mesh functionality, but on the L2 and L3 layers. It creates additional network connections between Kubernetes pods based on a routing definition submitted via CRD. It runs alongside any CNI and creates additional high-performance data plane connections. So you have the primary interfaces created by the CNI of your choice, and then you have NSM running alongside, which creates additional interfaces, and these interfaces are high-performance oriented. So apart from taps and veths, it also supports the already mentioned memif interfaces for VPP-based CNFs, or memif-enabled CNFs in general. The important thing to note is that connections with NSM are initiated from inside CNFs via the NSM SDK. That puts some limitations, because it means that CNF control plane agents need to be written in Go, as there is currently only a Go implementation of the NSM SDK. And also this SDK-based configuration approach kind of breaks the cloud-native principle of declarative APIs. But there is an option to put an already prepared NSM sidecar next to some non-NSM-aware application, for example some legacy application not written in Go, or a legacy application that cannot even be changed.
But this sidecar approach has its limitations; for example, the configuration of interfaces cannot be changed at runtime. So NSM obviously supports the simplest topology of a CNF chain. Here in this picture we can see CNF one, shown as an NSM-native application, which uses the NSM SDK to request a tap interface. That goes into the NSM data plane, where it is connected via L2 xconnect into a memif interface of another CNF, which is also NSM-native, using the SDK, but it is also VPP-based, so it supports memif interfaces. And then the chain continues: through AF_PACKET it gets to the physical interface, through VXLAN it gets to another Kubernetes node, and then again AF_PACKET, L2 xconnect, and a tap interface into CNF three, which is shown here as an example of a non-native, let's say legacy, network function which uses the sidecar. Right, and actually NSM supports any topologies, not just chains. In the NSM model, some CNFs are advertised as so-called NSM endpoints, being producers of some services, and other CNFs act as NSM clients consuming those services, so it's a producer-consumer kind of approach. And then there is routing defined via CRD, an example of which is shown here on the right side, that decides for each NSM client which NSM endpoint it should be connected to, based on labels which are attached. The rules are called matches. So for example, here we have two rules, and this bottom one says that an NSM client with label app equals cnf-5 should connect to an NSM endpoint advertised with label cnf equals cnf-3, and that results in this middle bottom link between CNF five and CNF three. All right, so that was some introduction of the technologies that are involved here. And now, for the work that I'm going to present: what we decided to do was to integrate NSM into Ligato, to get all the CNF wiring features from NSM,
but with cloud-native principles, such as the declarative APIs, and also the composition of network features into separate plugins as provided by the Ligato framework. I will skip this slide to the diagram, to show you on a picture what we did. So basically, we created a new plugin for managing NSM interfaces. It gets a declarative configuration description from these APIs and translates that into the corresponding sequence of imperative calls into the NSM SDK, to get those interfaces created by NSM. And then the VPP and Linux features already provided by Ligato can reference these interfaces transparently by their logical names; that means irrespective of how those interfaces were actually created from their point of view, whether it was via NSM or through some other way. So in other words, the already existing Ligato plugins can be used with the now supported NSM interfaces without any changes. I will go back one slide just to show you the link to the GitHub repository where the code of this NSM interface plugin can be found. The combination of the Ligato VPP agent together with this NSM interface plugin we call the NSM agent, and in the following pre-recorded demo I will show you how it can be used to implement and wire CNFs. The source code of the demo, which basically is just a set of YAML files deployed into Kubernetes, can be found on this GitHub link. Okay, so let me switch to the pre-recorded demo. Right, so I will pause it here. In this example, we have a simple chain topology to be deployed inside the Kubernetes cluster. In the middle, there is CNF NAT44, that means a cloud-native network function providing network address translation between private and public IPv4 networks. And it is wired on one side with a client pod with a private IP address, and this client in this example is actually a Kubernetes pod with curl installed inside.
And on the other side of CNF NAT44, there will be a web server with a public IP address. The web server is actually another pod with VPP inside; and by the way, VPP provides an HTTP server for testing purposes, so that's what we are going to use. And all three of these pods are using the NSM agent as the control plane agent. Here on the left side, we see the YAML file with the definition of the routing for NSM. It's just two rules. The routing is defined such that the VPP web server is an endpoint for CNF NAT44, and CNF NAT44 is an endpoint for the client pod. Right, so I will resume the demonstration. And by the way, inside the repository with the demo, there is a README file which you can then follow to try it out for yourself; all the steps which I'm going to do are explained there in detail. Actually, the first couple of steps tell you that you need to have Kubernetes deployed together with NSM. This had already been done before recording, but in the README file you can find links to the documentation of Kubernetes and NSM to learn how to deploy each. Right, so now let's start deploying the YAML files. There are a couple of them. First we actually need to deploy the CRD controller for our CNFs. It will be used to receive declarative configuration in a Kubernetes-native way; in this case, it will be the configuration inside the YAML files deployed via kubectl. The controller just writes this configuration into the etcd data store, which was also already deployed, and from etcd the NSM agents will read it. So this is that cloud-native principle of statelessness, separation of state from application code; that's why the configuration is not submitted into the NSM agents directly. So next, once we have the CRD controller, we can start deploying our CNFs and pods. First we will deploy the web server.
So here, inside its YAML file, you can see the declarative configuration, which contains just the NSM endpoint, and it is inside an instance of our CRD. So this is the first part of the YAML file. Now, once we deploy the web server, what happens you can see on the right side. The pod is created. It includes VPP, which will be our HTTP server, but also the NSM agent, which receives that configuration, and based on that configuration it will advertise the endpoint to the NSM control plane. Next we deploy CNF NAT44. In its YAML definition, you can see that there is a definition of the endpoint to which the client will connect, there is the NSM client that will connect to the web server, and then there is, now highlighted, the NAT configuration, and that NAT configuration references those interfaces by their logical names. So the same NAT configuration could be used regardless of what wiring technology we use, but in this case it's NSM. Right, so once this is deployed, the NSM agent of NAT44 will request a connection from the NSM control plane. The NSM control plane will look at the routing and will determine that it should connect to the web server pod via a memif interface, because they are both VPP-based. And it will create those memif interfaces, which will connect to each other directly, without even having to go through the NSM data plane, so it's a very efficient connection in this case. And we can use the VPP CLI to confirm that those interfaces have already been created on both ends. And finally, we will deploy the client pod with curl. In its configuration, there is just this NSM client definition that will connect to CNF NAT44. Also, there is a route for HTTP requests to be directed through that interface, through this data plane connection, rather than going through the primary interface, which is created by the CNI. And so once we deploy the client, the NSM agent of the client will request a connection from the NSM control plane.
So the NSM control plane will determine that it should connect to NAT44, but since the client is a Linux application, curl is a Linux application, it wants a tap interface, and a tap interface cannot be connected to memif directly. So both the client and NAT44 will connect to the NSM data plane, and there they will be linked together using an L2 xconnect connection. And with that, we have the chain ready. We can test it using an HTTP request sent from the client. As you can see, we have received the HTTP response from the server. And let's verify it even more. Let's do packet tracing in the web server and see whether those packets indeed go through the memif interface. We will start packet tracing on that memif interface, we will rerun the HTTP request, and then we will print the captured packets. And firstly, we will confirm that we have all the packets of the TCP session going through this data plane connection. And secondly, we will also verify that the IP address of the client has already been source-NATed to the public IP before it reached the web server. Right. So that concludes this first part, the integration of NSM into Ligato. And now I will pass it to my colleague Pavel, who will show you how you can trace such VPP-based CNF deployments using IO Visor. Hello everybody. So for the next couple of minutes we will talk about traceability, especially VPP traceability. As for any other application extensively using the processor, it's good to know how it operates, where the bottlenecks are, or why its performance is low. This is where tracing comes into the picture. Linux has two well-known tracing tools, strace and ltrace, allowing you to see what system calls and dynamic library calls are being made. But what if you want to know what happened inside such a call? Fortunately, since version 4 of the Linux kernel, there is the Berkeley Packet Filter, BPF for short.
It was originally for optimizing packet filtering, and on top of this BPF there is enhanced BPF, eBPF for short, which allows running on events other than packets and doing actions other than packet filtering. A data source for tracing could be a system call, a function call, or even something that happens inside such a call. Data can come from the kernel or from an application in user space. Kernel probes provide dynamic access to internal components in the kernel, and you need to know the signature of the function that you want to break into. Kernel tracepoints are used in case of static access to internal components in the kernel; these tracepoints are codified by kernel developers when they implement changes in the kernel. User-space probes are used in case of dynamic access to programs running in user space. User statically defined tracepoints are designed for static access to programs running in user space, and application developers have to manually annotate their code using user statically defined probes. Data from events can be extracted by various applications, such as SystemTap, and also including enhanced BPF. And to display results, responses or logs, there is a variety of front-end tools, including BCC and bpftrace. In what follows, we will focus on eBPF. eBPF is a register-based virtual machine using a custom 64-bit instruction set, and it is capable of running just-in-time natively compiled BPF programs inside the Linux kernel. Enhanced BPF is a full virtual machine implementation, so not to be confused with the kernel-based virtual machine, KVM. All its interaction with user space happens through eBPF maps, which are key-value stores. By design, the eBPF virtual machine and its programs are intentionally not Turing complete; it means there are no loops allowed, and so each BPF program is guaranteed to finish and not hang. The main and recommended front ends for BPF tracing are BCC and bpftrace.
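The eBPF-map interaction Pavel mentions can be pictured with a plain Python aggregation: the in-kernel program increments a per-key counter in a key-value map on each event, and a user-space front end reads and sorts the map. The events below are simulated; real tracing requires BCC or bpftrace on a Linux kernel, and the process names are invented.

```python
# Conceptual sketch of an eBPF counting map: each event increments a
# per-process counter in a key-value map; user space reads and sorts.
# Events are simulated -- real tracing needs BCC/bpftrace on Linux.
from collections import Counter

events = ["vpp_main", "bash", "vpp_main", "vpp_main", "sshd", "bash"]

bpf_map = Counter()            # plays the role of the eBPF hash map
for comm in events:            # "kernel side": one update per event
    bpf_map[comm] += 1

top = bpf_map.most_common()    # "user-space front end" reads the map
```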
The BPF Compiler Collection, BCC for short, provides a large collection of tracing tools for developing kernel tracing and manipulation programs. bpftrace is a high-level front end for BPF tracing, which uses libraries from BCC. bpftrace is ideal for ad hoc instrumentation with powerful custom one-liners and short scripts, whereas BCC is suited for complex tools and environments. Here is the BPF program workflow diagram, and we can see that at first, BCC generates bytecode from the user program. Then user space sends the bytecode to the kernel, where it is verified, compiled to native code and inserted at a specific code location. And finally, the kernel sends the measured data back to the user. On the left side, there is the set of Linux events supported by eBPF, and a set of various tools on the right side. All these tools are available on the IO Visor GitHub page. On that page, there are plenty of examples of BPF one-liners and scripts. An example of script code is here on the left side, and also BCC tools on the right side. The reference guides for writing a BCC tool and a bpftrace script can be found on the IO Visor GitHub as well. And now we will demonstrate using BPF probes in VPP. Again, I have a recorded video, as Milan did, so we can first talk about what we will see and then demonstrate it. At first, we can check the existing probes on our system using the bpftrace tool. We can also check the count of system calls performed by a given process using a kernel tracepoint, and also the number of file system reads using a kernel probe with a return value. So now let's do it on our system. First, we will list tracepoints which contain some string, because there are a lot of tracepoints on my system; as you can see, there are more than 40,000 tracepoints. Now we can check the number of system calls by process using a kernel tracepoint. So let it run for a while, and then we have to stop it to see the results.
And finally, we can run a kernel probe to see the number of calls of the read function on the file system. Again, let it run for a while, and then we have to stop it and we can see a histogram with the results. Now we can start tracing VPP using BPF. As Milan mentioned, VPP is a fast, open-source vector packet processing data plane framework. VPP has a set of CLIs and APIs to configure it and to retrieve status as well as information from VPP. And so using BPF we can trace interface counters, interface state changes, node statistics, neighbor table updates, routing table updates, NAT session creation, deletion and update, and much more. In our tracing tool we use user statically defined tracepoints, because, as Milan said, VPP is running completely in user space. And the reason to use user statically defined tracepoints is that most VPP functions return just void, an error code, or a complex structure, so using a user probe is useless. And in every place where we want to have a probe, we define a SystemTap-style probe using this macro. In the macro we provide the name of the provider, which in our case is VPP, the name of the probe, in our case the vnet software interface state probe, and the arguments for this probe. These arguments can be read later using a BCC or bpftrace tool. An example of how to trace our probe using a BCC tool written in Python is on this image. There is the tool's definition of the probe: the path to the probe, in our case it is a shared library, the name of the probe, and the arguments which we are interested in. The same we can do using the high-level bpftrace language and its scripts; here is an example of how such a script could look. As you can see, it is pretty short, and here again is defined the type of the probe, in our case a user statically defined probe, the name of the shared library, and so on. So now let's check it in our application. First we have to run VPP, and now we can use a BCC tool to list the probes in the VPP shared libraries. As you can see, in one of VPP's libraries there is a set of probes.
There is another library, vnet, which also contains some probes, and finally one example of a plugin, the NAT plugin, and here is the set of probes in this plugin. And now let's trace interface state changes in VPP using the BCC tool. Here is the first example we saw in the picture, and, using the bpftrace script, the second one. So now we have to create an interface in VPP. And as you can see, immediately we get the output of the probe. This is using the BCC tool, so it is not formatted, it is just as it is provided; but using our script we can format it and provide also the name, as an example. Now we can change the interface state from down to up, and you can see the changes in both outputs. If you switch it down, you can again see the change in the output. So finally, we would like to present our complex tool, which combines the output from all the probes we have inserted into VPP so far in one screen. It combines data plane changes and control plane changes, so you can see all interface state, error and node statistics, and all outputs from control plane operations performed by CLI or API. First we have to create two pods running VPP. Here is the configuration file for both pods; one is named VPP1 and the second one is VPP2. Now let's start these pods, and we can connect to the VPP consoles using the vppctl tool, and here you are: we are connecting to VPP1, and to VPP2 again in another console. And now let's start two instances of the BPF tracing tool, one on the VPP1 pod and one on the VPP2 pod. As you can see, the probes are attached in a few seconds. Now we can create an interface and see the outputs in this tool. So one interface is created; it stays down. Now assign an IP address to it, and we can see the probe which was invoked and its arguments. Now we can change the state to up, and then we can add a neighbor to this interface. As you can see, two probes were invoked by one CLI command. Let's do something more complex on pod number one. So I have prepared a script which will create a more complex topology. There are two interfaces, both have IP addresses.
There is the NAT feature enabled on interface pg0, and there is also a packet generator script which will generate 100 packets with source and destination IPs and source and destination ports. This script can be simply executed on VPP, so all commands are executed at once, and you can see the changes of interface state and also what IPs and so on have been assigned to the interfaces. Now let's simulate traffic using the packet generator. And here you can see that 100 packets have been sent using the packet generator and one NAT session has been created. Finally, we can configure an interface on a VPP running in Docker and use the ping tool to check connectivity. Now we have started VPP and we are trying to ping, but there is no response, as VPP has no interface assigned or configured. Let's start our VPP tracing tool. It is trying to attach to the running VPP. Once it is started, we can configure VPP: we will create one interface, switch it up, assign an IP address, and also enable it for communication. We can simply execute this script on the running VPP. You can see the changes here, and now, when we are pinging this IP address, the response is coming, and you can see that VPP is processing packets and you can see the increase in the number of packets processed by the given interface. Here are 10 packets sent, as there are 10 pings. This concludes the second part of the presentation, the VPP traceability using eBPF, and now Milan will give some final words. Thank you, Pavel. Alright, so this was our demonstration of building CNFs with VPP and Network Service Mesh, also with some VPP traceability in cloud-native deployments. And here on this slide there is a summary of what we do at PANTHEON.tech. We of course welcome any contributions and cooperation in all of those open source projects that were mentioned.
And lastly, I would like to thank, or we would like to thank, the Linux Foundation for giving us the opportunity to share this demonstration with you, and of course thank you to everyone who participated. And if you would like to keep in contact with us, make sure to find us on all major social networks, as we have written them on this last slide. So thank you. And now, if you have any questions, please feel free to ask them. Great. Thank you so much. Really appreciate this great presentation. We do have a couple of questions, and if you have any more, folks on the line, there is a Q&A window at the bottom of the screen, so feel free to type your question in there. So we'll go through a couple of these first. We did get asked: is there any performance penalty of using Network Service Mesh? Right, so as has been shown in the picture which was included in the demonstration, NSM consists of some components, and there is the data plane component, which by default is VPP, run on each Kubernetes node. So obviously VPP is, you could say, CPU-oriented packet processing, so it requires some computational resources, but the reward is higher performance. So if you have those computational resources, it's a good choice. But if you are limited there, and you don't need the best networking performance, you can change the configuration of NSM to use a kernel-based data plane, which it also supports, and with kernel networking it is basically less demanding on the computational resources; but the price is that the network connections will be a bit slower. Okay, great. Thank you. Are there any alternatives to Network Service Mesh? As has been mentioned, to create additional data plane connections between CNFs and pods, Kubernetes pods in general,
you can for example use the already mentioned Contiv-VPP, which is very similar to NSM in that it also uses VPP to wire CNFs; but Contiv-VPP is also a CNI, so even for primary interfaces you can have VPP-based connections. So it's your choice, basically: whether you want any CNI, whether you have requirements for a specific CNI, or whether you are okay with using a specific CNF-oriented CNI, in which case you can use Contiv-VPP. Great. Thank you. Where does the NSM agent store the configuration? Well, the NSM agent follows the cloud-native principle of statelessness. So it stores the configuration in CRDs, so not locally next to the application code or in some file, but separately. And yeah, that makes it suitable for cloud-native deployments, as it can therefore handle restarts very well. Great. And where would somebody go to find the code? So for Ligato, there is a link in the presentation to the Ligato code. For NSM, there is the linked webpage networkservicemesh.io, which links you to the GitHub repository. And the links for this integration of NSM with Ligato are also included in the presentation, and the examples which we have demonstrated here are in a repository which can also be found in this presentation. Wonderful. Thank you, Milan. It looks like our next set of questions are specific to Pavel's part about tracing. So somebody's asking: why or when should I use bpftrace instead of the VPP CLI? The reason to use bpftrace is that you can annotate any piece of code you would like to trace, as opposed to the CLI, which provides just a definite number of commands. And moreover, the CLI provides output for a given command only; as you saw in the presentation, when you do some changes in VPP, there are two probes invoked, and using the CLI you would never get this output. So it provides much better tracing information than the CLI. Okay, thank you. And then it looks like we just have one more question here.
Do I need a special VPP build to be able to use BPF probes? Sure, it has to be a special build, because we are using user statically defined tracepoints, which, as I said, are used to annotate the code. So you have to make your own build and insert these tracepoints, and then you can trace the application. So, yes, it is necessary to build your own version. Okay, well, I don't think we have any more questions, so I think that concludes today's webinar. Thank you so much to our presenters, and thank you to everyone for joining us today, and we hope to see you on an upcoming LF Networking webinar soon. Have a great day.