Hello. Okay, I think it's time, so let's start. Welcome, everybody. My name is Rastislav Szabó, I'm from Pantheon Technologies, and today I'll be presenting network flow monitoring in Kubernetes using the Contiv-VPP CNI and the Elastic Stack.

Before I begin, something about me. I'm a staff engineer at Pantheon Technologies; we build software solutions mostly for the networking industry. Previously I was personally focused mostly on network manageability protocols such as NETCONF and YANG, and currently I'm working on Contiv-related projects. I have contributions to several open source projects that I'll be mentioning throughout this talk: FD.io (mostly the GoVPP project), Ligato, and Contiv-VPP.

Okay, let me start with some motivation. Why do we actually need network monitoring in our Kubernetes clusters? Well, at the very minimum, you want to be alerted and notified whenever there is some networking-related issue in your cluster. There might be a hardware issue which causes one of your nodes to stop working, or just congestion, packet drops, et cetera. You also want to monitor your cluster in order to identify networking bottlenecks: if you have a large-scale deployment with many pods across many nodes, you want to place them so that the traffic is distributed evenly across the cluster. Another motivation could be the detection of malicious activities or attacks, and their investigation if possible. And the fourth motivation, which is the interesting one for me, is the deployment of CNFs, cloud-native network functions: network functions deployed as a set of microservices, let's say a router, a NAT, or a firewall running as a pod or as a chain of pods.

Okay, now what are the current options for network monitoring in Kubernetes? It's pretty much metrics. Most of the CNIs expose some Prometheus metrics. Unfortunately, some of them do not export the most interesting data, which would be packet or byte counters on individual pod interfaces; some export just some internal state, error rates, et cetera. But the CNI that I'll be talking about, Contiv-VPP, exports quite nice metrics; I'll show how that looks. Then you can rely on a service mesh, in case you have a service mesh solution running in your cluster, but that gives you pretty much only the view of the application layer, and sometimes you need to see what's happening on the lower networking layers as well. Or you can collect your own metrics: you can monitor the network interfaces in the individual pods' network namespaces and process them yourself.

But the most important thing I wanted to mention is that metrics are usually not enough. From metrics, as you can see on this slide, which shows a Contiv-VPP CNI deployment exporting Prometheus metrics for each pod interface, you can spot an issue. You can, for instance, see that on the top graph we have a spike in error drops. You can get an alert that something is happening in your network, but you don't really know what is happening. When you get the alert you can go and explore what's going on, but once things return to normal, you again have no way to figure out what was happening. Maybe it was an attack; you don't know.

Okay, here I'm actually showing the Prometheus metrics exported by the Contiv-VPP CNI, displayed in Grafana. As I said, for each pod you can see, for instance, what the outgoing and incoming traffic rates were, and you can filter those by pod name, et cetera. So these are metrics.
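To give a flavor of how such a panel is built: a Grafana graph over per-pod interface counters is typically just a Prometheus rate() query over a counter metric. The metric and label names below are hypothetical placeholders for illustration only; the real names exported by Contiv-VPP will differ, so check its Prometheus endpoint.

    rate(pod_interface_rx_bytes_total{pod="my-pod"}[5m])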
But if metrics may not be enough, what can we do in order to get better visibility into our network? Well, in traditional networks you would use protocols such as NetFlow or IPFIX, which are, I think, well known in the networking industry. The NetFlow and IPFIX protocols work with three components in your network. You have a flow exporter, which exports some information about each network flow that it sees. A network flow is a tuple identifying an individual network connection, consisting of the source and destination IPs, the protocol, the type of service and the ports, plus of course byte and packet counters and timestamps. In a traditional network the flow exporter would be a router, a switch, or a specialized hardware probe. The flow collector is some software responsible for collecting all the flow records it receives from all the flow exporters. And then you have some analysis tool, which pretty much analyzes all the data received and stored by the collector.

Okay, in Kubernetes we can actually use the same principles; we'll just use different components as our flow exporters, collectors, and analysis tools. In Kubernetes, our flow exporter can be the CNI plugin, because the CNI plugin is actually our router and switch: it forwards all the traffic between the pods on the same node and between the nodes. You cannot really export the flows on the routers or switches that interconnect your nodes, because the traffic between the nodes is usually encrypted or encapsulated in some tunnel; that's another reason why the CNI is the right component to act as a flow exporter. So the CNI exports the flows.

Now, what can we use as the flow collector and the storage and analysis tool? Since we are in the cloud-native world, we could use, for instance, the Elastic Stack, which stands for three open-source components that go together and create a whole infrastructure: Logstash, Elasticsearch, and Kibana. Logstash can act as the flow collector; it already supports the processing of IPFIX and NetFlow information. Elasticsearch we can use for two things, pretty much: it is the storage, and it acts as a search engine over all the stored flows. Elasticsearch is very good at horizontal scaling, so we don't have to worry about scaling issues; if one Elasticsearch instance is not enough, we can scale it horizontally into a cluster. And for the analysis we can use Kibana, which is pretty much our UI that allows us to search in our flow storage. A minimal sketch of such a Logstash pipeline is shown below.
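To make the collector side concrete, here is a minimal sketch of a Logstash pipeline for this kind of setup. It assumes the logstash-codec-netflow plugin (which also decodes IPFIX, i.e. NetFlow version 10) and a reachable Elasticsearch service; the port, hosts, and index name are illustrative assumptions, not the exact ElastiFlow configuration.

    input {
      udp {
        port  => 4739                # UDP port the VPP exporter is assumed to send IPFIX to
        codec => netflow {
          versions => [10]           # version 10 == IPFIX
        }
      }
    }
    output {
      elasticsearch {
        hosts => ["elasticsearch:9200"]
        index => "flows-%{+YYYY.MM.dd}"
      }
    }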
For the proof of concept that I'll show in a second, I used one of the CNIs that are available, called Contiv-VPP. This CNI is really focused on two things. One is speed, because it runs an FD.io VPP instance on each node. VPP is a very fast data plane running in user space; its name comes from vector packet processing, which means that instead of processing packets one by one, it processes multiple packets in one vector at once. That's why it's very fast. It also relies on DPDK for accessing the NIC on the node, and it supports an interface type called memif, a memory interface, which is able to forward packets between the VPP instance and the pod through shared memory. To use memifs, you need a special application running in the pod, either another VPP instance or an application that uses the memif client library. By default you get tap interfaces between VPP and the pod, but in case you need high performance, for instance for CNF deployments, you can use memifs. And that's actually the second aim of the Contiv-VPP CNI: it is really aimed at cloud-native network function deployments, because it supports the memory interfaces, it is able to connect multiple network interfaces towards the pods, and it already supports some simple service chaining between those secondary interfaces. For this demonstration, the best thing is that VPP already supports IPFIX, so it pretty much just needs to be enabled on all the interfaces that go towards the pods.

Okay, this slide shows how it can all be integrated together in case we wanted to upstream this IPFIX support into the CNI. Pretty much, you need to write a Contiv-VPP IPFIX plugin which hooks into the pod add and delete events and enables IPFIX on the pod's interface on VPP. The way Contiv-VPP does this: it is, of course, written in Go, as I think all the CNIs are, but internally Contiv-VPP uses the Ligato VPP agent to program VPP. Ligato.io and its VPP agent is a very interesting project, especially if you are interested in CNFs. It pretty much provides a production-ready CNF base that can just be deployed and configured: it provides a core container which contains VPP plus its management agent, and the agent offers several cloud-native, northbound API ways of programming it, such as gRPC, REST, etcd, et cetera. Contiv-VPP uses it as an internal dependency to program VPP, and the Ligato agent then calls the binary APIs which program the VPP instance.

Okay, so once we have VPP acting as a virtual switch between the pods, thanks to the Contiv-VPP CNI, we enable IPFIX on each of the pod interfaces. Then we need to deploy Logstash, Elasticsearch, and Kibana. For this we can actually use an existing project called ElastiFlow, which is a ready-to-use, ELK-based IPFIX collector and analyzer solution that we just need to run. For this demonstration I packaged it into Docker containers and wrote the deployment YAML files for Kubernetes, so all three components run as pods in my Kubernetes cluster. The VPP instance is then configured with the IP address of the Logstash pod, where it sends all the flow information; a rough sketch of the corresponding VPP-side configuration is shown below.

Yeah, this slide shows pretty much the result, but I'll try to show it live if I manage that with one hand. This is how you can see the flows in your cluster in Kibana. As I said, the flows were exported by VPP and sent to Logstash, which saved them in Elasticsearch, and Kibana just accesses the data stored in Elasticsearch. These dashboards come with the ElastiFlow solution, but of course you can write your own dashboards if this is not enough, or if you want to see something more specific for your deployment.
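For reference, enabling such an IPFIX export by hand on the VPP side looks roughly like the following, using the upstream VPP flowprobe plugin's debug CLI. This is a sketch rather than something Contiv-VPP configures automatically today; the collector address (the Logstash pod IP), the source address, and the interface name are placeholders, and the exact syntax can differ between VPP versions.

    # Point VPP's IPFIX exporter at the collector (addresses are placeholders):
    vpp# set ipfix exporter collector 10.96.0.50 port 4739 src 192.168.1.1 template-interval 20

    # Record L3 flow data with active/passive timers (in seconds):
    vpp# flowprobe params record l3 active 60 passive 120

    # Enable the flow probe on a pod-facing interface (name is a placeholder):
    vpp# flowprobe feature add-del tap0 ip4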
So, let me see if I can show you this in my browser, or rather let me start in a console. This is my cluster running on my laptop. I have already deployed Elasticsearch, Logstash, and Kibana; they are running as pods here. Apart from them, I have the Contiv-VPP CNI running, which you can recognize by this contiv-vswitch pod, which is actually the vSwitch that processes all the packets between the pods.

I can access the VPP console, which allows me to type some debugging commands. I can show you the individual interfaces that go towards the pods; you can see there are a lot of packets going through them. Another interesting thing to show would be the flow export; yeah, it seems that I have somehow managed to disable it in the meantime, but never mind: all the flows were exported into Elasticsearch before I disabled it.

I can show you one more thing that I have not mentioned: the Contiv-VPP CNI also provides the kube-proxy functionality on VPP. So in case you have Contiv-VPP deployed in your cluster, the NAT that usually happens between the pods is not done by kube-proxy but by VPP, which we can see here by showing the NAT44 static mappings. These are the mappings programmed on VPP, which do all the NAT-ing for the Kubernetes services deployed in my cluster. At the bottom you can see the entries that handle the kube-dns service: the kube-dns cluster IP is load-balanced to two local IPs, which are the IPs of the kube-dns pods.

Okay, so let me go back to Kibana, which accesses the flows stored in Elasticsearch, and briefly show you how it looks and what you can see. This is the top-N view. In my cluster I didn't have any specific application running; it's pretty much just those pods you have seen, so the ELK pods and the CoreDNS pods. But you have the ability to see on the graph which are the most active services, and you can filter by service ports; so if I was interested in HTTPS, for instance, I would see just the kube-dns communication. Another view could be this nice view of the flows in the cluster, where I can see which pod communicates with which one. For instance, this red one was Kibana accessing the Elasticsearch pod, and this yellow one was ElastiFlow, or rather Logstash, sending the traffic into Elasticsearch. There is also a GeoIP view, for the case where the cluster is accessible from the outside, showing where my clients are coming from. And we can also dig into some traffic details, for instance if I was interested in clients ordered by packets per second or bits per second. Again, you can filter by services, server IPs, or server names, and the same for the clients.

And the last thing I'll show is the actual flow records stored in Elasticsearch. This is basically the raw data in Elasticsearch from which all the dashboards are generated. So let me open one. Okay, this is the raw flow data as Elasticsearch received it from Logstash: all the information about what the client IP address was, also translated to its hostname thanks to DNS working in our cluster; the same for the destination, which was the server IP; and some more information about the TCP connection this flow belongs to, the port numbers, flags, et cetera, and of course timestamps and byte counts.

Okay, now let me go back to my presentation. Yeah, so what we saw was just a single-node cluster with one Logstash and one Elasticsearch instance. For a real-world deployment, as I said, you usually want a clustered deployment of Elasticsearch. And ideally, you would want one Logstash instance running on each node, to limit the traffic between VPP and the flow collector, because if you have too many connections in your cluster, all the flow information going from VPP to the collector can be quite a heavy load. Of course, we don't have to scale Kibana, because that's just a UI accessing Elasticsearch.
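For reference, the VPP debugging shown in this demo boils down to a couple of commands against the VPP instance running in the Contiv vSwitch pod, roughly along these lines; the pod name is illustrative, while show interface and show nat44 static mappings are standard VPP CLI commands.

    # Open the VPP debug CLI inside the Contiv vSwitch pod (pod name is illustrative):
    kubectl exec -it -n kube-system contiv-vswitch-xxxxx -- vppctl

    # Per-interface packet/byte counters, including the pod-facing taps:
    vpp# show interface

    # The service NAT entries VPP programs in place of kube-proxy:
    vpp# show nat44 static mappings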
I was also talking about the memif interface that VPP supports. To recap, it is a special type of interface which allows you to forward packets between VPP, which runs in user space, and a pod, bypassing the kernel by going through shared memory. For that you need an application that is memif-aware, which means the application needs to use the memif library. We can actually use that to build a more lightweight flow collector than Logstash: we could, for instance, take Cloudflare's GoFlow project and integrate it with the memif library in Go, and in that case all the flows coming from VPP to our flow collector would go through shared memory.

Yeah, and that would be it from my side. I think I have a few minutes left, so I will mention one more thing that is important from the point of view of CNFs, cloud-native network functions: Contiv-VPP supports pods with multiple interfaces, which can be deployed as follows. Pretty much, in the pod definition you just tell Contiv-VPP that you are requesting an additional interface, and you give it the name of the interface, the type of the interface, which is tap in this case, and the network, which is stub in this case, meaning it is not connected to the default pod network. This is how we request an additional interface, and what's important is that IPFIX can be enabled on those interfaces as well. Also, in case you wanted to deploy a memif-aware CNF, it would look like this: again, you define that you want the custom interface and give it a name, but in this case you give it the type memif, and that's it. Contiv-VPP connects the additional interface and enables IPFIX on it as well. That's very important from the point of view of cloud-native network functions: this can work on all the interfaces inside the pod. (A rough sketch of such a pod definition is included at the end of this transcript.)

And since I'm talking about CNFs, I'll show you maybe one more thing. As I told you, those secondary interfaces in this particular case would be in the stub network, which means they wouldn't be connected anywhere. You can define a service function chain, which actually interconnects the pod with an additional interface with another pod with an additional interface, in this case at the L2 level, which may be useful for various networking deployments.

Okay, I think that's it. We have five minutes, so if you have any questions, feel free to ask. I'll give you the mic.

I have two questions. One is, I see that Contiv-VPP hasn't been updated for about one year. Do you have any plans or a roadmap for Contiv-VPP?

Contiv-VPP was not updated?

Yeah, yeah, you can see it on GitHub. The project seems to have stopped updating.

Yeah, I need to mention one thing. When we are talking about Contiv-VPP, we need to be sure to look at the right repository, which is really Contiv-VPP, because there are many Contiv repositories which pretty much contain the old Contiv netplugin CNI. That one is not supported anymore, but Contiv-VPP is still being developed. So there are pretty much two different CNIs: there was the old Contiv, and the development of that one has stopped, but there is Contiv-VPP, which now uses the same name but is completely different, and it is based on the VPP instances running on each node, which is what I was talking about. And as you can see, that one was updated like four days ago. Yeah, yeah.
Another question is, I see you have a record in Logstash for each packet, so...?

You have a record in Logstash for each packet? No, no, no, not for each packet.

Then what is actually recorded in Logstash?

Yeah, so the way the IPFIX and NetFlow protocols work is that you are not saving each packet; you are saving only information about each network flow. So if you have a TCP connection which transmits two gigabytes through it, you would have just one record of a few bytes telling what the client IP was, the server IP, the timestamps, the ports, and the bytes that flowed through the connection, but that's it. Not each packet, only flow information.

And so what's the period for your updates to Logstash? The interval?

Yeah, so, yes, it's configurable. What I've shown is really just a proof of concept; it is not yet upstreamed into Contiv-VPP. It is all configurable on the VPP side, so you can configure how often you want to send the flows from VPP to the flow collector.

Okay, thank you.

What's the difference, or the relationship, between Contiv and Contiv-VPP?

Only the name. And maybe some of the team members of the old Contiv are also members of the Contiv-VPP team, but from the CNI point of view only the name is the same. Contiv-VPP is a completely new CNI which just uses the name.

Hi, I have actually two questions. First, do you have any numbers on the performance overhead you get with your setup? And second, how do you compare yourselves with something like Weave Scope?

In this case, do you mean the Contiv-VPP CNI performance numbers or the IPFIX part?

Okay, let's keep Logstash and Elasticsearch aside; just installing your IPFIX stuff, how big would the performance impact be, just roughly?

Yeah, so if we are comparing just the CNIs, compared to Calico or whatever else: if you use the tap interfaces between VPP and the pod, which are there by default because the application just uses sockets to communicate, the performance is quite similar if you just go pod to pod, not through a service. Where there is a difference is better scalability with many Kubernetes services. As you may know, if you deploy thousands of Kubernetes services, which would be programmed as iptables rules, that is a lot of overhead in the kernel, and it takes a lot of CPU cycles just to handle all the NAT entries. This is where VPP shines, because there is almost no performance drawback on VPP. And then if you use the memif interfaces, which, as I said, need a special type of application, then we are talking about eight million packets per second between the pod and the vSwitch. I don't think we have any numbers published for Contiv-VPP itself, but there are the FD.io VPP CSIT test results, which actually test a very similar setup to what we are doing: one vSwitch running on the node and several containers interconnected through memif. They have numbers for several different chains between the containers: vSwitch to one CNF and back to the vSwitch, then vSwitch to CNF, CNF, CNF, and back to the vSwitch, et cetera. That's available on the FD.io website. Sorry, I think we need to finish, but we can talk afterwards; I can show you something. Thank you.
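To make the multi-interface pod definition discussed earlier concrete, here is a rough sketch of what such a pod spec might look like. The annotation key and its name/type/network value format are assumptions based on the description in the talk, not a verified excerpt from the Contiv-VPP documentation, so check the project docs for the exact syntax.

    apiVersion: v1
    kind: Pod
    metadata:
      name: my-cnf
      annotations:
        # Request an additional interface: <name>/<type>/<network>.
        # The type can be e.g. "tap" or "memif"; the "stub" network means the
        # interface is not connected to the default pod network.
        contivpp.io/custom-if: "memif1/memif/stub"
    spec:
      containers:
        - name: cnf
          image: example/cnf:latest    # placeholder image of a memif-aware application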