Hi, good morning. My name is Luca Deri, and together with my colleague Samuele I will be talking about eBPF in the context of network traffic monitoring. The idea is to show you an application of eBPF to something meaningful: we are not talking about eBPF per se, but about how to use it in the real world. Let's start with a few notes about us. I'm the founder of the ntop project; you may have heard of it before when we talked about PF_RING. I'm also a lecturer at the University of Pisa. Samuele is a former student of mine and is working with us at ntop. ntop is an open source project that I started in 1998, so a very long time ago. It was about monitoring network traffic in a simple way. Don't think of the internet as it is today; it was a totally different world. Over time we went from analyzing packets, just to display them to people, to accelerating packet capture, which is PF_RING, and we have done many other things. Another good contribution, I think, to the open source community is a library called nDPI that allows you to analyze and dissect packet payloads. It can tell you, for instance, what the application protocol of a certain communication is. You can find all our open source code on GitHub, at github.com/ntop; that is the place where you will find our activities. Let's start from the very beginning: why are we here talking about network monitoring? This is a definition coming from Techopedia: network monitoring is, in essence, the ability to monitor a computer network to see what is happening, whether there is something wrong or unexpected, from the security standpoint, from misconfigurations, this type of thing. And usually the way to do that is through packet capture: in essence we capture packets, we analyze them, and we report to users what is happening.
This is why, for instance, I started PF_RING: it was a way to accelerate packet capture. Packet capture has always been the main problem, because the input is packets, and we have to receive this data. Look at this picture from 2009, ten years ago. This was the picture of the ntop world many years ago: PF_RING was there, we had a different way of polling packets, a better packet poller called TNAPI, and then we had our applications on top of it. So here we are talking about packets, about network adapters, about ways to accelerate packet capture, and then about packet analysis. Again, ten years ago. If you look at 2019 instead, you go to GitHub and download ntopng, the successor of ntop (the next generation), and you will see a graph like this: there is a peak, there is probably an anomaly here. But you will see that packets are still first-class citizens. This morning, until now, we talked about packets: XDP, DPDK, and so on. Packets, packets, packets. But packets are too many; it's not really possible to handle packets easily, so you have to group them in a meaningful fashion, in a way that lets you understand what is happening. Usually the way to do that is through the concept of flows: you put together, one after the other, packets sharing the same five-tuple, so the same protocol, the same source and destination IP, source and destination port, and so on. Packets are the uncompressed version; flows are the compressed version. So you see that my PC is talking with that PC over port 80, and all the packets of the same communication are put together. And again, people like to see a peak: there was a peak here, and you want to ask, hey, what's happening? You can look at the flow, but if you don't trust flows and you want to go down to packets, in ntopng you can click and say, give me the packets. So again, we're talking about packets.
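To make the flow idea concrete, here is a minimal sketch, not ntopng code, just an illustration of compressing packets into flows keyed by their five-tuple:

```python
from collections import defaultdict

def five_tuple(pkt):
    """Key a packet by protocol, source/destination IP and port."""
    return (pkt["proto"], pkt["src_ip"], pkt["src_port"],
            pkt["dst_ip"], pkt["dst_port"])

def packets_to_flows(packets):
    """Compress a packet list into flows: per-five-tuple counters."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for pkt in packets:
        f = flows[five_tuple(pkt)]
        f["packets"] += 1
        f["bytes"] += pkt["length"]
    return dict(flows)
```

All packets of one HTTP connection collapse into a single flow entry, which is exactly why flows are the "compressed version" of the traffic.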
What's wrong with packets? I'm not against packets; all my life I've done packet analysis, and it is good. In particular, it is good if you want to stay outside of the system you're monitoring. We are here today at FOSDEM: if I want to see what is happening in this room, I cannot put an agent on every single laptop, first of all because it is impossible; there are smartphones, tablets, many devices. It's very easy instead to capture packets, classify them, and put the results on a web interface. Very good. But there are some challenges. One is encryption; in nDPI we had to struggle against that. Not that we have to decrypt the payload: packet inspection is not about decoding and dissecting the packet and extracting your data, it is about understanding whether you're talking to Gmail or connecting to your home application. So we had, for instance, to decode the initial certificate exchange to understand what the connection is about. Then there is virtualization. In the past, computers were simply connected to the Internet, but these days, as you have seen with Cilium before, you have a big machine with several containers, each providing a service, actually a microservice, and all interacting; sometimes a container moves somewhere else, and so on. If you look at the packets entering or leaving that machine, in the end you will see the same result: you will see that it is connected to a certain website. But you don't understand the interactions happening inside the same computer. This is the main problem. Also, when you deal with packets, you have to handle fragmentation, packet loss, retransmission. The packet is not the native way of communicating: when you open a socket, you say, I want to connect to this port, I want to send "GET /" or whatever. You don't know anything about packets.
Okay, so you don't have to pay attention to the IP header, nothing, zero. It's the stack that does it for you: if the data is too long, it will split it into segments and so on. So anyway, packets are nice, but they're not native. People don't understand packets; experts know what they're about. So years ago I wanted to extend this and give more information about the system, system introspection. I wanted to handle virtualization. I also wanted the ability to do a continuous drill-down from a problem, let's say a peak: what is happening? Okay, my web server is talking with MySQL, and this was the reason. And then you go down to flows, and down to packets. So in the end we are still using the same flow paradigm and packet paradigm, but I want to see what is happening inside the system. If there is a security issue today, you can say, oh, I've seen this very nasty payload inside my packet, but you don't know the application that sent that payload, and you don't know who was supposed to receive it. You simply have an IDS or an IPS reporting it, but you are blind with respect to what is happening inside the system. We want to fix this problem; this is what we are going to talk about today. In 2014 I was very excited about Sysdig, because I'm a good friend of its creator. I don't know if you are familiar with it. Sysdig was, and still is, a kernel module that you can install on Linux. It looks like a tcpdump for system events: you can look at system events, you can see connect, close, send, receive. The idea was, when you have applications or containers talking to each other, to see: okay, my HTTP server is talking with MySQL, this is the amount of data, and this is how it changed. However, this software, which back in 2014 was very nice, was designed by somebody that had packets in mind. This was the idea: in essence, tcpdump over Sysdig.
This was the problem. This code is still available: we integrated it with Elasticsearch, and you could see nice graphs, top processes, top talkers, okay, very nice. I spent a lot of time on that, but it was a failure, a very big failure, for many reasons. First of all, it was using too much CPU: you have a process using 10 to 20 percent of the CPU on your box, simply because Sysdig was designed with packets in mind. So if a system receives a lot of traffic, you have to analyze a lot of events, and that is wrong. First, because you're already on the machine handling the packets anyway, and there is a lot of work to do; and second, because adding your monitoring feature adds extra load on the system. Second, people don't really like to install agents. It's not always possible, as I've said, but still doable. The main problem is that Sysdig requires a kernel module, and many people buy commercial support from, say, Red Hat, and these vendors say: if you install a third-party kernel module, then you are on your own. People don't like that. Also, containers were not that popular in 2014; Docker was starting to exist, but it was not a big topic at that time. So we were a little bit ahead of our time, but the main problem was the CPU: monitoring was too expensive. This, in essence, is how Sysdig works, just to give an example. Suppose you want to monitor a TCP connection: you have to track socket(), you have to track every send and every receive, you have to put them together into something similar to flows, and then you have to report the information. And this is very CPU expensive, because you have to analyze every single event. That is the problem: Sysdig is too packet oriented.
Now I'm going to leave the microphone to Samuele.

Hi everyone, I'm Samuele, and I studied the application of eBPF to traffic monitoring with Luca. eBPF is a great tool that enables us to inject code inside the kernel and have it executed when a specific target kernel function is invoked. By attaching eBPF code to network functions, we are able to avoid capturing, inspecting and analyzing every single packet, focusing only on the events we are interested in. Furthermore, because the code is executed inside the kernel, we can compute metrics there and send to user space only the things that are needed, instead of shipping out all the information we could collect. Both of these features give us great savings in terms of CPU usage. eBPF has another key feature: it doesn't need the installation of an extra kernel module, because it is embedded in modern Linux kernel versions. So these are very good properties. What we have done is inject eBPF code inside the kernel; it sits inside the kernel, generates events, and sends them to user space through a circular buffer provided by BPF. The structure of the events is very simple. It is articulated in different ways, but it carries simple information such as the source and destination IP address and port, the protocol, or, for example, the latency calculated from inside the kernel, so from the perspective of the kernel and the application, not from the external point of view of whoever captured the packet. Other things we provide are the PID, the user ID, the full path of the executable, the task name or process name, the time at which the event was triggered, and things like this.
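To give a feeling for what such an event looks like on the user-space side, here is a hedged sketch of decoding one: the field names and layout are illustrative assumptions, not the actual structure used by our library.

```python
import ctypes

class NetEvent(ctypes.Structure):
    """Illustrative event record as it could arrive from the kernel
    via the BPF circular buffer (this layout is hypothetical)."""
    _fields_ = [
        ("ktime_ns", ctypes.c_uint64),    # kernel timestamp of the event
        ("pid",      ctypes.c_uint32),    # process identifier
        ("uid",      ctypes.c_uint32),    # user identifier
        ("saddr",    ctypes.c_uint32),    # IPv4 source address
        ("daddr",    ctypes.c_uint32),    # IPv4 destination address
        ("sport",    ctypes.c_uint16),    # source port
        ("dport",    ctypes.c_uint16),    # destination port
        ("proto",    ctypes.c_uint8),     # 6 = TCP, 17 = UDP
        ("task",     ctypes.c_char * 16), # process name (TASK_COMM_LEN)
    ]

def parse_event(raw: bytes) -> NetEvent:
    """Cast the raw bytes popped from the buffer into the struct."""
    return NetEvent.from_buffer_copy(raw)
```

Note that metrics like the latency are already computed in the kernel, so user space only parses a small fixed-size record instead of reassembling packets.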
To do this we attach different probes: we attach code to different kernel functions, for example tcp_connect, which is triggered when the kernel connects to a remote or local host, or to functions that let us intercept UDP events, or events concerning a change in the state of a socket. For example, we can track when a socket is closed and report how many bytes have been received or sent through that socket. Another thing we can capture is retransmission events. Together with this kind of information, we use a BPF helper function, bpf_get_current_task, to obtain the task struct associated with each thread and process, and by navigating the kernel structures we can collect information such as the user or the cgroup. Furthermore, by using the socket that comes with the function call (for example, when we intercept tcp_connect, the socket to be used is provided as an argument), we can extract information such as the protocol or the destination address. To provide visibility into containers, we use the cgroup identifier that we collected from inside the kernel to interact with the Docker daemon, through the Docker API, and gather information about the container. This can be done because the cgroup identifier that we extract from the kernel structures and send to user space is the same identifier that Docker uses to track containers. So what we basically do is collect the cgroup identifier in the kernel, send it to user space, and at user-space level query the Docker daemon to collect container information and store it in a cache.
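The user-space side of this container lookup can be sketched like so. This is a simplified illustration: `query_docker` stands in for the real call to the Docker API, which needs a running daemon and is therefore faked here.

```python
class ContainerCache:
    """Map cgroup identifiers to container metadata, querying the
    Docker daemon only on a cache miss."""

    def __init__(self, query_docker):
        self._query = query_docker  # e.g. a call to the Docker HTTP API
        self._cache = {}

    def lookup(self, cgroup_id: str) -> dict:
        if cgroup_id not in self._cache:  # miss: ask the daemon once
            self._cache[cgroup_id] = self._query(cgroup_id)
        return self._cache[cgroup_id]
```

The cache matters because the same container emits many events, while the daemon needs to be asked only once per container.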
So we have information like, for example, the container name, or how long the container has been alive. Furthermore, we found that from the result of the query we can obtain Kubernetes information, such as the pod or the cluster the container belongs to, so we get a greater view of what is going on on the machine, up to the Kubernetes level. Under the hood, TCP accept has been implemented very simply: we attach a probe to the return of the function call, so when the function returns, our code is executed. At that point we use the socket returned by the function to collect network information, and the task structure to collect information about the user and the process. In a very similar way we have tracked retransmissions and the events fired when a socket is closed. For the connect, however, it has been more difficult; there we have to use a hash table, because we want to return not only information about the user or the network, but also the return code of the function, to check whether it failed or succeeded. So what we do is attach a probe when tcp_connect is invoked and save the socket that is passed as an argument; then the function executes, control returns to the caller, and when the function returns with its return code, our code is executed again. We look up in the hash table, which is indexed by the thread identifier, the socket we stored, and report the network information alongside the return code. By also storing the kernel time, we are able to measure the latency of the connection.
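The entry/return pairing just described can be illustrated in plain Python. This is a simulation of the logic, not eBPF code: the two functions play the role of the entry probe and the return probe, and the dictionary plays the role of the BPF hash table.

```python
# Simulates the kprobe (entry) / kretprobe (return) pairing used for
# tcp_connect: the entry probe stashes per-thread state in a hash table,
# and the return probe retrieves it to emit one complete event.
inflight = {}  # hash table indexed by thread id, as in the BPF map

def on_connect_entry(tid, sock, ktime_ns):
    """Entry probe: remember the socket argument and the kernel time."""
    inflight[tid] = (sock, ktime_ns)

def on_connect_return(tid, ret_code, ktime_ns):
    """Return probe: combine the stashed state with the return code."""
    sock, start_ns = inflight.pop(tid)
    return {
        "sock": sock,
        "ret": ret_code,                    # 0 = success
        "latency_ns": ktime_ns - start_ns,  # connection latency
    }
```

Indexing by thread identifier is what lets concurrent connects from different threads be matched with the right return without mixing them up.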
Okay, so in essence we have integrated eBPF with ntopng through a library called ebpflow, which allows us to avoid seeing all the internals of eBPF and simply report events. Just to give an idea, this is a typical example: you start ebpflow, and a helper lets you see the kernel time, the process ID, what is happening, who is talking to whom, what the process is, what the command is, and so on. The idea is that we want to make this library available to everyone, so that you can put your application on top of it and forget about eBPF: for you it will just be a source of information. Currently in ntopng, and this is a preview of what we have, you can see next to a communication also the user and the process executing it. You will see that this traffic is going to Apache, or this traffic is coming from a Chrome thread, both for the source and for the destination; and of course, thanks to packet inspection, we can analyze it further. This is very important because it allows us to augment the information, so we have an idea of what users are doing. What am I doing? I'm running Thunderbird, I'm using Dropbox; this is how I use my network. So we don't just analyze packets, we also analyze activities: what a certain user is doing. And I don't know if you have noticed, but we have the full path of the application, so in case you have, let's say, a security alert saying that a certain packet is very bad, you know exactly the process that created that packet, the user, and eventually the container. This is important. We also have the ability to put all these things together, so you can see what a certain process is doing, what its activities are, and what the users are doing inside the container. This is very nice: we just track
a few very simple events, so if you have a gigabit of traffic, we simply see the connect, the response, and periodic updates. But we have a problem with UDP. In UDP every packet is independent, so you can open a socket and send a packet to each of your peers. So how do we avoid playing with packets again? How do we avoid sending to user space an event for every single packet? At the moment we use an internal LRU cache, so that whenever we see similar packets, from the same source to the same destination, we send the event only once per interval. Let's say that every second, or every ten seconds, we see several packets; especially if you have a VPN (somebody mentioned OpenVPN before, and this is a typical example), we then send an event only every few seconds instead. This is one of the problems we are still trying to tackle. Unfortunately, everybody talks about eBPF in very positive terms, but there are also some problems with it. The first problem is that BCC is not what I would consider a stable tool. First of all, there are some limitations in terms of memory, loops, this type of thing. You might ask: why is that a problem? It is a problem because whenever you have to decode a packet, you have to go through certain layers. In fact, in the XDP presentation before, people put a lot of emphasis on speed: give me this packet, give me this packet. But if you have to look at the packet before making your decision, you have to decode it, and if you want to decode it inside the kernel with eBPF, you have to deal with the problems and limitations of these tools. Another point is that the BCC tools change very often. You see people playing with Python and they are happy, but if you work with the C API, sometimes it might change. We also have some other little limitations that I hope
will be solved in the future. So, just to wrap up: it is now possible to see the full path from packets to activities. If something bad is happening, I will not just say "this packet is bad"; I will say that this user, with this process, has created the problem. Thanks to the container paradigm, and thanks to the fact that our library can run on the host rather than in each container, we have the ability to monitor all the containers of a system, collect this metadata, and attach it to packets. The load is very little: with our tool it is less than one percent, so it's almost unnoticeable, and we get all this information. As shown in this preview, it will be part of the next ntopng version, version 4, but we plan to make the library available so that other tools can use it: for instance, I imagine a NetFlow probe or an IDS that wants to add this information next to the problems they report. Thank you very much.