Hi everyone, thank you for attending my presentation, "eBPF on the Rise". I'm Quentin, I work as a software engineer at Isovalent, where we build Cilium, which relies on eBPF to bring networking, observability and security to Kubernetes clusters. My objective today is to help you get started with eBPF, and we'll do that in three parts. The first section will be about understanding how eBPF works; in the second we'll see what tools are available for working with eBPF; and the last section will be about the benefits eBPF can bring to cloud native environments.

So before we dive into the details, let's observe that there is something happening with eBPF these days. It's been marked by Liz Rice as one of the key technologies to watch this year. But why is that so? What's happening with eBPF? To really understand that, let's observe that Linux systems are used as the basis for everything in cloud environments nowadays. And we have some kind of paradox with Linux, because everything that allows you to get observability, to understand what resources are used by a given process or pod, is happening on the kernel side, but at the same time you have very little flexibility in the kernel. You are free to program whatever you want in user space, but in user space you won't get direct access to those kernel data structures.

So do we have a way to introduce more programmability in the kernel? We have kernel modules, but kernel modules can be tricky to implement and they can have issues in terms of safety: you are likely to crash your kernel if you make a mistake in your module. You do not have any guarantees in terms of API stability from one kernel version to the next, so that can break your modules too. So we have all these components that are like bounded boxes: you can work with them and interact with them, but can you somehow get out of the box and bring back some programmability into the kernel, not at the expense of safety or efficiency? And if so, can you leverage that to get some benefits in your cloud native environments? The answer, of course, is eBPF.

So eBPF is a general-purpose execution engine that lets you run programs defined in user space inside the kernel. Historically it was built on top of what is now called classic BPF, which was used with tcpdump or seccomp to filter the packets to copy to user space, or the system call arguments, respectively. Nowadays eBPF uses the bpf() system call to take some bytecode from user space and inject it into the kernel, where it is executed. It is attached to specific hooks inside the kernel and runs on specific events, and it has several particularities. One of them is that it's extremely efficient, because the bytecode maps well to native code on modern architectures, and at the same time you have a JIT (just-in-time) compiler inside the kernel that turns your eBPF bytecode into native instructions, making it really fast at runtime. You also get benefits in terms of safety, because your eBPF program is checked and verified when you load it into the kernel, to make sure that it terminates, that it won't hang your kernel, and that it is safe. That means you won't be able to have infinite loops in your programs, but that makes them sure to terminate; and you won't be able to leak sensitive memory to user space, or to perform out-of-bounds accesses in your eBPF program and risk crashing your kernel. That just won't happen, which is a really strong feature.
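To make that loading step concrete, here is a minimal sketch (not from the slides) of what handing bytecode to the kernel through the bpf() system call looks like. The program is just two instructions, "return 0"; the socket-filter program type is picked only because it is the simplest one to load, and running this requires appropriate privileges (root, or CAP_BPF on recent kernels).

```c
/* minimal_load.c -- sketch: load two instructions of eBPF bytecode
 * ("return 0") into the kernel through the bpf() system call. */
#include <linux/bpf.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    struct bpf_insn prog[] = {
        /* r0 = 0 */
        { .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = 0 },
        /* exit -- the verifier checks that every path reaches an exit */
        { .code = BPF_JMP | BPF_EXIT },
    };

    union bpf_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
    attr.insns     = (unsigned long)prog;
    attr.insn_cnt  = sizeof(prog) / sizeof(prog[0]);
    attr.license   = (unsigned long)"GPL";

    /* The kernel runs the verifier here and returns a file descriptor
     * to the loaded program on success. */
    int fd = syscall(SYS_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
    if (fd < 0)
        perror("BPF_PROG_LOAD");
    else
        printf("program loaded and verified, fd %d\n", fd);
    return 0;
}
```

In practice you would rarely write raw instructions like this; as we'll see, you compile C to bytecode and let a library drive the syscall.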
At the same time, you get something very versatile, because there are a number of existing program types, 31 at the moment. Some of them can be attached to different hooks inside the kernel, which means a lot of possible use cases. You also have a number of helper functions: functions defined in the kernel which act as a kind of user library and can be called from within eBPF programs to help you perform specific tasks. And you have eBPF maps: kernel memory areas available to programs, usually key-value storage areas like arrays or hash maps, plus a few other kinds for dedicated use cases. These maps can be shared between different instances of a program, to store some state, for example counters and metrics; they can be shared between different programs, to correlate data; and they can be shared with user space, to report information like metrics or to pass down configuration options.

In addition to those features, the execution engine itself keeps gaining capabilities as the eBPF subsystem is improved. We now have up to one million instructions per program, which makes it flexible enough for a wide range of use cases, even advanced ones. We have tail calls, we have eBPF-to-eBPF function calls, we have bounded loops support now. Things like this bring it closer and closer to a regular program that you would compile from C or from any other language; it's getting close to something very generic that you can use for all types of usage.

About the use cases: they mostly fall into two big categories, network packet processing on one side, and tracing and monitoring on the other. As for networking, we have a number of hooks in the kernel. For example, there is a hook on TC, the traffic control layer, on both the ingress and egress paths. We have the XDP hook, which sits very low, in front of the kernel stack, to process packets just at the exit of the driver: we can retrieve packets there and process them even before the socket buffer is allocated, so before we spend time and resources processing the packet in the Linux stack, which is very efficient in terms of performance. These hooks make eBPF very well suited for applications like protection against denial-of-service attacks, or for load balancing, because it is located just in front of the stack. There are other applications too: routing, overlays, NAT, many others. You can tune options for TCP sessions and even re-implement the congestion control algorithms used by TCP. So, a lot of things on the networking side.

As for tracing and monitoring, you have a number of hooks on kernel probes and user probes, which are dynamic probes that don't need any instrumentation inside your programs. You also have static probes, with tracepoints or their user space equivalent, and a few other probe types. You can use them to inspect, trace and profile your kernel or your user space applications, which makes eBPF very suitable for understanding what's going on in terms of resource usage, and for optimizing your programs too. One big advantage of eBPF is that, because it's a program running in your kernel, you can aggregate and correlate metrics and other data inside your program, and send just what you need, the meaningful information, to your user space application.
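As an illustration of that in-kernel aggregation, here is a hedged sketch of a program counting TCP connection attempts per process in an eBPF hash map. The attach point (tcp_v4_connect) and the map layout are choices made up for this example, not anything prescribed; the idea is that the counting happens entirely in the kernel, and user space only reads the map when it wants the totals.

```c
/* Sketch only -- compile with: clang -g -O2 -target bpf -c ... */
#include <linux/bpf.h>
#include <linux/ptrace.h>
#include <bpf/bpf_helpers.h>

/* Hash map shared with user space: key = PID, value = number of
 * connection attempts (BTF-style declaration, handled by libbpf). */
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 10240);
    __type(key, __u32);
    __type(value, __u64);
} connect_count SEC(".maps");

SEC("kprobe/tcp_v4_connect")
int count_connects(struct pt_regs *ctx)
{
    /* bpf_get_current_pid_tgid() is one of the helper functions
     * mentioned above; the upper 32 bits hold the process ID. */
    __u32 pid = bpf_get_current_pid_tgid() >> 32;
    __u64 init = 1, *val;

    val = bpf_map_lookup_elem(&connect_count, &pid);
    if (val)
        __sync_fetch_and_add(val, 1);  /* aggregate in the kernel... */
    else
        bpf_map_update_elem(&connect_count, &pid, &init, BPF_ANY);
    return 0;  /* ...no per-event copy to user space needed */
}

char LICENSE[] SEC("license") = "GPL";
```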
With that model, you don't need to copy everything to user space, and that saves a lot of overhead. You have a few other use cases too: for example, there is now a Linux security module built on top of eBPF in the kernel, and proposals have been made about using eBPF for file systems or storage, so the list is growing. More people are getting interested in eBPF and proposing use cases, and that's very interesting to see; I'm looking forward to seeing new applications using eBPF.

So we have all those use cases, but how can we use eBPF concretely, to start tracing systems for example? The first thing I would like to present is the LLVM backend used to generate eBPF bytecode. This bytecode is very close to assembly, but nobody really likes spending time writing assembly, I think. So the clang and llc tools have been adapted to generate eBPF bytecode, from C in particular, and you can just write C programs and turn them into eBPF bytecode, which is stored in ELF object files and loaded from there.

Here is a very simple eBPF program for networking. It would be attached to an interface, and it drops everything that is not an IPv4 packet. This is a standalone program: I can compile this file with clang and attach it with the ip link set ... xdp ... command, and it drops everything that is not IPv4. The way it works is that there are two checks in this program. The first one uses the data and data_end pointers, which point to the beginning and the end of the packet data respectively, to make sure that the packet is big enough that I can read the protocol field in the Ethernet header; without this check, the verifier would not let me dereference the Ethernet pointer in the second check. If the first check passes, then I can read the protocol field, and depending on whether it's IPv4 or not, I can drop the packet or let it pass to the Linux stack. So that's something really simple, but really useful, really powerful already. A sketch of what such a program looks like follows below.
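Here is a reconstruction of that kind of program; the original slide isn't reproduced here, so treat this as a close sketch rather than the exact code:

```c
/* drop_non_ipv4.c -- sketch: drop everything that is not IPv4. */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int drop_non_ipv4(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    struct ethhdr *eth = data;

    /* First check: prove to the verifier that the Ethernet header is
     * entirely inside the packet before dereferencing it. */
    if ((void *)(eth + 1) > data_end)
        return XDP_DROP;

    /* Second check: read the protocol field and keep only IPv4. */
    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_DROP;

    return XDP_PASS;  /* IPv4: hand the packet to the Linux stack */
}

char LICENSE[] SEC("license") = "GPL";
```

Assuming the file is called drop_non_ipv4.c, something like clang -O2 -target bpf -c drop_non_ipv4.c -o drop_non_ipv4.o, followed by ip link set dev eth0 xdp obj drop_non_ipv4.o sec xdp, would compile and attach it (adapt the interface name).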
Here is something for tracing. What we do here is attach the program to the do_sys_open function in the kernel, so each time some process calls the open system call, we run our program. In the program, we call a first eBPF helper function to retrieve the PID of the process making the call, and a second one to print a line to a trace log containing the name of the program, its PID, and the arguments of the call: the name of the file being opened, and the flags used to open it. This one is not standalone; it has to be used with BCC, which is a framework for building BPF tools. BCC handles compiling the programs with libLLVM, and it provides a set of Python wrappers to help manage eBPF objects too. It also comes with a large number of tools and examples of its own. One example is opensnoop, which does just what we did in the previous example, but in addition it also hooks at the exit of the do_sys_open function to retrieve the value returned by the function, and it prints its output to the console. As on this slide, we can see the file names, the names of the processes performing the calls, and some additional information.

Here is another example of the BCC tools, this one about profiling CPUs, so we can understand what functions the CPU is spending time on. The horizontal length of the bars on the flame graph presented here is the time the CPU spent in each function, and the vertically stacked boxes represent the call stacks of the different functions. We need just two commands to generate this graph: the first one runs the BCC tool to sample the CPU and extract the data, and the second one runs a script to process the data and generate the graph itself. You can use this to profile CPU usage for your kernel, but also for applications written in a variety of languages: Python, Ruby, PHP, C, Java and so on. That makes it really handy to troubleshoot where your CPU is spending its time, and to optimize your applications. We have a number of other tools available with BCC too, and this picture is another view of all the components that can be traced with the existing tools. They are all open source, so you can use any of them already; just have a look if you're interested in understanding what's going on deep down, at the different parts of your system.

Another tool, which is quite similar to BCC but even easier to use, is bpftrace. It's built on top of BCC, but the idea is that you just use one-liners or short scripts; it's something like an equivalent of DTrace. The first example I have here reproduces the tracing of the open system call, but as a single one-line command, so that's very short. You have two more examples below. One prints, per process, the size distribution of the lengths returned by the read system call: you have programs using read, read returns the length of the memory chunk that was read, and you can print a histogram of those lengths in the console. The other counts the LLC (last-level cache) misses for your processes. Really short commands, but potentially really helpful information to help you troubleshoot your programs.

If you want to go beyond BCC and bpftrace, you can build your own applications, and you have libraries to help you with that, so you are not starting from scratch. You have for example libbpf, which is the reference library for everything related to eBPF; it's the reference because it is updated by the kernel developers each time some new feature is added to the BPF subsystem in Linux. There are also a number of libraries available in Go. I'm probably a little bit biased here, but I would definitely recommend going with the ebpf library maintained by Cloudflare and Cilium, a pure Go library which is very useful for managing eBPF objects. There are some options in Rust too, and possibly in other languages; I'm not familiar with everything that exists on this side.
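To give an idea of what these libraries buy you, here is a hedged sketch of a small libbpf-based loader for the kind of kprobe program described earlier. The object file and program names (trace_open.o, trace_open) are made up for the example, and error handling is kept minimal (the libbpf_get_error convention shown here is the pre-1.0 libbpf style).

```c
#include <stdio.h>
#include <unistd.h>
#include <bpf/libbpf.h>

int main(void)
{
    /* Open and load the ELF object file produced by clang. */
    struct bpf_object *obj = bpf_object__open_file("trace_open.o", NULL);
    if (libbpf_get_error(obj))
        return 1;
    if (bpf_object__load(obj))
        return 1;

    /* Find our program in the object and attach it; libbpf derives the
     * attach type from the SEC("kprobe/...") annotation. */
    struct bpf_program *prog =
        bpf_object__find_program_by_name(obj, "trace_open");
    struct bpf_link *link = bpf_program__attach(prog);
    if (libbpf_get_error(link))
        return 1;

    pause();  /* keep the program attached until we are interrupted */

    bpf_link__destroy(link);
    bpf_object__close(obj);
    return 0;
}
```

The Go and Rust libraries expose the same open/load/attach workflow in their respective idioms.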
The last tool I would like to mention is not so much about programming with eBPF, but more about managing eBPF objects: I want to be able to understand what's going on in my system, like what programs are loaded, and to inspect them. For that there is bpftool. You can use it to load programs, with bpftool prog load from the command line, or bpftool prog show to list the programs that are loaded on your system; here, for example, I have two programs attached to cgroups and one XDP program. I can dump the bytecode of those programs too, to see what's inside a program loaded in the kernel, and I can also dump the JIT-compiled instructions in case that helps. I can manage maps too: I can list the maps, look up a given entry, update a map, and a few more operations are available as well. I can test-run programs, and I can list the eBPF-related features supported by my kernel. If you're interested in all of this, you should probably have a look at the manual pages for bpftool.
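To make this concrete, here are a few typical invocations of the kind just described (the numeric ids are placeholders; you would take real ones from the show commands):

```
# list the programs loaded on the system
bpftool prog show
# dump the bytecode of a given program, or its JIT-compiled instructions
bpftool prog dump xlated id 42
bpftool prog dump jited id 42
# inspect and modify maps
bpftool map show
bpftool map dump id 7
bpftool map update id 7 key 0 0 0 0 value 1 0 0 0
# list the eBPF-related features supported by the kernel
bpftool feature probe
```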
And that's it for the tools. Now, can we bring some of those tools, and some of the advantages we have seen about eBPF, into cloud native environments? As a reminder, we have a number of advantages brought by eBPF: we have safety, we have performance, we have a big deal in terms of observability, and it's something very versatile; it's in the kernel, but it remains flexible. Having it in the kernel is also a huge advantage for other reasons. In particular, it's available by default: you don't need to install anything to use eBPF, it's already there on your system. It has a stable user-facing API, so it's not subject to breakage from one kernel version to the next. It's also really easy to update an eBPF program: you don't have to hack on your kernel, send the contributions upstream, wait for them to be merged and then wait for them to land in the version you're using. You just change your program, recompile it and reload it, with no need to reboot your system; and if you're processing packets, for example, you don't even lose packets between an update from the previous program to the next one. eBPF is also container-aware, in the sense that it has multiple hooks placed all over the kernel networking stack, and others for observability, so you have a lot of possibilities to process your packets at the entrance and at the exit of different clusters and pods, and that makes it really suitable for cloud native environments.

And one big thing with eBPF, I think, is that you are not just programming and configuring inside a fixed framework: you can really create what you need, which also means you can leave aside all the features you won't be using. For networking, if I don't need IPv4, just IPv6, I simply don't compile any IPv4-related features, and that gives me something really lean, fast and scalable that I can use to implement my solution and solve my real-world production use cases. That kind of flexibility is really important, especially if we consider that Linux systems are really everywhere in the cloud, used for building everything in data centers: having eBPF, with all this flexibility and all those advantages, brings us huge benefits.

So how does that translate in practice? For example, we have kubectl trace, which is already able to run bpftrace scripts on pods and clusters. What it does, basically, is launch a worker pod to run the command on the node you want to trace or profile, and then your bpftrace one-liners and scripts send you information about your system. On the same model, we have Inspektor Gadget, which does much the same thing but for the BCC tools, and it's already available as open source, so you can use it today.

If you focus more on networking, Cilium is probably the reference here: we do networking, observability and security, all of it with eBPF. For example, we have a kube-proxy replacement. We can get rid of kube-proxy, which is a huge advantage, because kube-proxy relies heavily on iptables rules, and those rules may come by the thousands; typically, when you get a packet to process, you search for the relevant rules in your tables in a linear way, and that takes a lot of time and resources. With eBPF, we just have to do one lookup in a hash map: you retrieve a tuple identifying the flow your packet belongs to, you do one hash map lookup to get the relevant rule, and that's it (a toy sketch of this one-lookup idea follows below). And because of the hooks available, on TC for example, you can actually do most of the processing of your packet in eBPF and bypass all the iptables hooks that are otherwise present in the stack. That leads to huge gains in terms of performance, and it makes things a lot cleaner too.

Another example of Cilium's optimized datapath is when we use an Envoy proxy to implement layer 7 policies. Say I want to tell the system that this pod can use this particular HTTP REST API endpoint, but that other pod doesn't have the permission. I have an Envoy proxy injected as a sidecar into the pod, and normally I have to go through the Linux stack three times to implement that: once to get to the loopback interface, back up to the proxy, and then down again to exit the pod, and the same thing on the destination pod. With eBPF we can avoid most of that: we can establish the connection at the socket level, directly to the proxy, and we also get rid of the iptables hooks on our way down to the network. Again, that leads to significant gains in terms of performance. We also have a number of other use cases for eBPF: networking, load balancing, network security, observability and service mesh, but I won't have time to dive into all of them, so just have a look at the Cilium documentation, or join the community channels, if you're interested in discussing them further.
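Here is the toy sketch of that one-lookup idea for the kube-proxy replacement. To be clear, this is not Cilium's actual code: the key and value layouts and the map name are invented for the illustration, and the header parsing that would fill in the key is elided.

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Invented 5-tuple key and backend value, for illustration only. */
struct flow_key {
    __u32 saddr, daddr;
    __u16 sport, dport;
    __u8  proto;
};

struct backend {
    __u32 addr;
    __u16 port;
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 65536);
    __type(key, struct flow_key);
    __type(value, struct backend);
} services SEC(".maps");

SEC("xdp")
int service_lookup(struct xdp_md *ctx)
{
    /* A real datapath would parse the packet headers to fill in the
     * key; elided here to keep the sketch short. */
    struct flow_key key = {};

    /* One O(1) hash lookup, instead of a linear walk over thousands
     * of iptables rules. */
    struct backend *be = bpf_map_lookup_elem(&services, &key);
    if (be) {
        /* rewrite the destination to be->addr / be->port and forward */
    }
    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";
```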
Let's move on with the big eBPF players that are actually contributing to or using BPF. There is Facebook, using it a lot for tracing and monitoring and for network processing; in particular, they have an open source load balancer called Katran that you can find on GitHub. Netflix is using eBPF too, mostly for tracing and monitoring; Google for a number of use cases as well; Cloudflare mostly for protection against denial-of-service attacks; and of course, everywhere Cilium is deployed, eBPF is used a lot to implement the datapath and the network policies. There are also a number of other projects using eBPF: for example, we have Falco or Tracee using eBPF for security purposes in the cloud; we have Hubble, which already implements unprecedented visibility for network flows on your clusters; we have Weaveworks; and Suricata, too, has an XDP mode for capturing packets for security purposes. So eBPF is a thriving ecosystem, really: an increasing number of projects, beyond the ones I presented, are using the technology. This also leads to new startups productizing eBPF, for continuous profiling, for network analytics, for security in the cloud too, and some of these startups have already been acquired.

I think it was late last year that Pixie was acquired by New Relic, and Flowmill by Splunk, and that shows there is real interest in the products and in the technology itself, and a lot of momentum here. On the kernel side, on the community side, we have a dedicated mailing list for eBPF contributions, which has been receiving about 50 emails per day on average, and we have three maintainers and five senior code reviewers, coming from Facebook, Isovalent and Google, to keep up with the load. All of that makes eBPF one of the fastest-growing subsystems in Linux at the moment; there's a lot going on. We had our first eBPF Summit late last year, organized by the Cilium community, and it was a huge success; if you want to see the videos, have a look on YouTube, they're all available.

There are two tweets I would like to show, which also illustrate the momentum behind eBPF. The first one, from Mark Russinovich, is about Microsoft looking at implementing a Sysmon-like utility for tracing on Linux, and it's very interesting to see Microsoft turning to eBPF to implement things. The second one is from Steven Rostedt, about BPF possibly replacing Linux in the future, or so to say: a number of people envision that more and more parts of the kernel might rely on eBPF in the future, because of the performance and safety guarantees it brings, to avoid all kinds of security issues and to gain flexibility for all kinds of processing.

So, to wrap up this presentation: eBPF brings a lot of programmability to the kernel. It's safe, efficient, versatile and scalable, and it's ideally located to gather data about resource usage and about the different calls that pods are executing on the system, and also to process packets, all of this for individual systems or in cloud native environments. We have a number of tools, being improved by the day, to work with eBPF: BCC and bpftrace, if you want to start tracing and monitoring some of your applications today; libraries like libbpf and the Go libraries, to help you build your own applications using eBPF programs; and tools like bpftool to help with introspection and management of eBPF objects. All of these are open source and available already.

eBPF is on the rise: it solves real-world production problems, and that's something really important. A lot of big companies are using it for that, because it brings them the flexibility they need to change some behavior in the kernel right now, without having to wait for upstream changes. Cilium's optimized datapath and network policies are a very good example of how you can leverage eBPF to implement advanced features in cloud native environments. And there is a buzzing community behind BPF, adding new tools and new features all the time, so I hope you'll be able to join the community and ride the eBPF wave.

That's it for this presentation, thank you for attending. If you need more information, there is one link in particular that you should check, which is ebpf.io: it contains all the pointers you would need to get more information and documentation about eBPF use cases and internals. So have a look at this website, and again, thank you for watching my presentation today.