So, my name is Michael. I work at Kinvolk; we do Linux systems-level cloud software. Who here has used BPF before? OK, maybe 25 to 30 percent. So this talk is really focused on the basics, and I hope it helps you understand the low-level concepts and the language if you want to start with it. I will touch on the history a bit and then go over the architecture, the instruction set, and the development tools.

So what is BPF? Some people describe it as an in-kernel bytecode virtual machine, others as an engine. It's a small language that can be used in the Linux kernel for certain tasks, and it has been there for a long time. Today there's classic and extended BPF, and when I say BPF, I mean extended BPF as found in the Linux kernel today. Actually, classic BPF, when used today, is internally translated to extended BPF, so what actually runs is extended BPF. If you have used tcpdump, and that's probably more people than the group of BPF users, you have already used classic BPF. The talk this morning showed how you can get BPF output from tcpdump and explained a couple of things about classic BPF, so I will skip this part of my presentation and encourage you to go back to that talk if you haven't seen it.

So today we have extended BPF. Compared to the classic version, it has a richer instruction set, more features, and more use cases. The most prominent use cases are networking, for example the eXpress Data Path (XDP); tracing, where you can for example attach BPF programs to tracepoints or kprobes, which are facilities in the Linux kernel to trace program flow and debug things; and security. The main design property is that it's very fast, on par with native code, because there's a JIT compiler in the Linux kernel. When extended BPF was designed, one of the main goals was to make the language match what modern systems look like today, for example by mapping the registers available in BPF to the actual registers of modern CPUs. Another design decision was to
make it possible that calls into or out of BPF don't have any extra overhead, so your code actually runs in the kernel context. The instructions are general purpose, there are 11 registers (R0 to R10), and you have 512 bytes of stack, which is not a lot, but it's surprising how much is possible even in such a limited environment. The registers are grouped as follows: R0 holds the return value of functions and is also used as the exit value of your BPF program; R1 to R5 are used as arguments to in-kernel functions; R6 to R9 are callee-saved, so you can use those for your own data; and R10 holds the frame pointer and is read-only.

eBPF also has maps. Maps are key-value stores, and there are different kinds of them. When you create a map, you define what kind of data you want to put in there. Maps are also what makes interaction possible between kernel space, that is the BPF program running in the kernel, and the application in user space. So you can use them either only inside your BPF program, or, for example, to pass data to user land or to read data from user land. Think of a firewall: maybe your user-land application wants to signal that port 80 should be blocked. You could have a table where the key is the port and the value is a flag for blocked or not blocked, and your program in kernel space could check this map to decide what it should do.
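To make the firewall example concrete, here is a small user-space simulation in Python. It is not the kernel API: ToyHashMap, userland_block, and kernel_side_verdict are hypothetical names, and the key layout (port as a little-endian u16) and value layout (a u8 flag) are assumptions. It only mirrors the idea that a map is created with a fixed key size, value size, and maximum number of entries, with one side writing rules and the other side looking them up.

```python
import struct
from typing import Dict, Optional


class ToyHashMap:
    """Hypothetical stand-in for a BPF hash map: key size, value size and
    the maximum number of entries are fixed when the map is created."""

    def __init__(self, key_size: int, value_size: int, max_entries: int) -> None:
        self.key_size = key_size
        self.value_size = value_size
        self.max_entries = max_entries
        self._entries: Dict[bytes, bytes] = {}

    def update(self, key: bytes, value: bytes) -> None:
        if len(key) != self.key_size or len(value) != self.value_size:
            raise ValueError("key/value size does not match the map definition")
        if key not in self._entries and len(self._entries) >= self.max_entries:
            raise OverflowError("map is full")
        self._entries[key] = value

    def lookup(self, key: bytes) -> Optional[bytes]:
        return self._entries.get(key)


# Assumed layout: key = port as little-endian u16, value = u8 flag (1 = blocked).
PORT = struct.Struct("<H")
FLAG = struct.Struct("<B")
BLOCKED = 1

blocked_ports = ToyHashMap(key_size=PORT.size, value_size=FLAG.size, max_entries=1024)


def userland_block(port: int) -> None:
    """User-space side: write a rule into the shared map."""
    blocked_ports.update(PORT.pack(port), FLAG.pack(BLOCKED))


def kernel_side_verdict(dst_port: int) -> str:
    """Kernel side (simulated): look up the destination port for each packet."""
    value = blocked_ports.lookup(PORT.pack(dst_port))
    if value is not None and FLAG.unpack(value)[0] == BLOCKED:
        return "drop"
    return "pass"


userland_block(80)
print(kernel_side_verdict(80), kernel_side_verdict(443))  # drop pass
```

Port 80 was written into the map by the user-space side, so the simulated kernel-side lookup finds the blocked flag; port 443 has no entry and passes.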
The second thing is helper functions. Depending on what type your program has, the kernel provides a set of helper functions. One helper function, for example, is bpf_trace_printk, so you can write to a trace pipe. It's not the kernel ring buffer that you can see with dmesg; there's a trace_pipe file in the kernel debug file system, and that's where your messages go.

The third thing is tail calls into BPF programs, so you can chain programs. That's really useful when you have complex applications and want some kind of waterfall design: you could load a lot of programs and then decide in your first program which handler should take over the payload.

Finally, there's the BPF pseudo file system. This can be used to pin programs or maps. Usually programs and maps are tied to the lifespan of the process, so when your process dies, the BPF program, map, or whatever you have created is deleted as well. If you pin it to the file system, that won't be the case; the kernel will only delete the program once the last reference is gone. That's also useful for inter-process communication: again, you can think of a scenario where one application sets up rules or programs, and a second application checks the BPF file system to find out what is there and use it.

When you do your BPF call to load a program, the kernel verifier comes into play. For every program that you load, the verifier has to make sure that the program is sane and follows the rules, because you don't want the program to, for example, enter an infinite loop and never exit again. That would mean your kernel is blocked and the system is dead. So there's also an instruction limit. The verifier is pretty complicated. There's a very long comment in the source code which I can recommend as documentation, but in my experience it often still takes some back and forth between writing code, trying to load it, getting some
cryptic message from the verifier, and trying to find out what's going on. This can be a bit painful when you are just starting and don't have a feeling yet for what the verifier could be complaining about.

All interaction with the eBPF subsystem happens through the bpf() syscall. You have this single syscall; what you actually want to do is selected by the command you pass, and the details go into union bpf_attr. It's a pretty large union today, with a lot of fields that can potentially be set. So no matter whether you want to load a program, create a map, look up a value in a map, or delete an element, you always use this system call and set the fields in the attribute that correspond to what you want to do.

Put together, the big picture could look like this. This is a use case where we attach a kprobe to a kernel function. It starts with a program that you write. You can use LLVM to compile it to an ELF file; LLVM has a BPF backend, which is pretty useful because, as we will see in a moment, writing BPF pseudo-assembly by hand is not so easy or nice. Once you have this program, you can load it with the syscall; the verifier checks it, and if it's OK, it gets loaded. Then, in the kprobe scenario, the kernel runs your program whenever the kernel function it was attached to is called. Your program can, for example, write to maps, and the user-space application can interact with those same maps.

Before we look at a program written in C, I will show you what BPF instructions look like at the low level. This is a very simple program that does nothing except return 11.
We write the immediate value 11 into register 0, which, as we learned before, is also the return value of the program, and then we exit. That's actually the most minimal program you can write in BPF. Having only an exit instruction would not work, because the verifier would reject the program: only registers that have been written before may be read. This protects kernel memory, and the exit instruction reads the value in register 0, so we have to write to it first.

If we want to load this program, we could do it like this. We take union bpf_attr and set a couple of fields. First, I set the program type; for this demo program it doesn't really matter, but of course when you write real applications, you have to choose the type that matches what you want to do. Then we give the number of instructions and the instructions themselves, and we also set the license. We set the license because the kernel will not let us access certain things, like some helper functions, if the program is not GPL-licensed. You can compare that with what you may know from Linux kernel modules, where you also have to declare the license of the module. Finally, we load the program with the BPF_PROG_LOAD command, passing the attribute we defined. If the call is successful, we get back a file descriptor; if not, the return value is negative and you have to handle the error.

So what does such an instruction look like? It has five fields: first the opcode, one byte, which says what you want to do; then the destination register, the source register, an offset, and an immediate constant. Depending on the operation, you don't always need all the fields, of course. If you want to write an immediate constant into the destination register, as we did in our program, you only set the destination register to register 0 and the immediate to 11.
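As a sketch of the two steps just described, the Python snippet below packs the "return 11" program into raw 8-byte instructions and shows how those bytes could be handed to BPF_PROG_LOAD. The opcode and enum values follow the kernel's uapi headers (linux/bpf_common.h and linux/bpf.h). Everything else is an assumption: mov64_imm, exit_insn, ProgLoadAttr, and load_prog are hypothetical names (the first two mirror the kernel's BPF_MOV64_IMM and BPF_EXIT_INSN macros), ProgLoadAttr models only the leading BPF_PROG_LOAD fields of union bpf_attr, the packing assumes a little-endian host where dst_reg sits in the low four bits of the second byte, and the syscall number 321 holds for x86-64 only.

```python
import ctypes
import struct

# Opcode building blocks; values as defined in linux/bpf_common.h and linux/bpf.h.
BPF_JMP = 0x05    # instruction class: jumps (exit lives here too)
BPF_ALU64 = 0x07  # instruction class: 64-bit arithmetic
BPF_K = 0x00      # source bit: operand is the immediate constant
BPF_MOV = 0xB0    # ALU operation: move
BPF_EXIT = 0x90   # JMP operation: exit the program

BPF_PROG_LOAD = 5                # enum bpf_cmd
BPF_PROG_TYPE_SOCKET_FILTER = 1  # enum bpf_prog_type; any type works for this demo
NR_BPF = 321                     # assumption: bpf(2) syscall number on x86-64

# struct bpf_insn: u8 code, 4-bit dst_reg, 4-bit src_reg, s16 off, s32 imm.
# Assumes a little-endian host, where dst_reg is the low nibble of byte 1.
INSN = struct.Struct("<BBhi")


def insn(code: int, dst: int = 0, src: int = 0, off: int = 0, imm: int = 0) -> bytes:
    return INSN.pack(code, (src << 4) | dst, off, imm)


def mov64_imm(dst: int, imm: int) -> bytes:
    """dst = imm; hypothetical helper mirroring BPF_MOV64_IMM."""
    return insn(BPF_ALU64 | BPF_MOV | BPF_K, dst=dst, imm=imm)


def exit_insn() -> bytes:
    """Return R0 to the caller; hypothetical helper mirroring BPF_EXIT_INSN."""
    return insn(BPF_JMP | BPF_EXIT)


class ProgLoadAttr(ctypes.Structure):
    """Leading BPF_PROG_LOAD fields of union bpf_attr (pointers passed as u64)."""
    _fields_ = [
        ("prog_type", ctypes.c_uint32),
        ("insn_cnt", ctypes.c_uint32),      # number of 8-byte instructions
        ("insns", ctypes.c_uint64),         # address of the instruction array
        ("license", ctypes.c_uint64),       # address of a NUL-terminated string
        ("log_level", ctypes.c_uint32),
        ("log_size", ctypes.c_uint32),
        ("log_buf", ctypes.c_uint64),       # address of the verifier log buffer
        ("kern_version", ctypes.c_uint32),  # checked for kprobe programs
        ("prog_flags", ctypes.c_uint32),
    ]


def load_prog(prog: bytes, log_size: int = 1 << 16):
    """Hypothetical helper: returns (fd or -errno, verifier log text)."""
    insns = ctypes.create_string_buffer(prog, len(prog))
    lic = ctypes.create_string_buffer(b"GPL")
    log = ctypes.create_string_buffer(log_size)
    attr = ProgLoadAttr(
        prog_type=BPF_PROG_TYPE_SOCKET_FILTER,
        insn_cnt=len(prog) // INSN.size,
        insns=ctypes.addressof(insns),
        license=ctypes.addressof(lic),
        log_level=2,  # per the talk, level 1 omits per-instruction register dumps
        log_size=log_size,
        log_buf=ctypes.addressof(log),
    )
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    fd = libc.syscall(ctypes.c_long(NR_BPF), ctypes.c_long(BPF_PROG_LOAD),
                      ctypes.byref(attr), ctypes.c_long(ctypes.sizeof(attr)))
    if fd < 0:
        fd = -ctypes.get_errno()
    return fd, log.value.decode(errors="replace")


# r0 = 11; exit
prog = mov64_imm(0, 11) + exit_insn()
print(prog.hex())  # b70000000b0000009500000000000000
```

Each instruction is 8 bytes, so the program is 16 bytes, and the first byte of each instruction (0xb7, 0x95) is the combined opcode. Calling load_prog(prog) needs a kernel with BPF enabled and usually root; the first element of the result is either a file descriptor or a negative errno, and the second holds whatever the verifier logged.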
We don't have to set the other fields. The opcode itself is actually a mix of fields. The define for the move instruction we used before combines BPF_ALU64, BPF_MOV, and BPF_K: it's a 64-bit ALU instruction, we want to move a value, and BPF_K, which goes back to classic BPF days where it meant the same thing, says that we want to move an immediate value. There's another flag, BPF_X, which means we want to use a source register instead. So the three things to keep in mind are the operation code, the source bit, and the instruction class. There are two types of opcode encodings, depending on the instruction class. For arithmetic and jump instructions, the operand is only an immediate or a source register, so one source bit is enough, next to 4 bits for the operation and 3 bits for the class. For load and store instructions, there are 2 bits for the size, because there you have more options: byte, half word, word, or double word. That's why we need more bits there.

Then there's the verifier log. You need the log to figure out what the verifier thinks about your program, so I would just always set it. When you do, and the buffer is large enough, which is important, you get back the verifier log, and it tells you exactly what the verifier has seen in your program, what was loaded, and, if there's an error, what the problem is. If you set the log level only to 1, you wouldn't get the register dumps after each instruction, so you have to decide whether you need them.

Some programs must match the kernel version. Kprobe programs must do this because the data types that are passed to the probe are version-specific, so fields and offsets could change. So you have to tell the kernel: yes, I want to run this program on this version, basically declaring that it was built for it.

We have seen maps before. There are different types of maps.
I think it's between 10 and 20 today. One special type of map is the prog array. It's a lookup table for other loaded BPF programs, so you can use a prog array to, for example, hold a list of handler programs; if you want to chain programs, your first program can then tail-call a handler from the prog array. And as I already said, passing data between user space and kernel space is done with maps.

A map definition could look like this: you have the type, a key size, a value size, and the maximum number of entries. Actually, there are more fields today, and it also depends a bit on which loader is loading the map. For example, tc is also able to load BPF handlers for traffic control, and its map definition has a field that allows you to pin the map to a namespace.

LLVM learned about BPF in version 3.7. One important point: you need to inline everything. You'd best define a macro for it, because inline alone doesn't force the compiler to inline code; only with __attribute__((always_inline)) can you be sure that the compiler actually inlines it.

For debugging, you can do printk-style debugging with bpf_trace_printk. Clang also allows you to add debug info, which is nice because you can objdump your object after the build and compare it to what the kernel verifier gave you back in the verifier log. Since the objdump output is annotated with the C source, you get at least a clue where the problem in your program is.

I've only got about four minutes left, so I may have to rush through things a bit.
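The prog array and tail-call chaining described above can be illustrated with a small Python simulation. Everything here is hypothetical and not a kernel or library API: ProgArray, tail_call, the handlers, the protocol-to-slot table, and the PASS/DROP verdict values are made up for illustration. The sketch mirrors two behaviours of real tail calls: when the slot is filled, the called program takes over and its result becomes the final verdict; when the slot is empty, the tail call fails and the calling program simply continues.

```python
from typing import Callable, Dict, Optional

Packet = Dict[str, int]
Handler = Callable[[Packet], int]

PASS, DROP = 0, 1  # hypothetical verdict values


class ProgArray:
    """Hypothetical stand-in for BPF_MAP_TYPE_PROG_ARRAY: u32 index -> program."""

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries
        self.slots: Dict[int, Handler] = {}

    def set(self, index: int, prog: Handler) -> None:
        if not 0 <= index < self.max_entries:
            raise IndexError("index outside max_entries")
        self.slots[index] = prog


def tail_call(table: ProgArray, index: int, pkt: Packet) -> Optional[int]:
    """Simulated tail call: a filled slot takes over and its verdict is final;
    an empty slot makes the call fail (None) so the caller keeps running."""
    prog = table.slots.get(index)
    return None if prog is None else prog(pkt)


def tcp_handler(pkt: Packet) -> int:
    return DROP if pkt["dport"] == 23 else PASS  # e.g. block telnet


def udp_handler(pkt: Packet) -> int:
    return PASS


jump_table = ProgArray(max_entries=8)
jump_table.set(0, tcp_handler)
jump_table.set(1, udp_handler)
# Slot 2 (for ICMP) is deliberately left empty.
PROTO_TO_INDEX = {6: 0, 17: 1, 1: 2}  # IP protocol number -> slot


def entry_program(pkt: Packet) -> int:
    """First program in the chain: pick a handler, fall back to PASS."""
    index = PROTO_TO_INDEX.get(pkt["proto"])
    if index is not None:
        verdict = tail_call(jump_table, index, pkt)
        if verdict is not None:
            return verdict
    return PASS  # reached only if no handler took over
```

A TCP packet to port 23 ends up as DROP via tcp_handler, while an ICMP packet hits the empty slot 2, the tail call fails, and entry_program falls back to PASS.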
So I will give you a quick demo. I have a rather simple and stupid program which counts syscalls. For that, I use a BPF program that gets attached to the raw_syscalls sys_enter tracepoint. No matter which syscall is called, this tracepoint triggers every time, and since that's too much to output, I limit it to the syscall with ID 3. I'll just start this. I do this for only a single PID, this bash here; you can see the PID is the same as here. And when I run sync here, we can see the calls being detected.

For the BPF program, I used LLVM to compile it to an ELF object, and for the user-space part, I used gobpf in a small tool to load the program. You could also use BCC. I would especially recommend it when you just want to start playing with BPF, because it's very mature already, a lot of tools are in the repository, it's very popular, and it has very nice Python and Lua bindings which make it really easy to use BPF for tracing purposes. They also have a nice list of which Linux version introduced which BPF feature, so that's the best place to go if you have to find out whether you can use a feature on the kernel in, for example, your Linux distro.

BPF_PROG_TEST_RUN is great. It has been available since Linux 4.12, and with it you can take your SKB and XDP programs, give them to the kernel, and provide data which will then be passed to the program as packet data.
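To show what "fill the test part of the attribute" means, here is a hedged Python/ctypes sketch of the test member of union bpf_attr in the layout introduced with Linux 4.12. run_prog_test and ProgTestRunAttr are hypothetical names, the syscall number 321 assumes x86-64, and the 256 bytes of output-buffer headroom are an arbitrary assumption. prog_fd would come from an earlier BPF_PROG_LOAD of an SKB or XDP program, and running it typically requires root.

```python
import ctypes

BPF_PROG_TEST_RUN = 10  # enum bpf_cmd
NR_BPF = 321            # assumption: bpf(2) syscall number on x86-64


class ProgTestRunAttr(ctypes.Structure):
    """The 'test' member of union bpf_attr as introduced in Linux 4.12."""
    _fields_ = [
        ("prog_fd", ctypes.c_uint32),
        ("retval", ctypes.c_uint32),         # out: the program's return value
        ("data_size_in", ctypes.c_uint32),
        ("data_size_out", ctypes.c_uint32),  # out: size of the resulting packet
        ("data_in", ctypes.c_uint64),        # address of the input packet
        ("data_out", ctypes.c_uint64),       # address of the output buffer (optional)
        ("repeat", ctypes.c_uint32),
        ("duration", ctypes.c_uint32),       # out: average run time in nanoseconds
    ]


def run_prog_test(prog_fd: int, packet: bytes, repeat: int = 1):
    """Hypothetical helper: returns (retval, duration_ns, packet_out)."""
    data_in = ctypes.create_string_buffer(packet, len(packet))
    data_out = ctypes.create_string_buffer(len(packet) + 256)  # assumed headroom
    attr = ProgTestRunAttr(
        prog_fd=prog_fd,
        data_size_in=len(packet),
        data_size_out=len(data_out),
        data_in=ctypes.addressof(data_in),
        data_out=ctypes.addressof(data_out),
        repeat=repeat,
    )
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    ret = libc.syscall(ctypes.c_long(NR_BPF), ctypes.c_long(BPF_PROG_TEST_RUN),
                       ctypes.byref(attr), ctypes.c_long(ctypes.sizeof(attr)))
    if ret < 0:
        raise OSError(ctypes.get_errno(), "BPF_PROG_TEST_RUN failed")
    return attr.retval, attr.duration, data_out.raw[:attr.data_size_out]
```

The three return values correspond to what the talk lists: the handler's return value, the duration, and the packet as it looked after the handler ran.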
So it's possible to unit test your programs without having, for example, network hardware that is capable of running BPF programs. It looks like this: you again fill union bpf_attr, this time the test part, with your program, and you call it. If you set an output buffer, you get back the data, for example what the network packet looked like after it was handled by your handler, plus the return value and the duration. That's pretty nice, because it wasn't possible before: you had to set up your virtual network environment, start programs, and intercept traffic. With this you can test the handlers on their own and just pass in a socket buffer.

There are a couple of sysctl options, and regarding tracing, there's an excellent resource by Brendan Gregg. There are also a couple of projects that are prominent users of BPF, so there's a lot to learn from their source code. The talk this morning was about tcptracer-bpf, actually.

And that's it. Thank you.

Audience: Would you say that eBPF is suitable for modifying packets as they go through the kernel?

Yes, you can modify packets. I don't know off the top of my head which program type you have to choose, because I mostly do tracing and don't have real experience with networking, but I think most users actually use it for networking. You'll have to look up the details; I don't know. More questions? OK. Thank you.