It's my first time in Bilbao and, well, it blew my expectations away. It's a really awesome city. I don't know who's local here in the audience; anyone from Bilbao, raise your hand? No? No one, too bad. You're missing out; I would move here. A quick agenda: we have plenty of time, so please interrupt me at any time. We don't need to wait until the end for questions. For the next 50 minutes I will do a quick recap of BPF, what it is, maybe from a different angle than what you've seen, and then we'll dive into, as Elena said, the intricate relationship between BPF and security and the whole world of speculation.

Let me start with this screenshot from the movie Arrival, where Amy Adams is trying to explain to the aliens how to quit vi. If you haven't seen the movie, spoiler alert: she was successful. And just like vi, BPF is not for everyone. Some people hate it, some people love it. BPF is somewhat similar, and what they have in common is that BPF is a sequence of commands that can be understood. You have probably seen different interpretations of BPF, from a virtual machine to other things. My definition today is that BPF is a universal assembly language, whatever that means, and I think the main statement here is that it is a strictly typed assembly language, the very first one. The assembly languages of x86, RISC-V, Arm, you name it, do not have types. In BPF you have types: there is no plain pointer to memory. There is always a pointer to a certain type: a pointer to a string, a pointer to an integer, or anything else. Everything is typed. That's the main difference. And because it's a sequence of commands that can be understood, its use cases go way beyond user space telling the kernel what to do. People are now working on cases with hardware: when you plug a device into a PCIe slot, the hardware will tell the kernel what it is about, so the hardware explains its purpose to the kernel.
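To make "strictly typed" a bit more concrete, here is a toy sketch in C (not kernel code) of the kind of bookkeeping the verifier does: every register carries a type, and a load is accepted only through a register whose type is a pointer to a known object of known size. The type names are modeled loosely on the verifier's register types; the struct layout and function names are hypothetical.

```c
#include <stdbool.h>

/* Register types, named loosely after the verifier's; the real set is larger. */
enum reg_type { SCALAR_VALUE, PTR_TO_MAP_VALUE };

struct reg_state {
    enum reg_type type;
    long off;      /* pointers: offset into the object; scalars: unused here */
    long obj_size; /* pointers: size of the pointed-to object */
};

/* "rX += imm": for a pointer, this moves the offset within the same object
 * and keeps the type (scalars are not tracked precisely in this sketch). */
static struct reg_state add_imm(struct reg_state r, long imm)
{
    r.off += imm;
    return r;
}

/* Would "load `size` bytes from rX" pass this toy type check? */
static bool load_ok(struct reg_state r, long size)
{
    if (r.type != PTR_TO_MAP_VALUE)
        return false; /* an integer is never a valid address */
    return r.off >= 0 && r.off + size <= r.obj_size;
}
```

In this model, an 8-byte load at offset 12 of a 16-byte map value is rejected because it would overrun the object, while any load through a scalar register is rejected regardless of the scalar's value.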
There are other use cases purely in user space, where there is no kernel involved whatsoever, but you still have a special verifier just for user space and a special JIT just for user space, for when one computer is telling another computer, potentially on the other side of the world, what to do. I've been asked a lot how BPF compares to WebAssembly or Rust. The difference is that a sandboxed environment like Wasm, or the V8 JavaScript engine, does not know what it is going to execute. They restrict what is possible because they don't know the intent; they have to create a boundary of what's possible, and because of that there is a performance slowdown from all the runtime checks. In BPF everything is verified statically and there are practically no runtime checks. The only runtime checks are in places where the verifier cannot statically prove the safety of the code, but about 99% is verified statically. So the intent is known. That's the major difference between sandboxed and BPF-style environments. Having said that, I've seen people advocating for WebAssembly in the kernel, especially on LWN; some people just love WebAssembly. My response to that: bring it in.
I think there is enough room in the kernel for WebAssembly or any other sandbox. An interesting fact here: in 2013, when BPF was first introduced, it was called iBPF, for internal BPF, and it took us about three years to eventually rename it to eBPF, which is how it's known now. It was also extended many times over the years; in LLVM we call these extensions -mcpu flavors. We were at v3, and in July we landed -mcpu=v4 support. GCC joined the team too: GCC now fully supports -mcpu=v4, and the latest patches from the Oracle GCC and binutils developers are going to land, I think, this week. They say that at this point they will have full feature parity with LLVM, which I think is an amazing achievement. Well, we'll see.

I think this is probably one of the most important slides in this presentation. I believe that any big project or community has to have a clearly defined mission statement, and the mission statement for BPF is two things: one is to innovate, and the second is to enable others to innovate. What it means for me in particular is that BPF, as a framework and as a subsystem, satisfies my own thirst for innovation, and it enables others to innovate. What I love to do most is help people on the mailing list when they come in and say, well, I have this, I want to do that, and I don't know how. It just makes my day to help them, and when I see that they are not just following the example but trying to create something new, that's the best moment of being a kernel maintainer for me. And I think it shows in the growth of the community. This is the number of unique developers per month in the kernel from 2019 until now. The blue line is external developers, and the green line at the bottom is developers inside Meta, formerly Facebook. You can see how it grows; now we have a hundred plus.
Well, actually, combining these two lines, it's about 120 to 130 unique developers every month, which means roughly 70 to 80 emails a day that the team of BPF maintainers and reviewers has to process. So it's a huge volume. BPF has its roots in tracing. The first hooks it was attached to were kprobes and uprobes, then tracepoints, and eventually what we call fentry, fexit, and fmod_ret. The important part here is that when people see BPF flashing somewhere, they think BPF can do everything. Well, it's actually very restricted, depending on where you attach it. In the case of a tracepoint, it can read all of the kernel data but cannot modify any of it. In other cases, let's say networking, it can read the packet data, modify the packet data, and drop packets, but it cannot modify the rest of the kernel's state. And in other cases, like security, it can actually prevent system calls and act as a gatekeeper on LSM hooks. That's the difference that I think is very important to understand. The impression that BPF can do everything, or nothing, also comes from the use cases it's been used for. For example, if you have an Android phone and you want to see how many gigabytes you spent on Facebook versus YouTube, the way the Android folks do it is through a BPF program that is attached, knows where the traffic is going, and counts it. If you've ever used the perf tool that's part of the Linux kernel and helps you with observability: there is a similar tool written for Python. We call it PyPerf.
It's kind of the same, but for Python, and it shows the stack inside Python programs. The way it's done: there are the Python calls, the Python interpreter, and everything else in between, so BPF does a special stack walk that is specific to the particular Python interpreter and leaves only the frames that are relevant for performance analysts. Without BPF in the kernel doing the stack walk, this kind of analysis would not be possible. Another use case that is also fun, and may not be obvious, is how BPF helps to analyze purely user-space applications. It's not attaching to anything in the kernel; it's using what we call uprobes. A uprobe does not attach to a process; it attaches to a particular instruction inside a file, inside the inode. So the use case is that you attach to the inode of the GCC preprocessor and then you type make, and every instance of GCC launched on the system will have this uprobe inserted automatically by the kernel. When GCC starts executing, the trap happens, it goes back to the kernel, the BPF program is executed, and now we can capture, for example, how much time the preprocessor spends processing #includes versus actually compiling the code. This was used by our friends working on the performance of C++ applications to understand whether all the metaprogramming in C++, with templates and Boost and all the fancy stuff, takes too much time processing headers versus compiling the code. I've touched on networking a little bit. Networking was very interesting as well. The first BPF hook that was added was for what we call socket filters. Well, this is an eye chart, so you don't need to memorize it.
There's no quiz at the end. Socket filters came first because of classic BPF and tcpdump: that was the main hook tcpdump used, and I'm sure you've used tcpdump. Because of that history, when extended BPF was created, and to make it acceptable to the kernel community, since BPF was revolutionary at that time, we did exactly what classic BPF did. Classic BPF attached to socket filters for tcpdump's purposes, so the idea was that eBPF would do the same, but maybe a little bit more. It turned out that the socket filter use case is practically unused; no one really used it. The success of BPF in networking came from all the other use cases, in particular, for this security-minded audience, XDP. The XDP use case mattered a lot to us. Essentially, it's used inside Meta, Facebook, for DDoS protection. It started around 2016, when DDoS as a service all of a sudden became popular. You could really buy a DDoS attack on the black market, and it's probably still the case today that Linux kernels, and all sorts of layers on top of them, are full of security vulnerabilities. So millions of microwaves and cheap Wi-Fi routers were hacked, and there were botnets all over the world selling access to millions of hacked routers. It started as a service for angry gamers who wanted to DDoS another game server: they would just buy it, and there would be a multi-gigabit attack. Because it was pretty easy to attack anything, people were trying to attack Facebook, and we've seen attacks all the way up to half a terabit.
I think the worst was a half-terabit attack. That's a huge amount to absorb for any service, even one as big as Facebook. Back then, in 2016, we were using a kernel feature called IPVS, the IP Virtual Server. Essentially it was a layer 4 load balancer down in the kernel, but it wasn't fast enough to absorb this kind of attack. That's why XDP, which stands for eXpress Data Path, was invented. What it does is attach a BPF program right in the driver. IPVS operates much higher up, closer to the TCP/IP layers of the stack. When we install and run the BPF program next to the driver, we remove all of the overhead of the networking stack: allocating the socket buffer, the so-called SKB, doing early demux, socket handling, checksumming, and so on. By running in the driver we were able to achieve a 10x performance improvement over IPVS. The main use case at Facebook at that time was not preventing but absorbing these distributed denial of service attacks, which reached many, many gigabits per second across the globe. Over the years the networking use cases grew a lot. Now Cilium is the dominant networking plugin for Kubernetes. It's used for network connectivity and for security as well. From the networking perspective, they can analyze the traffic.
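The core XDP idea, making a drop decision right next to the driver before any SKB exists, can be sketched in plain user-space C. This is a simulation, not a loadable XDP program: the verdict constants, the blocklist, and the function name are hypothetical stand-ins (a real program would use `enum xdp_action` from `linux/bpf.h` and would typically keep the blocklist in a BPF map). What it does show is the pattern the verifier demands: every read of packet bytes is preceded by an explicit check against the end of the packet.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for XDP_DROP / XDP_PASS from enum xdp_action. */
enum { SKETCH_XDP_DROP = 1, SKETCH_XDP_PASS = 2 };

#define ETH_HDR_LEN 14
#define IPV4_MIN_HDR_LEN 20
#define ETHERTYPE_IPV4 0x0800

/* Hypothetical blocklist: 198.51.100.1 (a documentation address). */
static const uint32_t blocked_saddr[] = { 0xC6336401u };

static int xdp_filter_sketch(const uint8_t *data, const uint8_t *data_end)
{
    /* Bounds check before touching the Ethernet header. */
    if (data + ETH_HDR_LEN > data_end)
        return SKETCH_XDP_PASS;
    uint16_t ethertype = (uint16_t)((data[12] << 8) | data[13]);
    if (ethertype != ETHERTYPE_IPV4)
        return SKETCH_XDP_PASS;

    /* Bounds check again before touching the IPv4 header. */
    const uint8_t *ip = data + ETH_HDR_LEN;
    if (ip + IPV4_MIN_HDR_LEN > data_end)
        return SKETCH_XDP_PASS;
    uint32_t saddr = ((uint32_t)ip[12] << 24) | ((uint32_t)ip[13] << 16) |
                     ((uint32_t)ip[14] << 8) | (uint32_t)ip[15];

    for (size_t i = 0; i < sizeof(blocked_saddr) / sizeof(blocked_saddr[0]); i++)
        if (saddr == blocked_saddr[i])
            return SKETCH_XDP_DROP; /* dropped before any SKB would be allocated */
    return SKETCH_XDP_PASS;
}
```

A truncated frame is passed rather than dropped here; that choice is arbitrary for the sketch, and a real filter might decide otherwise.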
They can even work at layer 7, the HTTP layer, understanding where requests are coming from and blocking them depending on the security policy. A recent development, thanks to KP Singh, who is in the audience today, is the addition of BPF LSM. Together with security hooks and syscalls, it's now being used to prevent interesting security attacks. Back to my point that different BPF programs can do different things depending on what they are, what we call the program type, and where they are attached: tracing can read everything and cannot modify anything. LSM can read everything and can deny operations, and it can also sleep. Unlike all the previous program types, this one is sleepable, and not only sleepable but, in kernel terminology, faultable. So the program can take a page fault: if the user address it's trying to read was swapped out to disk, the kernel will actually swap the pages back into memory and let the BPF program complete the fault and fetch the data. This solved one of the concerns people had in the past. For example, say a BPF program doing something security-related is attached to a system call and tries to read user addresses in a non-faultable context. If the address is paged out, the read returns a fault because the page is not present in memory, and based on that it's possible to sneak through certain accesses that the program cannot detect. With sleepable programs it's more secure now. Unfortunately, BPF LSM programs are not, I would say, open source right now the way they are in networking, where everything people have done so far is public, and in tracing, where I would say 90 percent of it is open source and public. Every networking program
I've seen, from Cilium and Isovalent, and from Facebook, including the DDoS prevention one, which we call Katran, a layer 4 load balancer used by Facebook, Dropbox, and other companies, is open source. Cloudflare, one of the biggest DDoS protectors, also open sourced their BPF programs. So in networking everyone is sharing their knowledge and showing what they can do. We learn from each other; we educate each other. On the tracing side it's just as good: Brendan Gregg has hundreds of tracing programs that he wrote for performance analysis, along with a book on how to use them and how to develop new ones. In the security world, it's not that good. But I will get to this point later. Another part of BPF in active development now is what I'd call BPF next; it's even hard to categorize. For tracing and networking use cases, I would say there is very little happening from the BPF core perspective. They are 95% solved, and of course the last 5% takes 90% of the time, but they are pretty much done. There is also very little development of new features for the security side. Everything innovative comes in from this BPF next. A recent feature that landed is HID-BPF; HID stands for human interface devices. With it, a program can make your trackball be seen as a mouse from the kernel's point of view, or, if it is a broken mouse, the program can actually fix it. The challenge there is that when the hardware is developed, the driver is certified and goes into a particular kernel, but then, because of manufacturing defects or whatnot, the mouse is actually not operational with that particular driver.
These quirks that tweak how the driver sees the mouse are done with a BPF program. So the time to market, the time until you can actually sell a keyboard, a mouse, or any input device, is much shorter now because of this facility. Three more features are in active development. One is the scheduler, which is the one I'm most excited about. We call it sched_ext, the extensible scheduler class: BPF programs are used to actually perform the scheduling functions in the kernel. In the kernel there is only CFS, and now EEVDF. These schedulers are good for the general use case; they do as well as they can, but there are always niche use cases and custom applications that benefit from a custom scheduler. One example is the cloud environment. If the hypervisor is Linux and every VM is running Linux, what you have is two schedulers, one in the hypervisor and one in every VM, and they just compete with each other: the hypervisor scheduler is not aware of what the VMs are doing, and the scheduler inside the VM is not aware that it's being virtualized. So if you use just the standard, generic CFS scheduler that's part of the kernel, you get mediocre performance from the VM. Mediocre relatively, of course, but you can get more out of the VMs and out of the CPU if you do custom scheduling. For example, you can disable the tick. The tick is the timer interrupt that, in most configurations, fires a thousand times a second. So instead of interrupting a VM that's busy doing something else a thousand times a second, the scheduler just disables the tick, and the thousand VM entries and VM exits per second, which are quite slow, especially with all the security mitigations, no longer happen.
That's where the performance comes from. That was my introduction to BPF and its history, and now let's go to the next part, which I think is more fun: unprivileged BPF. To explain what it is, I have to start from the very beginning. BPF as an instruction set was invented 30 years ago, and now we call it classic BPF. It's used in tcpdump and seccomp, and it was unprivileged because tcpdump only needs networking privileges, CAP_NET_RAW, to read packets; it doesn't need any other capabilities. So classic BPF was, and still is, completely unprivileged. Extended BPF followed the same idea, and, as I was saying, the socket filtering use case turned out to be essentially unused. eBPF has now been in the kernel for nine years; actually, next week is BPF's birthday, exactly nine years since the facility landed in the kernel. Exciting stuff. But in those nine years socket filters were barely used, and the only other program type that can be unprivileged, added over the years, is cgroup skb, which is similar to socket filtering but scoped by the cgroup abstraction. Everything else always required root. All was fine for the first couple of years of BPF's existence, until in 2017 Jann Horn from Project Zero wrote this bit of code. Anyone remember this? OK, cool. It's a snippet; he actually used three BPF programs as part of his exploit, and this is the most interesting one, since it prepares the code for speculative execution. For those who had fun back during Christmas 2017, this eventually became known as Spectre variant 1, the bounds check bypass. The quick TL;DR concerns most modern CPUs: Arm, Intel, and AMD.
They all do speculative execution. When a CPU mispredicts and speculatively executes something it shouldn't have, the results are canceled, but CPUs still cannot cancel the side effects of speculative execution. For Spectre v1 this is still the case today. Many more Spectre variants, and other bugs in CPU speculative execution, were found later, but Spectre v1 is not fixable, so software still has to do the mitigations for it. So how were mitigations proposed? When it was disclosed, I think in December 2017, the way the vendors looked at it was: speculative execution is all bad, it's leaking secrets, it's reading memory, so we have to stop speculative execution. The proposal from everyone, from Intel, from Arm, was funny at the time: if there is a branch that can be mispredicted, where speculative execution can leak secrets, just add an LFENCE. LFENCE is a speculation barrier; it stops execution. Microsoft did something similar: they tweaked the Visual Studio compiler to opportunistically insert LFENCEs all over the code, and this is how Windows was recompiled at the time. Everyone said, yep, that's good, use that, and the vendors requested that we do the same in the Linux kernel. In the Linux kernel we decided to take a stand, because LFENCE looked like a pretty big hammer. The idea we discussed over several months was that instead of stopping speculation, we would manage it: instead of telling the CPU not to speculate any further, we would steer the speculation into being safe. Many months later, by January, we finally had it. Convincing Linux that it was a good idea was easy enough, and Linux was on board.
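A minimal sketch of the "just add LFENCE" approach the vendors proposed, assuming GCC or Clang on x86-64, where `_mm_lfence()` from `<immintrin.h>` emits the instruction. The fallback for other architectures is only a compiler barrier so that the sketch still builds; it is not a speculation barrier. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <immintrin.h>
/* LFENCE: later instructions do not begin executing, even speculatively,
 * until all earlier instructions have completed locally. */
#define speculation_barrier() _mm_lfence()
#else
/* Placeholder so the sketch builds elsewhere: this is only a compiler
 * barrier, NOT a speculation barrier (Arm, for example, has CSDB). */
#define speculation_barrier() __asm__ __volatile__("" ::: "memory")
#endif

/* The "just add LFENCE" style: stop speculation right after the bounds check. */
static uint8_t load_with_lfence(const uint8_t *array, size_t size, size_t index)
{
    if (index < size) {
        speculation_barrier();
        return array[index];
    }
    return 0;
}
```

The cost is that every such bounds check now serializes the pipeline, which is why it looked like such a big hammer for hot paths.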
It was extremely challenging to convince Intel, and it took about six months to convince Arm that this was even feasible: the concept of steering speculation, the concept of masking. In the end it was masks of different kinds. When there is an access and the CPU executes it speculatively, it also executes the mask speculatively, and this masking operation on the address bounds the range where speculation happens to only the addresses the CPU is supposed to read. On the implementation side it was a big deal, in terms of both the amount of work and the arguing, in private and in public. In the end, this is now known as the array_index_nospec() macro, which underneath does a few arithmetic operations, an OR, and a shift, which together produce nothing but a mask. But it had a huge effect on the kernel and on the mitigation of speculative execution. Right now, last I counted, there are 240 places in the kernel that use array_index_nospec(), and some of them are in very hot paths. One is the file descriptor table. Pretty much everything in Linux goes through file descriptors. A file descriptor is an integer that you pass into the kernel saying, do this, and the size of the file descriptor table is different for every process, so it's certainly an attack vector. That's why array_index_nospec() has to be used when turning this index into a file pointer inside the kernel. And since every operation, even read and write on a socket, recvmsg, sendmsg, uses the fd as an index, this executes many millions of times a second on any given kernel. Now imagine if Linux hadn't resisted this LFENCE push and we had LFENCEs everywhere.
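The masking idea behind `array_index_nospec()` can be reproduced in user space. The sketch below follows the generic C fallback of `array_index_mask_nospec()` in `include/linux/nospec.h` (architectures such as x86 override it with inline assembly). It assumes `index` and `size` are at most `LONG_MAX`, and relies on GCC/Clang behavior for converting to `long` and right-shifting a negative `long` (an arithmetic shift). The `_sketch` names are ours.

```c
#include <limits.h>

#define BITS_PER_LONG_SKETCH (sizeof(long) * CHAR_BIT)

/* Follows the generic array_index_mask_nospec() fallback: returns ~0UL when
 * index < size and 0 otherwise, without a conditional branch.
 * Assumes index <= LONG_MAX and size <= LONG_MAX. */
static unsigned long array_index_mask_nospec_sketch(unsigned long index,
                                                    unsigned long size)
{
    /* The top bit of (index | (size - 1 - index)) is set iff index >= size;
     * inverting it and shifting arithmetically spreads it to all bits. */
    return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
                           (BITS_PER_LONG_SKETCH - 1));
}

/* Clamp index to 0 when it is out of bounds, even under speculation. */
static unsigned long array_index_nospec_sketch(unsigned long index,
                                               unsigned long size)
{
    return index & array_index_mask_nospec_sketch(index, size);
}
```

The intended call site sits right after the ordinary bounds check, e.g. `if (fd < max_fds) { fd = array_index_nospec_sketch(fd, max_fds); ... }`. If the check is mispredicted, the index the CPU speculatively uses collapses to 0 instead of an attacker-chosen value, and no fence is needed.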
Then we would have 240 LFENCEs around the kernel, and enabling the Spectre v1 mitigation in the kernel would be... well, guess how much slower it would be. But as I said, BPF was used as part of the exploit, and of course we had to fix BPF as well. In BPF, because it's all assembly, strictly typed assembly, we cannot use the macro. So this is how the code looks when we generate a load from a map: instead of just a compare, we do a compare and a mask. That's the only difference. So when BPF programs are loaded as unprivileged (this is still unprivileged BPF), we add an AND instruction to mask the access, to steer the speculation and make sure Spectre v1 attacks are, well, mitigated. But a short time after that (I think we were happy for only a couple of months), Jann Horn came back with the Spectre v2 attack, which was called branch target injection.
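The "compare and mask" pattern for a BPF array map lookup can be modeled in C as below. This is an illustrative sketch, not the JIT's actual output: as far as I understand, the kernel's array map also rounds its allocation up to a power of two when Spectre v1 mitigations apply, so that the masked index always lands inside the allocation, and the sketch assumes the same. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest power of two >= n; valid for 1 <= n <= 2^31. */
static uint32_t roundup_pow_of_two_sketch(uint32_t n)
{
    uint32_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

struct array_map_sketch {
    uint32_t max_entries; /* logical size the program asked for */
    uint32_t index_mask;  /* roundup_pow_of_two(max_entries) - 1 */
    uint64_t *values;     /* backing store with index_mask + 1 slots */
};

static uint64_t *array_map_lookup_sketch(const struct array_map_sketch *m,
                                         uint32_t index)
{
    if (index >= m->max_entries) /* the compare */
        return NULL;
    /* The mask: even if the compare is mispredicted, a speculative load
     * through this index stays inside the power-of-two sized allocation. */
    return &m->values[index & m->index_mask];
}
```

For a map with 5 entries, the mask is 7 and the backing store has 8 slots, so a speculatively used index of 1000 becomes 1000 & 7 = 0, which is still a valid slot.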
Yes, and by then the world was already steaming about how scary BPF must be. In this case, v2, BPF was also used as part of the attack, for good and bad reasons. But here's the interesting part that not many people know: the kernel isn't really involved. There is no verifier involved in using BPF for this attack; the BPF instructions are completely in user space. The only thing being used as part of the attack is the interpreter inside the kernel. The interpreter itself, its text, the code that is part of the kernel, is used to speculatively execute instructions that were never loaded into the kernel. They are BPF instructions in user space, and speculative execution is steered from user space into the kernel's interpreter. So effectively, the presence of the interpreter in the kernel made it possible for Jann to create this exploit for branch target injection. Of course, the bug itself is in the CPU, and IBRS, and later eIBRS, were done to prevent it. So the actual bug is in the CPU, but the BPF interpreter was used. To mitigate that, there is nothing we can add to the kernel to prevent this part of the attack. The only thing we could do was remove the interpreter from the kernel completely. At the time we were arguing: look, guys, there are actually other interpreters in the kernel, at least three more. And everyone said, yeah, but the BPF interpreter is the one that was used.
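Why an in-kernel interpreter is attractive for branch target injection is easier to see with a toy. The sketch below is a made-up three-opcode interpreter, not the eBPF instruction set, and all names are hypothetical. The kernel's real interpreter dispatches through a computed-goto jump table rather than function pointers, but the point is the same: generic code that executes whatever instruction stream it is handed, using a data-dependent indirect branch at every step. If the predicted target of that branch can be poisoned, the interpreter becomes a ready-made gadget.

```c
#include <stddef.h>
#include <stdint.h>

/* A made-up three-opcode instruction set; this is NOT the eBPF encoding. */
enum { OP_MOV_IMM, OP_ADD_IMM, OP_EXIT, OP_COUNT };

struct toy_insn { uint8_t op; int64_t imm; };
struct toy_vm { int64_t r0; size_t pc; int done; };

static void op_mov(struct toy_vm *vm, const struct toy_insn *in) { vm->r0 = in->imm; }
static void op_add(struct toy_vm *vm, const struct toy_insn *in) { vm->r0 += in->imm; }
static void op_exit(struct toy_vm *vm, const struct toy_insn *in) { (void)in; vm->done = 1; }

/* Dispatch table: every step is an indirect call whose target is chosen by
 * data (the opcode). On a CPU vulnerable to branch target injection, the
 * predicted target of such a branch is what an attacker tries to poison. */
static void (*const handlers[OP_COUNT])(struct toy_vm *, const struct toy_insn *) = {
    [OP_MOV_IMM] = op_mov,
    [OP_ADD_IMM] = op_add,
    [OP_EXIT]    = op_exit,
};

/* Runs whatever instruction stream it is handed, like any interpreter. */
static int64_t toy_interp_run(const struct toy_insn *prog, size_t len)
{
    struct toy_vm vm = { 0, 0, 0 };
    while (!vm.done && vm.pc < len) {
        const struct toy_insn *in = &prog[vm.pc++];
        if (in->op >= OP_COUNT)
            break; /* unknown opcode: stop */
        handlers[in->op](&vm, in);
    }
    return vm.r0;
}
```

A JIT removes this generic dispatch loop entirely: each program becomes its own straight-line native code, so there is no shared "run anything" routine left in the kernel to steer speculation into.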
So I said, OK. My point here is that the kernel is not fully safe today; people just haven't worked hard enough to demonstrate how other interpreters can equally be used as part of this kind of speculation attack. What we did was add CONFIG_BPF_JIT_ALWAYS_ON, which completely removes the interpreter from the kernel. Only the just-in-time compiler from BPF to x86 remains, and that prevents this part of the attack. It's also interesting how the perception of the JIT changed. Back in 2011, a just-in-time compiler was seen as something bad that the kernel should not be doing. In 2011 there was a known attack called JIT spraying. x86 is such an interesting instruction set that you can encode instructions inside a constant. Say you have a bunch of mov register, constant instructions. If you look at them normally, they're nonsensical: you're just moving constants around. But if, instead of jumping to the beginning, you jump one byte in, then because of the way x86 encoding works you get a completely different set of instructions, and all of them are valid. That was used as part of the attack: this is how JITs can be used to insert gadgets into the kernel, so the kernel now contains a fixed set of instructions the attacker wanted. That's JIT spraying. It was fixed long ago: back then randomization and trap insertion were added all over the place. Then we even added what we call constant blinding: any constant the attacker controls is randomized inside the BPF JITs. So it's a two-layer defense: after the initial fix of randomization and trap insertion
was in place, there were no known attacks, but security-minded folks were still worried that something like this was possible, so we added the whole constant blinding facility. And now, because of Spectre v2, the JIT is the only way to run anything, so it has to be on. But what we also found out is that retpoline, which is used as part of the Spectre v2 mitigation when the CPU doesn't have hardware fixes, creates significant overhead for every indirect call, and having a JIT is actually very beneficial for performance. Retpoline-mitigated indirect calls are super slow because they are guaranteed to mispredict, with pressure on the I-cache and everything. In the JIT they become direct calls, which recovers the performance lost to retpoline. We went further. In 2019, a year after the first Spectre v1 and v2 fun, we decided to get smarter about what the verifier does, and Daniel Borkmann did a lot of work here. The key part: Jann discovered yet another way to craft a fancy BPF program, and to prevent such attacks the verifier was changed. The way the verifier works is that it does symbolic execution of the code, analyzing all possible paths and all memory accesses, and this is how it knows that every access is a safe access to known memory. What Daniel added was for the verifier to simulate speculative execution as well. So my point is that the analysis the verifier does is unique in the industry.
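Constant blinding, mentioned above, can be illustrated with a small rewrite rule. The sketch assumes a hypothetical instruction representation: `mov dst, imm` is replaced by loading `imm ^ rnd` into a scratch register, XOR-ing it with `rnd`, and copying the result into `dst`. With a fresh random `rnd` per instruction, the attacker can no longer predict the bytes that end up in the JIT image. The scratch register number and all names are assumptions for illustration. On real systems, blinding is controlled by the `net.core.bpf_jit_harden` sysctl (0 off, 1 for unprivileged users, 2 for everyone).

```c
#include <stdint.h>

/* Hypothetical instruction representation, only for this sketch. */
enum blind_op { B_MOV_IMM, B_XOR_IMM, B_MOV_REG };
struct blind_insn { enum blind_op op; int dst; int src; uint32_t imm; };

#define NUM_REGS 12
#define REG_AX 11 /* scratch register reserved for the rewrite (assumed) */

/* Rewrite "mov dst, imm" as: ax = imm ^ rnd; ax ^= rnd; dst = ax.
 * With a fresh random rnd per instruction, the attacker cannot predict
 * either emitted immediate. */
static int blind_mov_imm(int dst, uint32_t imm, uint32_t rnd, struct blind_insn out[3])
{
    out[0] = (struct blind_insn){ B_MOV_IMM, REG_AX, 0, imm ^ rnd };
    out[1] = (struct blind_insn){ B_XOR_IMM, REG_AX, 0, rnd };
    out[2] = (struct blind_insn){ B_MOV_REG, dst, REG_AX, 0 };
    return 3;
}

/* Tiny evaluator to check that the rewrite preserves the program's meaning. */
static uint32_t eval_blind(const struct blind_insn *p, int n, int reg)
{
    uint32_t regs[NUM_REGS] = { 0 };
    for (int k = 0; k < n; k++) {
        switch (p[k].op) {
        case B_MOV_IMM: regs[p[k].dst] = p[k].imm; break;
        case B_XOR_IMM: regs[p[k].dst] ^= p[k].imm; break;
        case B_MOV_REG: regs[p[k].dst] = regs[p[k].src]; break;
        }
    }
    return regs[reg];
}
```

The price is two extra instructions per blinded constant, which is why blinding is a tunable rather than always on.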
No other static analysis tool can do this kind of analysis of speculative execution. It's not just guessing. Compilers like LLVM and Microsoft Visual Studio analyze abstract syntax trees, they analyze the IR of a C, C++, or whatever program, and they can guess where speculative execution would happen, and they have to be conservative; that's why Microsoft inserted LFENCEs back then. But the verifier, because it does symbolic execution, including symbolic execution of speculative paths, knows exactly where speculative execution can happen and can mitigate it, which I think is super cool. Then there was Spectre v4 as well, and again BPF was used as part of the attack. The mitigation turned out to be super simple: 10 lines of code to sanitize the stack. The root of that attack was pointer versus constant: because of speculative execution, the CPU can assume it's loading a pointer, a pointer to memory, whereas a tricky program can actually put a constant there instead. So we added a trivial mitigation: instead of leaving the old value in the stack slot, zero it out first. One extra instruction, very easy. But the calm didn't last long. There was Spectre NG, the new generation, I guess, and they just keep coming. We were bombarded by various researchers and by biweekly calls with Intel: oh, we think BPF might be involved again. In the nine years BPF has existed, only two program types were ever unprivileged, and both of them are extremely niche use cases; the number of users is less than the number of fingers on a hand. So we felt it just wasn't worth it for the BPF community to keep mitigating all of this in the verifier.
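The idea of also checking the mispredicted direction of a branch, which the verifier's speculative-path simulation described above relies on, can be shown with a toy range analysis. This is a deliberately simplified model, not the verifier's algorithm: an index is tracked as a `[min, max]` range, the architectural path after `if (idx >= limit) goto out` sees the refined range, and the speculative path sees the unrefined one. An access that is safe architecturally but unsafe speculatively is exactly the case that needs a mask (or rejection). All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy abstract value: the [min, max] range a scalar may hold. */
struct range { uint64_t min, max; };

/* Range of idx on the fall-through of "if (idx >= limit) goto out".
 * Assumes limit > r.min, i.e. the fall-through path is feasible. */
static struct range refine_below(struct range r, uint64_t limit)
{
    if (r.max >= limit)
        r.max = limit - 1;
    return r;
}

/* Is array[idx] in bounds for every value in the range? */
static bool in_bounds(struct range r, uint64_t size)
{
    return r.max < size;
}

/* Architectural path: the branch went the right way, so idx has the refined
 * range. Speculative path: the branch was mispredicted and the access runs
 * with the unrefined range. "Safe only architecturally" is the case that
 * needs sanitizing; an architecturally unsafe access would simply be
 * rejected, which this function does not model. */
static bool needs_sanitizing(struct range idx, uint64_t limit, uint64_t size)
{
    bool arch_ok = in_bounds(refine_below(idx, limit), size);
    bool spec_ok = in_bounds(idx, size);
    return arch_ok && !spec_ok;
}
```

For an index that can be any 32-bit value, guarded by `idx >= 8` in front of an 8-element array, the architectural path is fine but the speculative one is not, so this model flags it; an index already known to be in `[0, 3]` needs nothing.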
So what we did was add BPF_UNPRIV_DEFAULT_OFF, which controls the default of the unprivileged BPF sysctl. Important here is that it's default y, so every distro and every kernel from then on completely disables unprivileged BPF. Pretty much we said: let the security researchers working on speculative execution use something else, because we're tired of BPF being the highlight, always in the spotlight of security research. We still have the sysctl knob for people who really want unprivileged BPF and don't care about security. Fine, here's a knob for you, you can still turn it on. But the recommendation is to keep it off. That's how it is now, and it's been the case for a while. Let me come back for a second. In the past there was unprivileged and root: most program types were root-only, and two were unprivileged. During those years we had constant requests from the community: can you make this map unprivileged? Can you make this program type unprivileged as well? I don't really need full root, but you're forcing me to have full root privileges just to load my BPF program. I know it's going to be a networking program, and a networking program can only read packets and drop packets, so I don't want to be full root. That's why we created CAP_BPF. Effectively, we split the whole root, CAP_SYS_ADMIN, space into different domains. First, CAP_PERFMON was introduced. It was not part of anything BPF; it was part of the perf subsystem, where it allows you to install and use perf events that can effectively read arbitrary memory. So if you have CAP_PERFMON, you don't need any BPF: you can read all the memory in the kernel. And we added CAP_BPF, which allows all of the things listed here.
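To check how a given machine is configured, the relevant knob is the `kernel.unprivileged_bpf_disabled` sysctl. The value meanings in the sketch follow the kernel's sysctl documentation (0: allowed, 1: disabled until reboot, 2: disabled but an admin may re-enable it); a kernel built with `BPF_UNPRIV_DEFAULT_OFF=y` boots with 2. The helper names are ours, and reading the file assumes a Linux host with procfs mounted.

```c
#include <stdio.h>

/* Value meanings per the kernel's sysctl documentation for
 * kernel.unprivileged_bpf_disabled. */
static const char *unpriv_bpf_mode(int value)
{
    switch (value) {
    case 0:  return "enabled: unprivileged bpf() calls are allowed";
    case 1:  return "disabled: cannot be re-enabled until reboot";
    case 2:  return "disabled: an admin may re-enable it";
    default: return "unknown";
    }
}

/* Returns the current value, or -1 if it cannot be read (non-Linux host,
 * procfs not mounted, or a kernel without the knob). */
static int read_unpriv_bpf_disabled(void)
{
    int value = -1;
    FILE *f = fopen("/proc/sys/kernel/unprivileged_bpf_disabled", "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

For example, `printf("%s\n", unpriv_bpf_mode(read_unpriv_bpf_disabled()));` prints the current mode, or "unknown" if the knob cannot be read.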
Effectively, it allows loading all kinds of BPF programs and executing them. But in addition there are two important lines here. Remember I said that tracing BPF programs, the ones that attach to kprobes and uprobes, can access all the memory. If you load a tracing program with just CAP_BPF, it won't be able to do so: it will be denied access to all of the helpers and functions that allow it to read memory. So it will still be a tracing program, but it will be basically useless. For a user process to load a useful tracing BPF program, it needs both CAP_PERFMON and CAP_BPF. So CAP_BPF is just the core BPF features. Imagine overlapping circles: the overlap with CAP_PERFMON is the tracing BPF stuff, and a similar overlapping circle on the other side, with CAP_NET_ADMIN, allows loading networking programs. Networking programs don't need access to all of the kernel memory, which is why you wouldn't even bother with CAP_PERFMON for them.

So, as I said, we needed to restrict root. But in the kernel we now have this perception challenge about what CAP_BPF is actually trying to do. BPF is not namespaceable in general: reading kernel memory means reading all kernel memory. You cannot say "load the program, and it will only access the memory of tasks within this container"; that's just not possible. For networking we can restrict it to a particular netns, a particular network namespace, but that's more the exception than the rule. So CAP_BPF is really like CAP_SYS_MODULE. Think about it: CAP_SYS_MODULE is what allows you to load kernel modules.
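The "overlapping circles" can be written down as a small decision function. This is a toy model of the rules as described in the talk, not the kernel's actual permission code: the bitmask, `enum prog_kind`, and `can_load_useful()` are hypothetical. The one detail we borrow from the kernel is that its `bpf_capable()` and `perfmon_capable()` checks also accept CAP_SYS_ADMIN, for backward compatibility.

```c
#include <stdbool.h>

/* Hypothetical bitmask of capabilities held by the loading process. */
enum cap_bits {
    HAS_SYS_ADMIN = 1u << 0,
    HAS_BPF       = 1u << 1,
    HAS_PERFMON   = 1u << 2,
    HAS_NET_ADMIN = 1u << 3,
};

enum prog_kind { PROG_TRACING, PROG_NETWORKING };

/* CAP_SYS_ADMIN still counts as a superset of CAP_BPF and CAP_PERFMON. */
static bool bpf_capable(unsigned caps)     { return caps & (HAS_BPF | HAS_SYS_ADMIN); }
static bool perfmon_capable(unsigned caps) { return caps & (HAS_PERFMON | HAS_SYS_ADMIN); }

/* Can the caller load a *useful* program of this kind? */
bool can_load_useful(enum prog_kind kind, unsigned caps)
{
    if (!bpf_capable(caps))
        return false;                    /* no core BPF rights at all */
    switch (kind) {
    case PROG_TRACING:
        return perfmon_capable(caps);    /* needs to read kernel memory */
    case PROG_NETWORKING:
        return caps & HAS_NET_ADMIN;     /* only sees packets */
    }
    return false;
}
```

In this model, a process holding only CAP_BPF gets `false` for both kinds, which matches the point above: CAP_BPF is the core, and the useful work happens in the overlaps.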
If you're allowed to load a kernel module, you're effectively allowed to read everything, including crashing the kernel. CAP_BPF obviously wasn't trying to do that; there is a whole verifier. But the verifier is not 100% bug free; there are always bugs in it. Under the normal model of capabilities, when people look at a capability, they say: CAP_BPF should do this. And does this list say anywhere that CAP_BPF allows you to crash the kernel? No. So if you can craft a program that does a memory leak or whatever, it's immediately a CVE. People discover some bug and say, yeah, CVE. And that's a problem.

Recently there was an article titled "The bogus CVE problem". It talks about an "exploit" that was effectively a small bug in the curl user-space tool that had been fixed three years earlier, in 2020. The bug was an integer overflow: if you pass a too-big integer on the command line, it overflows and curl thinks it's a smaller number. Somebody thought it was a good idea to file a CVE for that. The CVE was filed, it got a severe rating, and immediately the curl developers were told: you have to fix this bug, you have to do backports. They were going crazy. So the author of curl wrote a blog post saying this is a security circus, there are no other words to describe it. In the end, I think even Ubuntu is now saying this is not a CVE, and it's disputed; "disputed CVE" is an official term now. We see this behavior in the BPF subsystem as well, and not just a single time. So that was the recent curl incident.
We see this all the time. What happens is that people look for bugs that were fixed, in the verifier especially; they can just grep for the Fixes: tag. Here's how we found out: somebody posted a request to backport patches to some old kernel, saying "look, these are patches you guys need to backport to kernel 5-something, because there is a CVE for it." We're like, wait, what? We looked at the bug, and we had fixed it a year earlier. It was one of the verifier bugs, and we have many. So we fixed a bug, a year later somebody created a CVE for it, and of course then it's a CVE that has to be backported, which is so bad.

And it's even worse. In some blog posts, we see people just take the selftest that we wrote to make sure the bug is fixed, wrap it into their exploit, and say: "look, if you have an old kernel, here's the exploit, I can show you how to do this security attack", on a fix that was made years ago. So yeah, this is still happening. I think the last case was even last week; I saw a CVE being assigned for some bullshit. And the reason this is all happening is that the verifier is not bulletproof. We're trying, but it's pretty complex, and bugs happen.

The worst of all the bugs, I would say, came from explicit 32-bit subregister bounds tracking. The commit I'm referencing here landed in the kernel three years ago and took 12 fixes over the last three years to, we hope, finally get it right. The commit didn't look like rocket science, and multiple people reviewed it. Here is what it does. BPF has 64-bit registers with 32-bit subregisters, which matches the x86 and arm64 instruction sets; they both have the same concept. Before, the verifier was only tracking the bounds of what you write, meaning the range of the integer value inside the register.
Let's say it's from minus two to five, and it was tracking that for the 64-bit number. What the commit did was start tracking it for the 32-bit subregister as well. Say you're writing only the 32-bit subregister: if you write a constant, like the number five, it's easy. But if it's a range, it gets complicated. You know that within 32 bits it's this range and within 64 bits it's another range, and then you do some operation like plus one. Answering correctly what the 32-bit and 64-bit ranges are afterwards turned out to be a pretty complex problem. After three years, we think we may have fixed it.

So what is the industry doing about it? There were three different papers from three different universities over the years where people focused their research on exactly this problem: how to do the range checking, and how to do it correctly. The first one, if I'm not mistaken, has a whole tool that tests all possible combinations, I think by exhaustive search. In another one of these papers (I forget which), they go all the way: a full 10-page paper with a mathematical proof that the analysis the verifier does is actually correct. Another paper does something similar for tnums. We invented what we call a tnum, a tristate number: each bit can be zero, one, or unknown ("don't care"). We also have mathematical operations on them, and that paper was also a full multi-page proof that what the verifier does is correct. So I think this is pretty cool.
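To make the "plus one" problem concrete, here is a simplified model, assuming only unsigned bounds and only addition of a constant. The struct field names loosely follow the kernel's `struct bpf_reg_state`, but `struct bounds` and `bounds_add_const()` are our own illustration, not kernel code. The second half shows tnums: `tnum_add()` uses the same formula as the kernel's `kernel/bpf/tnum.c`, and `tnum_add_sound_4bit()` is a hypothetical brute-force check in the spirit of the exhaustive-search approach mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned bounds for one register: the full 64-bit value and the
 * low 32-bit subregister are tracked as two separate ranges. */
struct bounds {
    uint64_t umin, umax;        /* 64-bit view */
    uint32_t u32_min, u32_max;  /* 32-bit subregister view */
};

/* r += k. Each view is updated on its own terms; a range that can
 * wrap around is conservatively widened to "anything". */
struct bounds bounds_add_const(struct bounds r, uint64_t k)
{
    struct bounds out;
    uint32_t k32 = (uint32_t)k;

    if (r.umax + k < r.umax) {                     /* 64-bit wrap */
        out.umin = 0;
        out.umax = UINT64_MAX;
    } else {
        out.umin = r.umin + k;
        out.umax = r.umax + k;
    }
    if ((uint32_t)(r.u32_max + k32) < r.u32_max) { /* 32-bit wrap */
        out.u32_min = 0;
        out.u32_max = UINT32_MAX;
    } else {
        out.u32_min = r.u32_min + k32;
        out.u32_max = r.u32_max + k32;
    }
    return out;
}

/* Tristate number: bits set in mask are unknown; other bits equal value. */
struct tnum { uint64_t value, mask; };

bool tnum_contains(struct tnum t, uint64_t x)
{
    return (x & ~t.mask) == t.value;
}

/* Abstract addition, same formula as the kernel's tnum_add(). */
struct tnum tnum_add(struct tnum a, struct tnum b)
{
    uint64_t sm = a.mask + b.mask;
    uint64_t sv = a.value + b.value;
    uint64_t sigma = sm + sv;
    uint64_t chi = sigma ^ sv;            /* bits a carry may flip */
    uint64_t mu = chi | a.mask | b.mask;  /* all possibly-unknown bits */
    struct tnum r = { sv & ~mu, mu };
    return r;
}

/* Exhaustive soundness check over all well-formed 4-bit tnums:
 * every concrete sum must be a member of the abstract sum. */
bool tnum_add_sound_4bit(void)
{
    for (uint64_t av = 0; av < 16; av++)
    for (uint64_t am = 0; am < 16; am++) {
        if (av & am)
            continue;                     /* not a well-formed tnum */
        for (uint64_t bv = 0; bv < 16; bv++)
        for (uint64_t bm = 0; bm < 16; bm++) {
            if (bv & bm)
                continue;
            struct tnum a = { av, am }, b = { bv, bm };
            struct tnum s = tnum_add(a, b);
            for (uint64_t x = 0; x < 16; x++)
            for (uint64_t y = 0; y < 16; y++)
                if (tnum_contains(a, x) && tnum_contains(b, y) &&
                    !tnum_contains(s, x + y))
                    return false;
        }
    }
    return true;
}
```

Adding 1 to a register known to be exactly 0xFFFFFFFF keeps the 64-bit range exact (0x100000000) while the 32-bit view wraps and, in this conservative model, becomes fully unknown. Keeping both views consistent through every operation is where the subtle bugs live.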
So the universities are doing this. There is another interesting development happening: because BPF is ubiquitous, it has sparked the creation of many security startups, and some startups are doing it in a peculiar way. This is a quote from a blog post by one of the startups; I'm not going to name which one, this is not a marketing conference. They call it "offensive BPF". The whole article first describes how BPF has these offensive capabilities, how you can attack the kernel this way, they describe everything, and then they say that to prevent this attack you've got to use their product, and it uses BPF underneath. So in the end they're saying: BPF is bad, use BPF to protect yourself from BPF.

Okay, let me skip this slide. So, LSM, BPF LSM. As I was saying, there are not that many public open-source examples of how to use it. This one is from systemd. systemd uses the LSM file_open hook, and the use case is denying access to particular file systems. If you've followed the recent Kernel Summit and mailing list discussions over the last several weeks, there is a whole debate among file system maintainers, everyone including Linus, XFS, the whole kernel community of maintainers, arguing about when and how to remove file systems. systemd didn't wait for that discussion to end. Many years ago they said: this file system is bad, this file system we don't trust, and this is how they do it in systemd. s_magic is the magic number of the file system. This is the whole code; I skipped a few lines here just to fit it on a slide. They check whether the file system is this one and whether it's in the allow list or deny list, and if so, just deny. That's it.

Another interesting security check that systemd does is very similar to the allow/deny of file systems based on type, but for network interfaces. From a security perspective it's very similar, but it's not using an LSM hook in this case; it's using the cgroup networking hooks. This is the whole code here.
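The file-system check described above boils down to a single allow/deny decision per `file_open`. The sketch below models only that decision in ordinary user-space C; it is not systemd's BPF program. `struct fs_policy` and `fs_open_decision()` are hypothetical names, and we assume the usual LSM convention that returning 0 lets the operation proceed while a negative errno such as `-EPERM` denies it. The two magic values are the ones we believe `<linux/magic.h>` defines for ext4 and tmpfs.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Superblock magic numbers (values as in <linux/magic.h>). */
#define EXT4_SUPER_MAGIC 0xEF53UL
#define TMPFS_MAGIC      0x01021994UL

/* Hypothetical per-service policy: a list of file-system magics and
 * whether that list is an allow list or a deny list. */
struct fs_policy {
    const unsigned long *magics;
    size_t n;
    bool is_allow_list;
};

static bool in_list(const struct fs_policy *p, unsigned long magic)
{
    for (size_t i = 0; i < p->n; i++)
        if (p->magics[i] == magic)
            return true;
    return false;
}

/* Decision for one file_open, keyed on the inode's superblock s_magic:
 * 0 lets the open proceed, -EPERM denies it. */
int fs_open_decision(const struct fs_policy *p, unsigned long s_magic)
{
    bool listed = in_list(p, s_magic);

    if (p->is_allow_list)
        return listed ? 0 : -EPERM;
    return listed ? -EPERM : 0;
}
```

In the real BPF version, the list would live in a BPF map and `s_magic` would be read from the file's inode via its superblock; the decision logic itself stays this small.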
I didn't need to remove any lines; this is the whole code, copy-pasted from systemd. Check the ifindex: if the ifindex is in my allow or deny list, act on it, and that's it. That's the whole program. It should be very easy to understand and to write. A typical LSM example looks like this too: it will be file_open or bprm_creds_for_exec or some other hook. Just check something, read it, and then return zero.

So the important point from the systemd example is that BPF users don't only use LSM hooks. They use all kinds of other hooks for security, networking ones like bind, connect, and receive. They also use tracing hooks, which are read-only from the kernel's perspective, as an additional aid to get more information and understand what is going on in the kernel. Based on this collected information, they later make the decision in the LSM hooks.

And a few security recommendations. This one is real: if you type this stuff, it will work. There is a fault-injection facility in the kernel. You can type this sequence of commands and it will completely prohibit any bpf system calls. So if some people are extremely worried about BPF, like this startup that says "BPF is bad, don't use BPF (or use our BPF)", just disable it this way. But realistically, I think the true security recommendation is that kernel modules should really be written as BPF programs. Here are the pros and cons. Kernel modules are arbitrary C code: you can touch anything, you can access anything. That's a plus, obviously, unlike BPF, which restricts you heavily. But they can crash, whereas on the BPF side safety is built in, and the programs are portable. Portability, I would say, is something people who have never worked at a hyperscaler or a big cloud don't appreciate.
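The network-interface check is the same allow/deny pattern keyed on the packet's interface index instead of a file-system magic. The model below is plain user-space C, not systemd's program: `struct iface_policy` and `iface_verdict()` are hypothetical. It follows what we understand to be the return convention of cgroup skb programs, where 1 lets the packet through and 0 drops it (note that this differs from LSM hooks, where 0 means "allow").

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy: a set of interface indexes plus allow/deny mode. */
struct iface_policy {
    const unsigned int *ifindexes;
    size_t n;
    bool is_allow_list;
};

/* Verdict for one packet, using the cgroup skb convention:
 * 1 = let the packet through, 0 = drop it. */
int iface_verdict(const struct iface_policy *p, unsigned int ifindex)
{
    bool listed = false;

    for (size_t i = 0; i < p->n; i++) {
        if (p->ifindexes[i] == ifindex) {
            listed = true;
            break;
        }
    }
    /* Allow list: pass only listed interfaces. Deny list: the reverse. */
    return p->is_allow_list ? listed : !listed;
}
```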
For example, in the Meta fleet we have several main kernel versions installed on 95% of the fleet, but there is always a long tail: we have hundreds of older kernel versions. Developers of BPF programs don't want to massage their programs for a hundred different kernels. A kernel module has to be built for a particular kernel version and recompiled for every kernel version, whereas in BPF we have a facility called compile once, run everywhere (CO-RE). The program, once compiled, can be loaded and automatically adjusted on the fly when it's loaded into a particular kernel. So the portability of BPF programs is super critical for hyperscalers and big clouds.

I will skip this and this, since I'm out of time. The statement here: we believe that the BPF flavor of the C language is a better choice for kernel programming. This extended C that we built, with safety built in, is more secure, and that's the true security recommendation I had. That's it. Any questions? Oh, we're out of time for questions. Okay. Yeah, I'm here all day today, so grab me at any time. Happy to chat.
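The compile once, run everywhere idea mentioned above can be illustrated with a toy model of a field-offset relocation. This is a heavily simplified sketch, not libbpf: real CO-RE relies on BTF type information and compiler-emitted relocation records, while `struct type_info`, `struct field_reloc`, and `relocate()` here are hypothetical stand-ins. The point is only that the program records *which* field it meant, and the loader re-resolves the offset against whatever kernel it is running on.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for BTF: the field layout of one struct in one kernel. */
struct field_info { const char *name; size_t offset; };
struct type_info  { const char *type; const struct field_info *fields; size_t n; };

/* Toy relocation record: which field the program accesses, plus the
 * offset it assumed when it was compiled. */
struct field_reloc { const char *type; const char *field; size_t offset; };

/* At load time, patch the offset to match the running kernel.
 * Returns 0 on success, -1 if the type or field does not exist there. */
int relocate(struct field_reloc *r, const struct type_info *kernel, size_t ntypes)
{
    for (size_t i = 0; i < ntypes; i++) {
        if (strcmp(kernel[i].type, r->type) != 0)
            continue;
        for (size_t j = 0; j < kernel[i].n; j++) {
            if (strcmp(kernel[i].fields[j].name, r->field) == 0) {
                r->offset = kernel[i].fields[j].offset;
                return 0;
            }
        }
        return -1;  /* type found, field missing in this kernel */
    }
    return -1;      /* type missing in this kernel */
}
```

The same compiled record can then be resolved once per target kernel, which is what removes the per-kernel rebuild that a kernel module would need.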