Good afternoon everyone. It's my great honor to share our work, HotBPF++, which provides more powerful memory protection than our prior work, HotBPF, presented last year. This is joint work with Dr. Yueqi Chen. Unfortunately, he cannot make it here today because he needs to attend a university commencement ceremony. I am Zicheng Wang, currently a PhD student at Nanjing University, visiting the University of Colorado Boulder under the supervision of Dr. Yueqi Chen. My research focuses on operating system security.

Since we are at the Linux Security Summit, there is no need to further emphasize the importance of Linux kernel security. The vulnerability lifecycle consists of four main steps: the vulnerability is introduced, discovered, and patched, and the patch is merged and deployed. There are various techniques available to protect the Linux kernel. On the left, tools such as syzkaller and static analysis have been developed to discover kernel vulnerabilities. On the right, after a patch is available, vulnerabilities can be fixed. However, there is a time window of at least 66 days between the discovery of a vulnerability and the availability of its patch, during which the kernel is left vulnerable and unable to protect itself. We haven't seen much practical protection during this period. This is where HotBPF++ comes in.

Our approach is quite straightforward: HotBPF++ automatically generates eBPF programs that prevent vulnerabilities from being triggered, without the need for patches. Before we describe all the technical details of our work, let me first present an overview. The vulnerability at the error site is triggered by a malicious process through a system call, and after the attacker triggers the vulnerability, the entire system is compromised. HotBPF++ prevents the vulnerability at the error site from being triggered by loading an eBPF program into the kernel. The eBPF program is executed every time just before the error site.
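To make this concrete, the guard logic, checking a triggering condition right before the error site and, on a hit, skipping to the function's return with an error code, can be sketched as a small user-space C simulation. This is only an illustration under assumed names (oob_triggered, guarded_write); the real guard runs as an eBPF program in the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define EINVAL_CODE (-22)  /* error code planted in the return register */

/* Triggering condition for an out-of-bounds write: the accessed
 * address must stay inside [obj_start, obj_start + obj_len). */
static int oob_triggered(unsigned long addr,
                         unsigned long obj_start,
                         unsigned long obj_len)
{
    return addr < obj_start || addr >= obj_start + obj_len;
}

/* Simulated guarded function: if the condition is met, the faulty
 * store is skipped and an error code is returned, as if the function
 * itself had failed; in the real system the offending process would
 * additionally receive SIGKILL. */
static int guarded_write(char *buf, size_t buf_len, size_t index)
{
    unsigned long addr = (unsigned long)buf + index;
    if (oob_triggered(addr, (unsigned long)buf, buf_len))
        return EINVAL_CODE;   /* skip the error site */
    buf[index] = '\0';        /* the original (safe) store */
    return 0;
}
```

Callers up the stack then see an ordinary failure and run their normal error handlers.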
It checks whether the current context meets the triggering condition of the vulnerability. If the condition is not met, nothing happens and the kernel keeps functioning. But if the error condition is met, the eBPF program helps the kernel skip the error site, jump to the return of the function, and send SIGKILL to stop the malicious process. After that, the kernel returns to its normal state and keeps functioning. Because HotBPF++ focuses on the precise error site rather than the entire kernel, it is lightweight and efficient.

Before I dive deeper, I will introduce the input of HotBPF++: a report generated by syzkaller. syzkaller is the most widely used kernel fuzzer, developed by Google; it constantly generates randomized new inputs to detect crashes in a kernel compiled with sanitizers. On the left is the syzkaller dashboard. When a corruption is detected, the sanitizer collects information about the error and reports it to the dashboard. We can observe that many types of bugs are reported. On the right is a syzkaller report. The bug report includes information such as the type and position of the bug, and the call trace that illustrates how the buggy instruction was reached. In addition, there may also be a proof of concept and the kernel configuration to reproduce the crash. This data helps our work identify the error site and construct the triggering condition.

HotBPF++ is built on eBPF, the extended Berkeley Packet Filter framework. It is an in-kernel virtual machine that allows users with root privileges to load programs into the kernel. The eBPF framework includes three essential components: the verifier, the JIT engine, and the helper functions. The verifier is responsible for verifying the loaded eBPF bytecode and ensures memory safety, termination, and information flow security. The JIT engine enables eBPF bytecode to be compiled just in time to achieve near-native performance.
The helper functions provide an expressive interface between eBPF programs and other kernel subsystems to extend eBPF's capabilities. eBPF also provides a hook mechanism based on kprobes, which enables HotBPF++ to attach to an arbitrary error site in the kernel without recompiling or rebooting. HotBPF++ leverages these powerful capabilities of the framework to detect and prevent kernel vulnerabilities with high efficiency.

Next, I'll introduce the workflow. HotBPF++ takes bug reports triggered in the sanitizer kernel as input, as you can see on the slide, and generates the error-prevention eBPF program as output. The workflow consists of five main steps. The first step is to extract the critical bug information from the report. The second step is to map the error information from the sanitizer binary to the source code, because there is no direct mapping from the sanitizer kernel binary to the runtime kernel binary, so we have to use the source code line as a bridge. The next step is to map the error information from the source code to the native binary, where we will probe the eBPF program and detect whether the bug-triggering condition is met. Both translation processes, from sanitizer binary to source and from source to native binary, are based on the kernel debug information, DWARF. The fourth step is to construct the triggering condition against the native kernel binary, for example by checking the bug-related registers and memory addresses. Finally, HotBPF++ synthesizes all this information and generates the eBPF prevention program.

We will use an example to walk through the workflow. This working example is a heap (slab) out-of-bounds vulnerability that can lead to a serious security issue in the kernel's AppArmor security module. The triggering condition for this vulnerability is at line 645, where the length of the array args is defined as size, but since indexing starts from zero, the last valid index is size minus one.
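Steps 2 and 3 above both lean on DWARF line tables, which map instruction addresses to source lines and back. A toy C lookup over such a table might look like this; real tools parse the .debug_line section, so this structure is a simplification for illustration only.

```c
#include <assert.h>

/* One row of a (sorted) DWARF-style line table: the instruction at
 * `addr` belongs to source line `line`. */
struct line_entry {
    unsigned long addr;
    int line;
};

/* Return the source line owning address `a`: the last table entry
 * whose address does not exceed `a`, or -1 if none. */
static int addr_to_line(const struct line_entry *tab, int n,
                        unsigned long a)
{
    int line = -1;
    for (int i = 0; i < n; i++)
        if (tab[i].addr <= a)
            line = tab[i].line;
    return line;
}
```

Running this kind of lookup in both directions (address to line on the sanitizer binary, then line to address on the native binary) is what bridges the two kernels.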
But at line 645, the index size is written, so there is an off-by-one overflow. Attackers can use this simple overflow to gain root privileges and compromise the entire system. HotBPF++ prevents this error from being triggered by checking whether there is an overflow at line 645. If the triggering condition is met, the eBPF program helps the kernel skip line 645 and jump directly to line 693, returning an error code. This makes the kernel believe that the function has failed, and the error handlers up the call stack take over until control returns to the system call.

Next, we will use this bug to show how HotBPF++ generates the eBPF program from the bug report. To generate the program, we need to locate the error site in the runtime kernel binary, as shown in the workflow on the slide. We have a report collected from the sanitizer kernel, and as I described in the technical background, there is no direct mapping between the sanitizer kernel and the native kernel binary. Therefore, we use the source code line to bridge the two binaries. We can use a regex to extract that the bug is a slab-out-of-bounds error, triggered at the address apparmor_setprocattr+0x116. We first locate the error site in the source code using the sanitizer binary's DWARF info. Then we use the debug information to locate the error instruction in the native kernel, where we can check whether the error condition is met. The result shows that the probe address is apparmor_setprocattr+0x8f in the native kernel, and the value 0 is written to the address RSI + RDX*1. After locating the error site in the runtime kernel binary, the next step of HotBPF++ is to construct the triggering condition to detect whether the vulnerability is triggered.
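The regex extraction mentioned here can be approximated in a few lines. Below is a minimal sketch, assuming a KASAN-style report headline of the form `BUG: KASAN: <type> in <func>+0x<offset>`; HotBPF++ itself uses a regex script, so the exact parsing differs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Pull the error type, the function name, and the offset out of a
 * sanitizer report headline, e.g.
 *   "BUG: KASAN: slab-out-of-bounds in apparmor_setprocattr+0x116"
 * Returns 0 on success, -1 on a malformed line. */
static int parse_report(const char *line, char *type, char *func,
                        unsigned long *off)
{
    return sscanf(line, "BUG: KASAN: %63s in %63[^+]+0x%lx",
                  type, func, off) == 3 ? 0 : -1;
}
```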
For this case, the triggering condition is that the pointer address falls outside the legitimate range of the object it refers to. The referenced address is calculated as RSI + RDX*1, where RSI holds the base address of the pointer and RDX holds the offset. To get the legitimate range of the object, we implemented a new BPF helper function that returns the start address and the length of the object, given the base address stored in RSI.

The last step is to synthesize all the information and generate the eBPF program. The program is probed exactly at apparmor_setprocattr+0x8f, just as presented, and every time the CPU is about to execute that instruction, the eBPF program is executed first to detect whether the triggering condition is met. We present sample code of the eBPF program: first, the program obtains the object's runtime size; then it checks whether the error condition is triggered; and last, if the triggering condition is met, the program skips the error site, returns an error code to the caller, and sends SIGKILL to the vulnerable user-space proof-of-concept process.

Now we will show a demo of how the eBPF program prevents this vulnerability. The demo runs on kernel version 5.15, and the vulnerability CVE-2016-6187 is backported to the kernel because it is quite old, but it still works. First, we will run the proof-of-concept program on the sanitized kernel to corrupt the kernel, as expected. Has it started?
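The triggering condition just constructed, that the written address RSI + RDX*1 must stay within the object's legitimate range, can be simulated in plain C. Here get_object_range stands in for the new BPF helper described in the talk; its name and the fixed 64-byte object are illustrative assumptions.

```c
#include <assert.h>

struct range { unsigned long start; unsigned long len; };

/* Stub: in the real system the helper walks slab metadata to find the
 * object containing `base`. Here we hard-code a toy 64-byte object. */
static struct range get_object_range(unsigned long base)
{
    struct range r = { base, 64 };
    return r;
}

/* Returns 1 when the access would be out of bounds. */
static int condition_met(unsigned long rsi, unsigned long rdx)
{
    unsigned long addr = rsi + rdx * 1;   /* RSI + RDX*1 */
    struct range r = get_object_range(rsi);
    return addr < r.start || addr >= r.start + r.len;
}
```

With an object of length 64, offset 63 (the last valid index) passes, while offset 64 (the off-by-one write) trips the condition.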
Okay, it has started. First, we run the proof-of-concept program on the sanitized kernel to corrupt the kernel, as expected. As shown here, this is the syzkaller report; we built the kernel with the sanitizer compiler option, and here are the error type and the triggering site, from which we can also deduce the triggering condition. Then we start HotBPF++ to prevent the bug from being triggered. On the left side is the proof of concept, and on the right side is the eBPF prevention program. We start the eBPF program on the right first, and then we run the malicious process. We can see on the left that the malicious process is killed, and on the right that the eBPF program has detected the error being triggered. After that, I run a command to check whether the kernel is still functioning. Yes, it is a simple demo, and the kernel keeps functioning.

In conclusion, we think there are five advantages of the HotBPF++ system. First, it protects the kernel when patches are unavailable. HotBPF++ is an end-to-end system: the eBPF program is automatically generated from syzkaller reports, and it does not interrupt kernel execution. The eBPF program can be installed at runtime, on the fly, without recompiling or rebooting the kernel. It has negligible performance and memory overhead; we will show these results in the following slides. And finally, it is easy to extend: we can add policies for other vulnerability types.

Next, we will show some technical details. HotBPF++ adopts a layered architecture, including the underlying error-independent mechanisms and the overlying error-dependent prevention policies. The underlying error-independent mechanisms provide the tools to support the policies. They include the report processor, the sanitizer-to-native mapper, and the checkpoint-restore analyzer. First, let's talk about the infrastructural mechanisms. Because the report processor is very simple, just a parsing script with regexes, we will start from the sanitizer-to-native
mapper. The key goal of the sanitizer-to-native mapper is to translate the error site from the sanitizer binary to the native binary, through the source code line, as described earlier. Normally, we could use the addr2line tool to map a binary address to a source line and a source line back to a binary address. But addr2line is well known to be inaccurate, as shown in these examples: one address can often map to more than one line of code, and one source line can also map to several instructions. HotBPF++ needs exactly one specific address in the native binary; the rest are false positives that would break our approach. So we present a method to improve the accuracy of the translation flow.

Our method is to use another translation flow for cross-checking. Besides address-to-source-line, the debug information also includes variable-to-register and register-to-variable mappings. Only the instruction and the code line that share the same operands can be a correct match; the others are false positives, as shown in the figure. The variable args is stored in the register RSI over this address range, and similarly, the variable size is stored in the register RDX. So only line 645 matches among the several options. We ran statistics on more than 20 error sites; according to the results, with our two-flow cross-checking the number of false positives drops from about 2 to 0 per case. So our method can fix the translation flow.

The next component in the underlying layer is checkpoint and restore. How do we use this component to skip the error site, recover the registers, and kill the malicious process? In this component, we choose not to skip only the error instruction, because that may break the consistency of the register context. The red font marks the changed values; the register context between instructions is very complex, as we can see at line 645. But we found an easier case: when skipping a call instruction, the only register changed for the callee is RAX, and
RAX is used to store the return value. So we choose to jump to the return instruction and skip the entire function, keeping the register context consistent. We set up a checkpoint to record the registers at the function entry, and when the error condition is met, HotBPF++ restores the register context at the function exit and assigns the error code to RAX, keeping the register context consistent. The rest of the functions in the call stack will treat this function as a failed execution and leverage their error handlers to keep the system functioning.

Besides the function skip, there are also about 500 pairs of operations performed at function entries that need to be undone at the function exit, such as spin locks, other locks, memory allocation, global memory modification, device readiness, and so on. On the right side, this function takes a lock and allocates memory at the entry, and unlocks and frees at the exit. So if HotBPF++ wants to jump to the return, we have to unlock the spin lock; otherwise the kernel will deadlock. There is related work on kernel state recovery, so we won't go into more detail here. Our work temporarily focuses on solving the lock problem; other pairs may not corrupt the kernel. We also tested the robustness of this temporary solution: the kernel kept working on daily tasks for weeks, nothing serious happened, and the kernel kept functioning.

After the mechanisms, let's talk about policies. The overlying error-dependent prevention policies are designed based on the triggering conditions of each type of vulnerability. HotBPF++ provides templates of eBPF programs for automatic generation. I will show more technical details to help you better understand how to extend HotBPF++ to other error types. To support a new error type, HotBPF++ needs a template of its triggering condition, and maybe some new BPF helper functions to support the template. For example, take the out-of-bounds error from
the previous slide. The triggering conditions for out-of-bounds mainly come in two styles, as shown in templates 1 and 2. The first style is the pointer-plus-offset style; it is the same type as the CVE we just showed: the address has to be within the pointer's legitimate range. The second template is the memory-copy style: there is a destination, a source, and a length; this also covers memory and string functions such as memset and string copies. To obtain the range, we implemented getBufferStart and getBufferLength helper functions, which obtain the legitimate range of slab and vmalloc objects at runtime. For the remaining cases, such as global static objects or arrays on the stack, we can get the range statically, because their size is fixed at allocation and does not change at runtime. Extending to other types of errors follows a similar method: first design the template according to the triggering condition, then design the related helper functions to support the template.

Now HotBPF++ can support these policies: out-of-bounds access, as we just said, use-after-free, integer overflow, data races, uninitialized memory, wild pointer access, and user memory access. For integer overflow, we can simulate the arithmetic operations with 64-bit operands, because most integer overflows we see today are on 32-bit operands. For data races, we can use atomic operations to record the usage of the racy data, and once we find the race, we can kill both racing processes. For uninitialized memory, we can record the memory at the allocation site and make a comparison at the use site.

Next, we will show a more complex policy: use-after-free. Although most cases are straightforward, a use-after-free bug is more complex than out-of-bounds and the other error types, because it has no straightforward solution: it is a temporal memory corruption. The triggering condition of use-after-free is that the pointer is a
dangling pointer. A dangling pointer is a pointer referring to a freed object, and the dangling pointer is dereferenced by the attacker to read or write the freed memory. So we use a quarantine-and-sweep method to prevent the dangling pointer from being dereferenced. The policy is: first, we quarantine the freed object until no dangling pointer exists; we quarantine it by not actually freeing it, but holding its address. Then we periodically sweep physical memory for dangling pointers to confirm that none exist. We also designed an optimization: the HotBPF++ sweeper only sweeps certain slab caches, because we have studied several errors, and in most cases a dangling pointer is stored in another slab object; the rest of physical memory, outside the slab caches, will not store the dangling pointers.

After the technical details, we will evaluate the performance. Our work has low performance overhead because it is lightweight, focusing only on the specific error condition, not the entire kernel. We designed experiments to make a full evaluation. First, we collected syzkaller reports and CVEs. For the CVEs, we also need proof-of-concept programs to reproduce the error conditions, so we collected the proof-of-concept programs and ran them against the sanitizer kernel. Then we evaluated performance, including the overhead, the scalability, and the optimal physical-memory sweeping period and range for the UAF sweeper.

This table shows the results compared to the vanilla kernel. The results include slab out-of-bounds; page, stack, and global out-of-bounds; use-after-free in two configurations, the optimized sweeper and the full physical-memory sweeper; and an integer overflow. The overhead is negligible, ranging from about -3% to about 4%. The benchmark selection includes OS core primitives such as OSBench and perf bench, which measure the overhead of system
calls, networking, and so on. There are also CPU- and computation-intensive benchmarks, including OpenSSL, MP3 encoding, and GIMP; I/O-intensive workloads, including the SQLite database and network stress tests; and the rest are common server tasks, including Git, kernel compilation, compression, Apache, and Nginx. The average is circled; I think the average overhead is acceptable. Only use-after-free is a little high, because we have to run a sweeper periodically, but overall it is practical.

Next, we also evaluated scalability. Scalability here means the performance overhead when multiple eBPF programs are executed at the same time. In this figure, each line in the legend represents a benchmark's overhead when we apply more than 20 eBPF programs. In most cases the worst overhead is less than 8%, but there is a special case here of about 20 to 30%. It is caused by one vulnerability, a vmalloc out-of-bounds, that hooks on the NAPI poll interface of the network interface, which is called very frequently under high-concurrency web server requests. So high-concurrency benchmarks like Apache and Nginx do not behave so well, but the rest of the benchmarks are still acceptable.

We also evaluated the optimal sweeping period and range for the use-after-free full physical-memory sweeper. The range is from 128 MB to 512 MB, and the period is from 1 second per sweep to 8 seconds per sweep. As circled, sweeping 256 MB every 4 seconds gives the optimal overhead.

Overall, we are very honored to share our work with the community; it is my first time attending. HotBPF++ is kernel protection before patches are available, and we have made the following contributions: a syzkaller report processor to extract the bug information; a sanitizer-to-native mapper to translate the error condition and error site; a checkpoint-restore analyzer to keep the
kernel functioning; a set of policy templates and helper functions for various vulnerability types; and finally, a thorough evaluation of the overhead and scalability. I hope the community likes our work and our contributions. We also designed this cute logo, generated by Midjourney AI. That's all; too fast. Any questions? Thank you.

Q: Hi, you said you identified, I don't remember how many it was, function pairs that need to be unwound, but it sounds like you said you're only handling the lock/unlock cases right now?
A: Yes.
Q: And were you looking also at data structure invariants that might get violated halfway through a function, or is that still future work?
A: Yes. It's a little ad hoc here, but it's not our main contribution.
Q: Is there a way for users to modify the generated BPF programs, or is there any support for that?
A: Yes, we can. We can add more guidance on this ad hoc checkpoint and restore.
Q: Thank you.
Q: This might be a similar question. The example you gave in slide 8, with the string truncation going past the end: can you add other actions besides just the return? For example, in that case work has happened in the function, and you get to a point and you're like, I shouldn't do this, but you've left the string unterminated. Can you, before the return, write to the memory at one below the size, so you have a terminated string before you then return an error? Sort of like a custom return handler, I guess.
A: Sorry, I'm not quite clear. Do you mean here?
Q: Yeah, if you look at slide 8.
A: Is that what you mean here?
Q: Yes, in slide 8. So, yeah, here, some work has occurred before you return, before you hit the args[size] = '\0'. Are you able to add additional things in these types of cases, where you say, what I want to do before I return an error is set args[size - 1] = '\0' and then return, so you don't leave some potentially fragile thing in a bad state? I mean, it's similar to the unlock and all these other things. So can that be done in an arbitrary fashion right now, or is there work to be done there?
A: I understand; that's a good idea, but we want an automatic tool, and this would be a little ad hoc. I have tested that if we do nothing but jump to the error return instruction and assign an error code, the kernel still keeps functioning. But I think it's ad hoc; it's a temporary fix, not a full recovery.
Q: Right.
Q: Hi. In the slide where you showed the actual output BPF code, I believe you had a helper in there, bpf_set_regs or something, to actually set the return value. If I'm not wrong, that to my knowledge is not a standard helper. Do you have to load a kernel module to define that as a new helper?
A: Sorry?
Q: How do you actually set the return value? Do you use an existing BPF helper, or do you have to define a new one?
A: We have to define a new helper function. BPF currently is not allowed to modify the kernel state very much, so I had to add new helper functions, such as one to set the registers and context, in order to assign an error code.
Q: Okay, and do you load a kernel module to do that, or do you patch the kernel? How do you do that exactly?
A: No, we just patch it directly in the kernel.
Q: Oh, you just directly patch the kernel. Okay, gotcha.
A: Yes, BPF helper functions cannot be added through modules, because you have to modify some kernel code in the bpf directory under the kernel source tree.
Q: Thank you.
Q: Thank you for your great presentation. I like your idea, but I have a question about your evaluation. I remember there are 16 errors, and for some the overhead increases a lot. So I wonder how you figured out the root cause of that overhead.
A: Yes, let me give a simple explanation. The error condition and the root cause of a bug are different. The triggering condition is where the error triggers; for example, for an integer overflow, the triggering condition is the instruction that triggers the overflow, but the actual root cause may be earlier in the call chain, where a check is missing. As for your question: we sampled the execution of that eBPF program by dumping the contexts, and we found it is called through the NAPI poll. If you execute a high-concurrency network workload, this function is called very frequently. So in this case, the eBPF program hooks on that call site, is also executed very frequently, and causes the overhead.
Q: Thank you.
Q: So in light of some of the questions that people had earlier about, you know, whether there are states that you could leave behind when doing this sort of checkpoint-and-restore approach...
A: Yes.
Q: Have you considered running syzkaller on the patched version of the kernel to see if additional crash circumstances come up from that?
A: Yes, we have also done a similar evaluation to your question. We modified the proof-of-concept program a little, so it executes the error site but the parameters do not trigger the error condition, and nothing happens; everything is good.
Q: I was thinking of other conditions that might now have been turned into a potential crash or bug of some kind as a result of the hotfix.
A: We haven't had those cases; maybe they exist. Thank you.
Q: So if I understand correctly, what you're basically trying to achieve is modifying the execution at runtime to, like, step over the issue. Have you considered, since you know what end result you want to achieve, so you know how the function should execute at runtime and there is a certain instruction you want to jump to, generating the modified function by copying the original function, modifying the instruction the way you want, and then using the livepatch infrastructure to replace the old function? Wouldn't this give you better performance compared to doing the breakpoints with eBPF?
A: Do you mean hotpatching?
Q: So, I understand the advantage of this compared to actually applying the patch is that you can automatically generate the eBPF program that steps over the bug, but using the same infrastructure you could generate the updated function to do the same. Doesn't it make sense to use livepatch?
A: I know your question. Here is the problem: hotpatching and HotBPF++ work in different periods. HotBPF++ comes in when no patches are available, but for hotpatching you at least need a patch.
Q: Correct, but essentially you could generate the patch using the same workflow that you have.
A: As I explained previously, the triggering condition and the root cause are different. If you need to fix the vulnerability, you need to analyze the root cause, and the root cause is very complex. But if you just prevent the error condition, it can be deduced by an automatic tool. Is that clear?
Q: Yes, but using the same automatic tool you could create the new function, so instead of having the eBPF breakpoint, which may have extra overhead, you create a new function and replace the whole function.
A: I understand what you mean, but there is a limitation in BPF: the original BPF only supports limited modification of the kernel. It actually doesn't support this; it is not a Turing machine where I can modify the kernel data as much as I want. So I cannot just implement a new function to fix the problem, because the eBPF program runs in a virtual machine and is restricted. Is that clear? Thank you so much.