Hello, my name is Denis Efremov, and welcome to my presentation. The tool I will present is named CVEhound. What is it for? It automates checking kernel sources for missing fixes of known CVEs. I started this project about 10 months ago. The tool works as a static analyzer, and to detect missing CVE fixes it doesn't use the kernel version, doesn't require a development log, and doesn't need to know how to build your kernel. Internally, the tool contains a special rule for each CVE it is able to check for. As of now, I have described more than 200 kernel CVEs. The main value of the tool is in these descriptions, and I will talk about them in detail a bit later. Additionally, the tool supports various filters, like searching only in subdirectories or in specific files. Checking only the code enabled in a build config is also supported; the kernel config analysis is based on the Undertaker project's code. I reuse CVE metadata from the Linux Kernel CVEs project (linuxkernelcves.com) for information like CWE and CVSS scores. It is very hard work to collect and maintain this metadata, and many thanks to that project. Not so long ago I also added report generation in JSON for CI systems.

Let's talk a bit about why, in my opinion, we need to automate these checks, and why the automation is not so simple. I'm pretty sure you all already know these facts, but I would like to mention them explicitly once again in the motivation part, to give you a better understanding of why the tool works this way and not another. Some automation is highly desirable because we've got thousands of CVEs assigned to the Linux kernel, most of them in recent years. The records live in different databases, and many of these databases are only loosely synchronized with each other. Lack of information about vulnerable versions and wrong commit references are common problems for many records. Moreover, we've got national databases with alternative identifiers; for example, in Russia there is the FSTEC database with BDU identifiers.
You can't just take a kernel version and check whether your kernel is vulnerable or not, because you would need to know the vulnerable version intervals for many kernel lines: stable, LTS, SLTS (super long-term support), and even after that you will find kernels with, I would say, an independent backporting process. Sometimes when you do your own backporting you also take features from mainline, for example to speed things up, or you even backport whole drivers. In this case, the information that a specific vulnerability was in the mainline kernel from version A to version B gives you very little. Moreover, the kernel is a highly configurable project: some configurations are vulnerable while others aren't, because a driver is simply not enabled. It's a common workflow that developers backport only fixes for the kernel parts they care about. After all, why would you care about the floppy driver if you are preparing an Android kernel? The git log is not always available to you because, as far as I understand, the GPL license doesn't require publishing the kernel development history. And even if you have access to it, without being a kernel developer it's very hard to say from the log which CVEs are fixed and which are not, because there can be reverted commits, early versions of patches from the mainline lists, and many other things.

When I started tool development, I kept in mind a couple of use cases. For example, let's imagine that you are an engineer in a certification lab, you have the full sources for a device, and one of the requirements of the certification procedure is that all known CVEs should be fixed or somehow mitigated. So you need to check maybe hundreds of CVEs in the kernel. Or maybe you are a system administrator and you can't just update the kernel because of third-party modules that simply don't work with newer kernel versions, and you want to know what administrative measures to take to mitigate the possible consequences of an attack.
So you need to start by getting a list of CVEs you have to take care of. Maybe you are a pentester and you don't want to find new exploits: you just extract the config file from the running kernel, take the closest possible kernel version, and get a list of CVEs to check on the device. Or maybe you are a kernel developer doing your own independent backporting and you just want to double-check yourself. Maybe you want to enable additional kernel modules, like one of the network file systems, on your phone, and you want to check the state of those drivers in the tree before enabling them and making your phone vulnerable. As I already said, developers sometimes don't backport fixes to drivers that are not enabled.

So, how do we implement this kind of checker? Actually, we've had one in the kernel for more than 10 years already. It's called Coccinelle. Without reinventing the wheel, I use it in the CVEhound tool. Many kernel developers already use Coccinelle and know how to write rules for it. Coccinelle is a static analyzer that lets you describe C language patterns in a C-like language with additional metavariables, and find the real code that falls into these patterns. Here's an example of a Coccinelle rule. It describes a code pattern where we copy data from user space and compare it to some known string in the kernel. The pattern is a copy-from-user call and a string-compare call, with an uppercase E metavariable to match variables and pointers, and the '...' notation to match everything else. Coccinelle will find many different functions for us that fall into this pattern in the kernel. Usually that's exactly what you want when you write a rule for a static analyzer: you want the rule to match as many cases as possible, cases of API misuse for example. But in the CVEhound tool I want the contrary. I want to search for only one case in the kernel and check it: is it present in the kernel or not? So I add extra details to the rule to make it more strict.
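To make this concrete, here is a rough SmPL sketch of the kind of rule described above. The rule name and the choice of strcmp as the comparison function are my illustration, not the exact rule from the slide:

```smpl
// Illustrative sketch only: match code that copies data from user
// space into E and later compares E against some string. The '...'
// matches any code in between; '*' marks the lines to report.
@copy_and_compare@
expression E;
@@

* copy_from_user(E, ...)
  ...
* strcmp(E, ...)
```

In CVEhound the usual goal is inverted: instead of matching as many call sites as possible, a rule like this is narrowed down with concrete function names and constants until it matches exactly the one vulnerable spot.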
The idea is to detect only one case, but to detect it through all changes from the breaks commit, where the vulnerability was introduced, to the fixes commit, where it was removed. So I still need to abstract away details such as, for example, variable names.

Some statistics, to give you a better feeling for how much effort it takes to write these detection rules. I started the project about 10 months ago, and since then I have described more than 200 CVEs. Most of them were assigned to the kernel in the last 3 years. There were only 42 days on which I added at least one detection rule to the project, and only 8 days when I spent the whole day writing rules. Usually I can describe more than 10 rules in a single day. Each rule is tested to confirm that it detects the CVE in the interval from the breaks commit to the fixes commit, and each rule is also tested outside this interval to confirm that it doesn't report the CVE there. To write a detection rule I take the commit that introduces the bug and the commit that fixes it. I need these commits only to test the rule; otherwise, as I already said, the tool doesn't need a git log to work. Sometimes finding the fixes commit is not trivial; I even maintain my own list of wrong Fixes: tags that I have found in various fix commits. Usually it takes about 5-10 minutes to draft an initial version of a rule and about 10-12 minutes to test it. There can be many iterations of refinement and testing.

Let's have a quick demonstration. I reverted a floppy fix from 2018 on top of a 5.15 kernel. Usually, to run the tool all you need is to specify a path to the kernel sources directory, but I'll also limit the search to floppy.c. And the tool finds the missing fix for us. Just to give you a bit more information: internally the tool calls Coccinelle with different patterns, and Coccinelle also outputs the line number of a match for us. To check a kernel config option, you also need to specify a path to the config file.
In this case the tool outputs the config option that enables floppy.c, and it also checks the .config file for this option: it's indeed enabled. Another example. Here I partially reverted two capability checks on top of a 5.15 kernel. The original patch adds four of them in different functions, and I reverted only two. The tool outputs exactly the two lines of the functions where the capability checks were removed. And I would say it's pretty common that there is an error in backporting, or that you accidentally revert part of a commit later. Let's get back to the slides.

Let's look at the rules and some typical patterns that are used to describe a CVE. We will start from simple cases and move to more complex ones, to give you a better feeling for the extensibility of this approach. The CVE fix on the left just removes some code from the kernel, and in this case we can simply search for the removed code; if we find it, it definitely means the fix wasn't applied. The CVE fix on the right adds a capable check, and to describe it we use some real anchors, like the function name and a global variable name, abstract away details like local identifier names, and use '...' to match everything else. There is also a '... when !=' notation, which means: match everything as long as it doesn't contain this line. So we match rawsock_create if it doesn't contain a capable call before rawsock_raw_ops is assigned to a pointer; later I also added an ns_capable call, because capable was changed to it.

Here I use another approach: instead of using '... when !=', I prepared three patterns and made them depend on each other. In the patch we just add another entry to a global array, and to detect this I need to check that there is indeed such a constant in the enum definition (it was introduced in a separate patch).
I also need to check that the global variable exists, and that there is no initialization of this global array with this constant. So instead of writing '... when !=', we can write a pattern and check that Coccinelle can find it in the sources.

Let's move to fixes that change existing lines of code. The error here is self-descriptive: wrong permissions. The patch simply fixes them, and in the rule we simply check for them. This is a simple case where the fix fully describes the error and contains all the information we need to write the detection rule. With this slide I want to show you that in most cases the diff doesn't contain enough information to describe a CVE rule. So in the general case the fix doesn't contain full information. For example, for this racy access to a global variable, the patch just adds mutex_lock and mutex_unlock to one of the functions, but the error is in the combination of two functions, and to reliably detect the missing fix I need to describe both of them: one commit adds one function with the mutex locks inside, another commit adds the other function, a third commit adds the global variable, and the error lies in the combination of them. Here is another case: a simple patch that initializes a local variable. But to describe it, I first need to check that there is a structure with a reserved field inside it. I don't remember the exact details, but most likely there is no information leak if there is no reserved field in the data structure definition. So here we check for the field, for the copy_to_user call, and for the absence of initialization. These techniques are enough to describe most CVEs in practice, but sometimes Coccinelle is not enough, because it is only suitable for matching C code patterns; in cases like these, with errors in an LD script or changes to assembly code, you need to do different things.
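The reserved-field case just described can be approximated with dependent rules. Here is a rough, hypothetical sketch; all identifiers (foo_info, reserved) are invented for illustration, and the real rule targets a specific kernel structure:

```smpl
// Hypothetical sketch: rule 'has_reserved' checks that the structure
// really has a reserved field; the second rule only runs if it does,
// and reports when the variable reaches copy_to_user without the
// reserved field ever being cleared on the path.
@has_reserved@
@@

struct foo_info {
  ...
  __u32 reserved;
  ...
};

@depends on has_reserved@
identifier v;
@@

struct foo_info v;
... when != v.reserved = 0
    when != memset(&v, 0, ...)
* copy_to_user(..., &v, ...)
```

The point of the dependency is exactly what the slide shows: if the structure has no reserved field in this tree, the second pattern is never even tried, so the rule stays silent instead of reporting a false positive.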
In a Coccinelle rule you can write Python code, and in this case with the LD script I check in Python, using a regular expression, that the LD script contains a special string, and I make the pattern match depend on this check before reporting an error. Another approach is to fall back to plain regular expressions. For this CVE it's also possible to write a Coccinelle rule, but I decided to try a grep mode with PCRE. Regular expressions are not very readable, but they are very powerful, with all these look-ahead and look-behind expressions and many more features. So that's pretty much all the approaches that are used to describe CVE patterns.

Future plans. Short-term plans include adding the tool to kernel CI testing workflows; I have already started doing this. I see no value in checking the stable trees themselves, because I already do that when I develop the detection rules, but it can be interesting to check trees that are based on stable branches. I also want to add a more precise analysis of kernel config options: as of now it's done on a per-file basis, and if there is an #ifdef inside a .c file, the tool may report incorrect results. And I want to add a filter for checking files enabled by specific config options. As I already said, suppose you want to enable a couple of config options on your Android kernel, you want to check whether it is safe to enable them, and you don't know what files will be built if you do. A mid-term plan is to add another checking mode to the tool. This lightweight mode will check only the git log; some of my colleagues asked me to implement it, and it will be useful mainly for kernel developers. The simple idea is to check that if there is a breaks commit in your branch, there is also a corresponding fix commit in the git branch. I have already drafted an initial implementation. Of course, you can't just check for commit IDs, because they are different in different trees, so you need to take into account
things like the commit title, the commit author, and the commit time to really check a git log, and there are many pitfalls here: reverted commits, multiple commits that fix one CVE, and of course you need a very good mapping between a CVE and its breaks and fix commits. A long-term plan is to automatically generate detection rules based on a breaks commit, a fix commit, the history between them, and maybe some manual markup, like: this line should be present in the sources, this line should also be present, and this line should not be present. I'm pretty sure it's possible to generate detection rules this way in more than 80% of cases.

This is the end of my presentation. Thank you for your time. Any questions?