Hello everyone. Thank you for joining this talk. My name is Ricardo. I'm a software engineer at Collabora, and today I'm going to talk about fuzzing the Linux kernel and how to use syzkaller to fuzz a Linux driver. So let's get started. Before we begin, let me give you a brief overview of what the talk will be about in more detail. First of all, I'm going to start by giving a short introduction to fuzzing in general: what it is, how it works, and why it is a valuable tool for Linux kernel development. Then I'll introduce syzkaller for those of you who aren't familiar with it already: I'll show you how it works, what its key features are, and how to configure it. Once we are familiar with syzkaller, I'm going to show you how to use it to target a specific part of the kernel, in this case a particular driver, using a dedicated hardware board as a test machine. And finally, I'm going to show you how to get results from syzkaller and how to use them. Software testing is a very large area, and over time it's becoming even more complex and developed, which makes sense because, as you may know, most of the time in software development is spent on maintenance and debugging. Testing is a way to help us reduce both the number of bugs and the time we spend looking for them and trying to fix them. Now, there are many approaches to software testing, and different techniques with different goals. In this talk I'm mostly interested in using testing as a way to find bugs. In this regard there are two main methods, or at least two of the most well known and most used: unit testing on the one hand, and fuzzing on the other. Most people are already familiar with unit testing as part of the software development process.
Unit tests try to check that a certain piece of code behaves as documented, so developers and test writers manually write tests that exercise a function's interface and try to cover the documented functionality, including corner cases and edge cases in the function input. If done correctly, over time these tests will help catch bugs when they are introduced, for example when a certain feature no longer works as documented or when a code error is introduced in a path covered by the test. Fuzzing, on the other hand, is a more automated process and doesn't require you to write any kind of tests manually. What a fuzzer does instead is generate random or semi-random sequences of instructions as programs, including randomized data inputs, to try to trigger bugs that went unnoticed in the first place. The idea is to produce both data inputs and runtime flows that may be similar to the ones found in a real use case, and beyond that, to also create patterns that trigger bugs in code paths that aren't normally executed in a regular use case, but are there and have to be found. In summary, a fuzzer will try to come up with non-conventional data inputs and interactions that are hard to get in a manually written test. When fuzzing the Linux kernel, the test surface is the system call API, which is the part that is accessible from user space. So a kernel fuzzer will generate sequences of system calls with semi-random data inputs to try to crash the kernel, and then it tries to be aware of it when that happens. Now, there are many types of fuzzers, and not all of them use the same strategies and techniques, but in order to make fuzzing effective on a target as big and complex as the Linux kernel, there are at least a number of key features that we want in order to make the fuzzing process more effective.
Otherwise we would be relying only on pure luck, so in order to make the fuzzing process more directed and more informed, we want at least this series of features. First of all, the ability to check code coverage during the test, so that the fuzzer can do a directed search instead of a random one. We also want some way of getting information about the source code, so that the fuzzer can generate test programs more efficiently. It's also good to have the capacity to generate data inputs and code sequences in a random but smart way, because we want two things here: we don't want something that looks like a manually written test, but we don't want a completely random test either. And of course, having good report generation is also important, because when the fuzzer finds a bug, it's good to know that there's a bug, but we would also like to have some information about how to reproduce it. Now, of all the kernel fuzzers that are available, syzkaller is one of the most recent efforts and it's probably the most successful today, so we're going to see how it works and how to use it to discover bugs in kernel code. syzkaller appeared in 2016, it was written by Dmitry Vyukov, and it has become one of the most important tools for making the kernel more robust and more secure. It's a coverage-guided fuzzer (we're going to see what that means) and it makes use of many kernel debugging features to make the fuzzing process more efficient. Now, what does coverage-guided mean? Well, syzkaller, like any other fuzzer, will generate sequences of instructions as test programs; in this case the instructions will be mostly system calls. But this generation process is not completely random: instead, what it does is keep track of the amount of kernel code that ran as a result of running each of these generated programs, and use that information to guide the code generation process.
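To make that feedback loop concrete, here is a deliberately tiny sketch, in Python, of coverage-guided mutation. This is not syzkaller's actual implementation; the `parse` function and its branch labels are invented stand-ins for kernel code and the coverage points a tool like KCOV would report. The loop mutates inputs from a corpus, runs them against the instrumented target, and keeps only the ones that reach new code:

```python
import random

def parse(data, coverage):
    """Toy 'kernel' code path: records a label for each branch it reaches."""
    coverage.add("entry")
    if len(data) < 2:
        return
    coverage.add("len_ok")
    if data[0] == 0x7F:
        coverage.add("magic")
        if data[1] == 0x45:
            coverage.add("deep")  # very unlikely to hit by blind chance

def mutate(data):
    """Flip one random byte, or occasionally grow the input by one byte."""
    data = bytearray(data)
    if data and random.random() < 0.9:
        data[random.randrange(len(data))] = random.randrange(256)
    else:
        data.append(random.randrange(256))
    return bytes(data)

def fuzz(iterations=20000, seed=0):
    random.seed(seed)
    corpus = [b"\x00"]          # start from a trivial seed input
    covered = set()
    for _ in range(iterations):
        child = mutate(random.choice(corpus))
        cov = set()
        parse(child, cov)
        if cov - covered:       # new code covered: keep input for mutation
            covered |= cov
            corpus.append(child)
    return covered, corpus
```

Even in this toy, the nested branches tend to be reached far sooner than by blind random generation, because an input that reaches the outer branch gets saved into the corpus and mutated further, which is exactly the effect coverage guidance is after.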
The idea is to cover as much kernel code as possible, so when syzkaller mutates its test corpus (we're going to see in a while how that works), code coverage is one of the most important aspects of it: a program that covers new kernel code has a greater mutation potential. Now, in order to improve the fuzzing process, syzkaller uses a few kernel debugging features. The most important one is KCOV. KCOV is something you can enable when you are building the kernel, and it's key to syzkaller. What it does, when enabled, is make the compiler introduce instrumentation code all over the kernel, so that the kernel can keep track of code coverage and share this information with user space. syzkaller will then retrieve this information and check the code coverage when it runs one of its test programs, to see how effective it was. Another feature used by syzkaller is the kernel sanitizers, such as KASAN, KTSAN, KCSAN, etc., when available, of course. These sanitizers offer runtime detection of certain error conditions, such as out-of-bounds memory accesses, null pointer dereferences or data races, and they are also based on compile-time instrumentation. The combination of an automated tool such as a fuzzer, which generates programs and runs them automatically, with these sanitizers is already a very good tool to detect some bugs: some very simple bugs which are not tied to the application logic but are simply programming errors, such as null pointer dereferences. This combination is able to discover new bugs automatically in a pretty simple way, with barely any human interaction at all.
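As a rough idea of what this means at build time, a kernel configuration for this kind of fuzzing typically enables options along these lines. Treat this as a sketch: the exact set of options and their names depend on the kernel version and the target architecture.

```
# Coverage collection (KCOV)
CONFIG_KCOV=y
CONFIG_KCOV_INSTRUMENT_ALL=y
CONFIG_DEBUG_FS=y
# Symbolized crash reports
CONFIG_DEBUG_INFO=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# Runtime bug detection (sanitizers)
CONFIG_KASAN=y
CONFIG_KASAN_INLINE=y
```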
Additionally, syzkaller can also use the kernel fault injectors, such as failslab or fail_futex, to make the kernel introduce controlled faults during a test. What syzkaller does is selectively enable these fault injectors so that it covers even more code: code that would normally be executed only when one of these faults happened naturally. Normally there are very, very small chances of reaching this deep code that checks error conditions and such, and fault injectors make code coverage of these blocks much easier. Now we're going to take a look at the architecture of syzkaller in a global way. The slide shows a diagram of the architecture, and you can see it's divided into three agents. The first is syz-manager. syz-manager runs on the host, and it starts, monitors and restarts the targets, which may be virtual machine instances or, in this case, a Linux system running on a dedicated board. So it starts the syz-fuzzer processes in the targets, and it is responsible for the persistent storage of both crash reports and the test corpus. Then syz-fuzzer runs inside the test target, which is a completely separate environment from the host where syz-manager runs. It keeps a communication channel with syz-manager, so even though they run in separate environments they communicate with each other. syz-fuzzer guides the fuzzing process, and whenever an input it created triggers new code coverage, it sends that back to syz-manager for storage. And finally there's syz-executor, which also runs on the target machine, whatever it is. It processes the test programs created by syz-fuzzer: it accepts a program, runs it and sends the results back to syz-fuzzer. The programs here are simply C++ programs generated by syz-fuzzer, statically built as standalone programs. syzkaller uses a user-defined, high-level description of the available system calls, written in a domain-specific language called syzlang.
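As a taste of what syzlang looks like, the descriptions of open, close and read in syzkaller's Linux definitions are roughly along these lines (quoted from memory, so treat the details as approximate):

```
open(file ptr[in, filename], flags flags[open_flags], mode flags[open_mode]) fd
close(fd fd)
read(fd fd, buf buffer[out], count len[buf])
```

Note how the descriptions carry types and relationships, not just raw integers: `filename` is a string type with file-path formatting, `flags[open_flags]` restricts an argument to a known set of flag values, and the `fd` returned by open can be fed into close and read.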
These definitions let syzkaller generate more thoughtful code, using reasonable parameters and usage patterns, instead of simply generating random sequences. So, for example, for the open system call it makes sense to use a pathname parameter that looks like an actual file path, because if we simply relied on inputting random strings for this parameter, it could take forever for the generator to create a string that, by pure luck, looks like a file path. Most of the time the system call would simply bail out when checking this parameter, and we'd never get to reach the interesting parts of the code of the system call. So here's an example of definitions in syzlang for open, close and read. You can see that the file parameter doesn't take a plain string as input. Instead, it takes a filename data type, which is defined as a string with file-path formatting: a string that looks like a correct file path. The system call flags are also defined, for the same reason: to avoid simply using random integers here, which would cause the parameter validation code in the system calls to bail out immediately. However, the fuzzer will still use some random numbers in these kinds of parameters; in particular, it will try to input some carefully crafted integers to test the edge cases of these parameters, because that can trigger some bugs and it's interesting to do so, but most of the time it will find more interesting results by using the information provided by the user here. Now, let's assume that we have a test setup that is ready to be used by syzkaller, for example a virtual machine image; by the way, this is pretty well documented in the syzkaller documentation. To start a test, we have to create a configuration file that describes the target environment and the system calls that we want to enable or disable for this test.
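For reference, a minimal syz-manager configuration for a QEMU target looks something like the following. The paths and the address are placeholders for illustration; the field names follow syzkaller's documented configuration format:

```json
{
    "target": "linux/amd64",
    "http": "127.0.0.1:56741",
    "workdir": "/path/to/workdir",
    "kernel_obj": "/path/to/linux",
    "image": "/path/to/image.img",
    "sshkey": "/path/to/image.id_rsa",
    "syzkaller": "/path/to/syzkaller",
    "procs": 8,
    "type": "qemu",
    "vm": {
        "count": 4,
        "kernel": "/path/to/linux/arch/x86/boot/bzImage",
        "cpu": 2,
        "mem": 2048
    },
    "enable_syscalls": ["open", "read", "close"]
}
```

The `enable_syscalls` list is how you restrict the fuzzer to a subset of the system call API, which is what we'll use later to focus on one driver.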
Here's an example in the slide. Once everything is in place, we can start syzkaller by running syz-manager on the host and pointing it to the configuration file that we just wrote. Once syzkaller starts running, it will take control of the target machine: it will boot it, configure it and run syz-fuzzer on it, keeping a communication channel between syz-manager and syz-fuzzer. After that, it will start growing the test corpus, generating test programs and running them, keeping track of the amount of kernel code covered as well as the possible crashes when they appear. The evolution of the test corpus is influenced by internal heuristics and also by the priority given to each of the elements that are already in the corpus. So initially the test corpus may contain all the primitive instructions, very simple system calls, and over time, as the test programs uncover new kernel code, the test corpus is mutated, taking these programs as bases for new ones. Finally, I should also mention that syzkaller starts a local web server with a web GUI, shown in the slide, where you can check the current test corpus, the system calls that are enabled, the current code coverage, and also the number of crashes and their reports. So now that we have a general idea about syzkaller, we're going to see how to use it for a specific use case, which is fuzzing a driver on a dedicated hardware board. Now, there aren't many requirements on the type of machine that we can use: as long as it can run Linux and it has a network link with the host (because the host will open an SSH session to the target machine), we can use whatever we want. I'm going to use a RockPi 4 board, because the driver that I want to fuzz, which is the Hantro Video4Linux2 driver, runs on that hardware. So first of all we have to prepare the runtime environment for syzkaller, because obviously syzkaller expects the target machine to behave in a certain way.
The syzkaller distribution already includes some tools to make this process automatic and to generate file system images for different types of targets. The create-image.sh script in the tools directory can be used to create a generic file system image that is suitable for syzkaller to use. And of course, the kernel should also be built with at least a set of debug options enabled; this is all well documented in the syzkaller documentation. You can see that, at a minimum, it enables and configures KCOV and one of the sanitizers, KASAN, and you can enable any other sanitizer that is available for the board architecture, or on your target machine, for additional checks. Of course, you also need to build all the necessary modules. In this case, since we're interested in the Hantro Video4Linux2 driver, I also enabled it. Then you can bring up your board using the kernel and the root file system that you just generated. My personal recommendation is to configure the bootloader so that it can boot the kernel from TFTP and mount the root file system via NFS, because I like to avoid relying on an SD card, due to the potential wear after long test sessions: these kinds of targets will probably stay in a laboratory running 24/7 for months, so you want to avoid using non-volatile media that isn't too robust. Finally, we have to change some options of the configuration file we wrote before to suit our test target, which is the RockPi 4 board.
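For a physical board like this, the relevant changes are roughly the following. This is a hedged sketch: the IP address, paths and user are example values, not something you can copy verbatim, but the field names match syzkaller's documented "isolated" machine type:

```json
{
    "target": "linux/arm64",
    "http": "127.0.0.1:56741",
    "workdir": "/path/to/workdir",
    "kernel_obj": "/path/to/linux",
    "syzkaller": "/path/to/syzkaller",
    "sshkey": "/path/to/id_rsa",
    "ssh_user": "root",
    "procs": 1,
    "type": "isolated",
    "vm": {
        "targets": ["192.168.1.42"],
        "target_dir": "/tmp/syzkaller",
        "target_reboot": false
    }
}
```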
Specifically, note that the type option is "isolated", which means that it's going to run on a dedicated board elsewhere, not in a virtual machine or anything like that, and in the vm section (which is still called vm), our target is now the IP of the board in our setup. OK, so now that we know how syzkaller works in general terms, we're going to see how to target the fuzzing process and make it more specific to a concrete part of the kernel, which in this case is the Hantro driver. Unless we enable only and specifically the system calls that we want to test in the configuration file, syzkaller will try to generate programs using every system call that it knows about, that is, every system call that is described in syzlang. And this is not what we want: we don't want to fuzz the kernel in general, we want to spend most of the time fuzzing one particular part of the kernel, our driver. So the first thing we need to do is make sure that syzkaller knows about the interface of our driver, and that means refining and improving the description of system calls. Now, the collection of system call descriptions in syzkaller is a work in progress and it's always evolving, so there will be some parts of the system call API which are described in more detail than others. For example, the descriptions for Video4Linux2 drivers were created a long time ago and they are not complete. Furthermore, Video4Linux2 drivers have a notoriously complex and big user interface, and it's constantly changing, so the descriptions may be out of date, and a recent driver will most likely use newer operations and flags that maybe aren't described yet. So the first step is to make sure that our driver interface is properly supported in syzkaller; that means including the necessary new syscalls and flags, described in syzlang, so that syzkaller knows about them and can generate code that uses them. In this slide there is a fragment, a very simple example, of how you can define new system calls, in this case a specialized openat. As you can see, there are some things that are different from the normal openat descriptions: some parameters are fixed, and there's a specific device file name, which makes the test process use that path name directly instead of trying to use whatever randomly generated file name.

Now, let's talk about device access. Most driver interactions in Linux involve opening a device file and working on the descriptor returned by open. The problem with this approach when it comes to testing is that a certain device won't always be accessible through the same device file name; this is something that will probably vary between setups. syzkaller doesn't have any knowledge about the devices in your machine, so it takes a best-effort approach by describing system calls that use pattern matching to generate file name strings. For Video4Linux2 drivers, for example, you may find something like that long line in the slide which, as you can see, tries to open /dev/video#, where the # stands for any number. So that system call, which is actually a pseudo-syscall (we'll talk about that later), will try to generate open system calls for any kind of file name that starts with /dev/video followed by a number. So it may start targeting device file names that aren't really in your system, and again, a lot of time will be spent doing nothing, trying to open something that doesn't exist. It would be much faster if we could restrict the open system call to our particular driver. One way to do this is to use some udev rules so that udev will create a known file name for the driver that we want. This way we can always be sure that we can reach our driver through the same device file name, and we can add something along the lines of the box in the slide to the create-image.sh script that we saw before in the tools folder to
automate the process during image creation. There's one more thing that we can do to make the fuzzing process more specific to our driver, and this has to do with pseudo-syscalls. Now, please take into consideration that incorporating new pseudo-syscalls into syzkaller is discouraged, because it makes the core of syzkaller bigger. While it could make sense for pseudo-syscalls that are general and can be useful for many users, in a case like this one, something very specific to one driver, it's totally discouraged. But still, I think it's good to show how they work and how they can be used. As we saw in the previous slide, system call descriptions in syzlang don't need to be exclusively system calls: they can also be pseudo-syscalls. A pseudo-syscall is simply a function, a block of code, implemented internally in syzkaller. When the code generation process picks up a system call and decides to use it for a test program, it simply outputs that call (it's just an instruction) plus whatever instructions are needed to generate the input data, and that's it. But when it decides to use a pseudo-syscall, what it does is take the whole block of code and output it in the test program, using whatever parameters are needed. In any case, there are some applications of this which are quite important for fuzzing drivers. The first one is that this is a way of generating a static chunk of code that won't be reordered and won't change: a known sequence of instructions that will always run in the same way. This can be useful for preparing the setup for a part of the test, or to run an operation that is made up of many system calls that need to be invoked in a particular sequence. It's also useful for generating input data in a controlled way or, even more useful, in a programmatic way. Video4Linux2 drivers are a good target for these kinds of things, because most of the operations in these drivers involve calling a long sequence of system calls in a proper order of execution, and some of them depend on the format and the contents of the input data, so leaving all of that to a random process would make it very hard for the fuzzing process to converge on a valid input that reaches the inner parts of the driver code.

So, now that we have syzkaller running, after some time, if everything went right, we should start seeing the test corpus growing in the web GUI, and we can also see that the code of our driver is getting covered. In case any bug is triggered, we should see a crash report, hopefully with a reproducer for the bug. A reproducer for a bug is simply a C program that triggers the bug when it runs. Now, syzkaller will always try to obtain a reproducer every time it finds a crash, although sometimes it won't be able to do that. The bug may have been triggered by a race condition, so it's not caused by the code of the test program itself but by a certain runtime condition, which makes it hard to reproduce simply by running a program, and syzkaller can't be sure of that. Or maybe there's more than one candidate that could have triggered the bug, because sometimes syzkaller will run many test programs concurrently, so it may be the interaction between these programs that caused the bug. Still, even if it can't give you a specific reproducer for that bug with certainty, it will at least try to give you a high-level description of the program that caused the bug; an example of that is shown in the upper part of the slide. syzkaller also provides a few tools that can be used to turn these high-level descriptions into standalone C programs, as you can see in the bottom part of the slide. So with a little manual effort you can take one of these high-level descriptions, turn it into a C program, try to run it on the target, and start investigating from there. The advantage of having reproducers is
that they reduce the amount of time we spend investigating a bug tremendously, especially if we take into account that syzkaller already tries to minimize the reproducer, to avoid having a lot of noise during the investigation. Finally, I'd like to mention syzbot, which is arguably more widely known than syzkaller itself. syzbot is a continuous fuzzing and reporting tool which has a public dashboard (you can see it at the link in the slide), and it has proven to be an invaluable tool for discovering new kernel bugs. What it does is simply run syzkaller on a bunch of kernels on a number of different platforms, keep track of the crashes it finds, and report them, with all the information it can gather about them, to the appropriate developers, maintainers and mailing lists. As we saw before, syzkaller gives as much information as it can about a crash, including a reproducer, and syzbot does that too, of course, but it also provides bisection information. So it's not only automating the bug discovery process; in many cases it's also automating part of the investigation of the bugs. That's all for today. I hope you found the talk interesting and that you learned about the importance of syzkaller as a tool for kernel improvement. Any contributions to the project are well received; Dmitry is putting a lot of effort into it and is always welcoming new improvements and developments in syzkaller. So thank you all for listening, and have a good day.