Welcome everyone. My name is Maciej Grochowski. For the last couple of months I have been working on fuzzing the NetBSD kernel. However, this knowledge also applies to fuzzing other kernels: it shouldn't be hard to get this working on FreeBSD as well, and I have also done some work in that direction, so you can apply this to fuzz your FreeBSD kernel too.

So, the outline of the presentation: first of all, what we are trying to achieve. This talk assumes that you don't have a lot of knowledge about fuzzing and breaking kernels, so it will give you a pretty good overview of what you can do, what techniques are available, and some background on how these things work under the hood. We'll talk about coverage as the kind of feedback the fuzzer uses. I will also mention sanitizers a little, but I won't go too deeply into them, because that is a very wide topic on its own; yesterday there was a very good talk about sanitizers in the VM devroom, so if you missed it and want to learn more, you can watch the recording. We'll also show how to do the basic setup for the fuzzer, and as a demo we will try to run fuzzing against FFS on a NetBSD kernel in a virtual machine, so hopefully we may find something interesting.

OK, so the first thing is to justify why we are trying to break things, or why I'm doing this. Essentially, we need multiple different ways to improve the quality of software, and as we all know, the kernel is a very critical part of the operating system: if you break the kernel, you have a serious issue. There is no known silver bullet that solves every issue and ensures software quality, so what people usually do is run multiple parallel efforts to improve code quality and software quality in general. We know we will always have some bugs and some issues, but on the other hand we would like to have reasonably high-quality software, something that won't break very easily.
So we can list things we can do to improve our software, starting with getting code reviews, applying best practices, and so on and so forth. This list is far from complete; it could probably fill a couple more slides just naming the different techniques we know. But I think one of the most important and most interesting things — some people may say boring — is testing the software. For me, testing is the moment when software meets the truth: if you only do reviews, if you only analyze your code without running it, that's not good enough. On the other hand, testing is also a very wide field, and there are a lot of resources on how people do it, but I think in our context it's fair to say that fuzzing is a kind of testing of our software.

So first of all, what is fuzzing? We usually call it fuzzing when we test software that expects some input by giving the program strange, unexpected input and observing its behavior on that particular input. The simplest fuzzer imaginable is just, for example: say we have some binary we would like to fuzz, called fuzz_binary, which takes around 1000 bytes of raw input. We can write a very dummy fuzzer in a couple of bash lines — or even one, if you like very long one-liners — that generates some random input, saves it somewhere or pipes it to the binary, and then runs the binary. And the funny thing is that if the program was not written with validation in mind, you can already break some programs that way, which is very funny. But the good thing would be to think about how to improve our dummy fuzzer — what we can do to make the fuzzing smarter. And the thing that usually comes to mind first is generating the input in a more intelligent way.
So one of the techniques is mutation-based fuzzing. Mutation-based fuzzing essentially assumes we can come up with strategies that are smarter, or follow some logical sequence, instead of just generating random bits; based on those strategies we mutate the input and provide it to the application. By definition, mutation-based fuzzing doesn't really care about the state of the application, because we only care about the strategy. A strategy can be as simple as flipping certain bits, or it can be aware of the input grammar: for example, if you are fuzzing HTTP requests, you can start from a payload that conforms to the HTTP standard and then play with the different fields — you don't have to modify raw bytes only. That way we can find interesting inputs much more easily than in a purely random way. With purely random input, if your program expects certain tags or a certain header in its input format, you usually get stopped by the first couple of checks that verify, for example, a certain magic pattern at the very beginning.

Another way to improve the fuzzer is introducing a feedback loop. In the feedback loop you get feedback from the running application about its state, and this state can be measured in different ways. The most popular one is code coverage, but we can also think about other signals like timing, CPU resources, and how the application behaves with respect to its environment. The funny thing is, when I was making these slides I put timing in mostly because timing is well known from cryptographic software: if you have cryptographic software whose response time differs between requests depending on the execution path, you obviously have a security issue.
But I was thinking: yes, that's a good example, but on the other hand I hadn't seen many timing-based fuzzers. Then yesterday, while I was waiting for my FOSDEM t-shirt, I met a colleague who works for the Tor project, and we started talking, and I told him: oh yeah, I'm giving a talk in the BSD devroom about fuzzing. And he said: oh really, I have a friend who was actually doing mutation-based fuzzing, and he used the execution time of the program on the firmware as feedback — based on the execution time he modified the input — and this way he was able to break some popular electronic device. So I thought: oh really, that means this really works, and I should also dig into this timing-based way of measuring feedback from the application. It was pretty funny.

Also, borrowing from the testing world, we can classify our fuzzing based on what we know about the application we fuzz, in the same way we classify testing: we have white-box testing and black-box testing, and if we don't have all the information but we have some, we can say we have grey-box testing. The feedback loop also depends on which kind of fuzzing you are doing. In our example we'll be doing white-box testing, because obviously we have a kernel that we can compile, we can instrument the code, and then we can monitor the state of the application based on that information.

So, coverage tracking. As I said, that's one of the main techniques for getting feedback from the application, and that is how many fuzzers work. The main coverage trace is the PC trace — the program counter trace. It tells us about the execution path, and a good way to start understanding this format is to look at how it is stored in our program.
So whenever we run a program — or a kernel, because a kernel is also a kind of program — we need an array, some memory location where we will be putting the sequence of PC values. In our example we have an array with 100 entries, but it can be much bigger. While the program is executing, this can be done per thread or per process — or, if you are running a kernel, per kernel thread — but you can also think about a global array. For example, you might not care about threads but care about the whole execution path: when you are fuzzing networking, you can put a request on the networking queue and a different thread will pick the request up from that queue. If you trace per thread, you will only see the path that enqueues the packet, instead of seeing the other thread — which can be a different one — that actually processes your request.

So how does this work exactly? As I said, these are compile-time instrumentations, inserted by the compiler. On the right we have a very simple program with main, which calls foo, which calls bar. During the compilation phase, the compiler puts some magic instrumentation at the very beginning of every function; it executes when we run the program and doesn't otherwise interfere with it. But what is this magic instrumentation? I took a code listing from the NetBSD kernel, slightly modified, just to give you a brief idea of what is going on. Every time we hit the instrumentation, we call this instrumentation function, and this function needs to access the memory we reserved for storing the PC values; obviously it also reads the current index, because it needs to write another entry.
We also need to do bounds checking, because depending on your setup you may overflow the buffer, and then the question for you is what you would like to do about that. But the most important part: we get the PC of the function we were called from, using a compiler builtin. So if we run our very simple program, we end up with an array of three entries: main, foo, bar. It's very straightforward.

And we don't only have the PC trace; we have a couple of other trace types: CMP, DIV and GEP. What are they? The CMP trace is used to instrument comparison instructions. DIV fires every time you do a division between arguments. GEP covers manipulating array indexes. The compiler understands these operations, and when it compiles the code it puts the instrumentation before each such instruction is performed, so you get a better picture — that's why you need the different types. You can imagine that, depending on your program, you may be doing mathematical operations, comparing arguments, traversing some graph or tree. The PC trace alone doesn't give you full information, because you may always see the same path even though the execution isn't really the same — you are just getting different arguments. Or if you manipulate a lot of array indexes, you may see the same function called over and over again, but with different arguments each time.

So here I present the CMP trace. It's a little different from the PC trace, but it's also something you have in all the BSD kernels, as far as I know, and the other trace types are similar in format. Instead of a single value, you have the arguments of the operation — the numbers being compared — plus a type that tells you the size of the arguments (8, 16, 32 or 64 bits), and you also have the PC.
So you have a sequence of these sets of four values in your array. As I said, everything depends on the code you are fuzzing — on which part of the kernel you are trying to fuzz — so it's always good to ask yourself whether the PC trace is the only one you would like to see in your fuzzer.

Other important tools are sanitizers. As I said, I won't explain how they actually work, but I'll try to convince you that they are very useful. The reason is that when you hit an issue in your code, the program may not exit after the invalid operation is performed. For example, you can have memory corruption that doesn't expose itself easily: you run the fuzzer and get no crash even though some data was corrupted. Sanitizers catch this. Here we have three kernel sanitizers — the address sanitizer, the leak sanitizer and the memory sanitizer — available in NetBSD; you can also have the undefined behavior sanitizer, and there is a thread sanitizer. I don't know much about thread sanitizers at the moment, but I'm also looking into them. A very simple one to start with is KASAN, because it detects things like out-of-bounds accesses on heap and stack and other common mistakes. The downside is that you cannot have them all at once, because some of them are mutually exclusive: by design you cannot combine ASAN and MSAN — the address sanitizer and the memory sanitizer — because they rely on the same shadow data, so they overlap each other, and LLVM explicitly says you cannot use both. They also introduce some slowdown, because they are compile-time instrumentation and add extra instructions. But as I said, they let you find bugs more easily and faster, so I think it's a very big win to run your fuzzer with the address sanitizer or another sanitizer, depending on what kind of bugs you are expecting.

Let's go to the fuzzer.
The one that I was using for a couple of months is American Fuzzy Lop. I think everyone knows AFL; it's very popular and has a very good record of found bugs. Some people claim it's pretty old at this point, but I think as a starting point it's very good, because first of all it's very easy to use and it's rock solid. On the other hand, its goal was never to fuzz kernels, just user-space programs, so you cannot use it directly — to fuzz your kernel you need some modifications.

To see what those modifications are, we can think about what we have on the kernel side, what we have on the AFL side, and what format AFL requires. Linux, FreeBSD, NetBSD, OpenBSD and others expose coverage through a coverage device, /dev/kcov. Using this device you can get, for example, the PC trace and the CMP trace. The way you access this data: you configure the device using ioctls, then you mmap it, then you run your program, and after you finish, the mapped memory contains the arrays I discussed previously.

On the other side, AFL uses its own specific format, focused on pairs of PC values. It stores unique PC pairs: every time we hit another instrumentation point — meaning we have a new PC — we remember the old PC (or zero if we just started), XOR the old PC and the current PC together, make sure we don't run out of the map, and then increment one counter in the map. That gives the fuzzer the hint: this particular pair of PCs just happened. The fuzzer can then analyze this array and say: OK, I got a pair I haven't seen before — or maybe I keep getting the same pairs, which means I need to try something different. So it's very easy to convert a PC trace to an AFL trace, because they look almost the same. But we don't even have to do that.
Because keeping the full PC trace would just be storing redundant data, we can do the conversion on the fly: we keep just the two PC values and this map, and that is everything our AFL needs to work. In order to do that, we did some work and modified kcov a little: essentially we allow you to plug in another kernel module that hooks into the coverage functions that are already there. By doing that you don't have to manage the device's resources yourself, but you still can use them if you like. You can do things like converting one coverage format into another, as I showed, or think about any other type of fuzzer you want to use — and you don't have to copy all the kcov code and then manage file descriptors, make sure contexts are opened and closed, and all the other things you would need to replicate from kcov. You can just focus on the coverage functions and the mmap — that's what you really need — and then leverage what is already written.

So how does this look? When I think about fuzzing, I first think about what the input is and what the output is. Here we have my setup for fuzzing the FFS mount. My input is a file system image; my output is the result of mount, or a kernel crash. This pipeline also requires a couple of other things, like a wrapper — as I said, there is just a plugin for kcov — and it works like this: the fuzzer creates the file system image; we need a wrapper that prepares the file system image to be mounted but also executes the code we need, i.e. it performs the mount. Note that we do not trace the wrapper itself — it is just the thing that helps us trigger the execution path in the kernel. Then we do the mount; the mount goes through the syscall, VFS, and file system layers, and on all of those layers we collect the coverage data. The coverage data is then transformed by our AFL plugin, and this plugin is exposed to the fuzzer as the map of pairs I showed before.
And then, after every run, the fuzzer takes the input, runs it, gets the output from the shared memory, decides — using its different strategies — whether the feedback says we should change something or whether we found anything interesting, and after the feedback phase it just performs another iteration, over and over, until we hit something interesting.

I would also like to talk a little more about the wrapper itself, because it's interesting. The way I usually write these wrappers: I prototype them in shell, focusing on which operations I'm doing, and once I'm done with the shell version I translate it to some better-performing language. I usually use C, but you can use C++ or anything else that exposes raw interfaces like system calls — you don't want to run on top of somebody else's libraries; you want the simplest, or at least the shortest, code you can get. This is also well documented for every fuzzer: performance is key. Even if your fuzzer is very smart, makes very good guesses about inputs, and analyzes the output very well, it can still perform much worse than a much simpler fuzzer that is just faster — because to find some bugs you will need tens of thousands, millions, maybe even billions of iterations if you have some very well-tested software. So you need something that performs each operation very quickly. Remember: performance is key; use raw interfaces.

Another useful thing when fuzzing is being able to see what is going on inside your fuzzer. The problem I had with my first fuzzer, for example, was that it was not exactly slow, but it was very ineffective, and I was wondering what was going on inside it and how I could debug that.
So what I did was run kcov on my fuzzing wrapper and monitor which execution paths were being hit by the fuzzer. And I realized that, for example, in NetBSD we have a much more verbose kernel: I got a lot of operations from the virtual memory subsystem. Even when you are doing a mount, you see mostly virtual memory operations — which means you are not actually fuzzing the mount, you are fuzzing code related to the management of pages. This is a bit tricky, and in this way you can take an input and understand what is actually going on under the hood.

Another thing you can do before you start, or when you would like to tune something, is a coverage benchmark — a fuzzer benchmark. For example, if you want to know whether your wrapper is fast or slow, you can put some very dummy code inside the kernel. What I did was create a simple character device: it gets some input and compares it with a pattern, and if the pattern matches, you won the lottery — your kernel crashes. If it doesn't, you just run again, over and over. Because this code is very simple, I can see how well the operations in my wrapper perform; and you can also compare this kind of check against the same check in user space, which gives you some intuition about how well you are doing.
To do the local setup of the fuzzer we also need an initial corpus. You can create the corpus from, for example, raw zeros — it's very well documented that AFL is able to reconstruct different formats; for example, if you are fuzzing JPEG or other images, AFL is able to evolve valid images from scratch. But since we run a little slower in kernel space, and we don't want to spend the initial CPU cycles just figuring out what the header or the magic number of the file system inside the image is, we can just create a valid image and provide it as the raw initial data for the fuzzer. This helps a lot with speeding up the fuzzing.

Once we have that, we can run our fuzzer with the -k option, which was added to AFL by a guy from Oracle when they were also fuzzing file systems — essentially I reused their work and integrated it with NetBSD and FreeBSD to make it work on both. Then you also need to specify the wrapper.

The important thing is that in this case you are fuzzing the same kernel you are running on, so don't shoot yourself in the foot: if your kernel crashes, your data may disappear and you may not even see the latest output in AFL, so you need to make sure you know what you are doing. If you are fuzzing something unstable and you will be finding a lot of bugs, it may be better to keep the input and output somewhere outside of your kernel — you can mount them over NFS, for example; it will be slower, but you will have fewer issues — and then run the process remotely, because if you don't, every time you crash you can lose the data you are looking for. On the other hand, if you have well-tested software and you would just be happy to find a couple of bugs, or maybe even one, you can run this natively on your kernel. What I did, for example, was connecting the
debugger to the kernel run AFL if anything crashes I can just get output from the debugger and see what is going on I can also get crashed somewhere sometimes I realize that it depends on the path that you crash you still may have some problems with getting the output because the ideal scenario is you get the output which is the latest image that you are fuzzing and then you can get this image and check if this image always breaks my kernel this was something that was from before testing so usually it's very nice if you have output that is a payload that you can just mount and then it breaks your kernel so how many iterations to find a bug in fuzzing FFS so what we will be doing we will just run on the same virtual machine the fuzzer and that's actually some other issue that I found at the very beginning but hopefully we can find something interesting okay so we need to get I don't know how big it can be because then the fuzzer needs to also okay so we have here the fuzzer it's modified version we have inside in a file which is our it's not zero so the name is a little bit misleading but we can see that oh this is zero okay I was fuzzing this anyway it's actually this is zero so I was fuzzing this before and I think yes thanks okay so it's not zero so you can see this is the legit file system image so we have the offset which is 8, I think 8k and we have some some magic from the headers of FFS I want to go to magic in detail but we see the structure is it's not just a zero file and so the way how we run it we can have this okay so I won't run this yet because I usually also get wrappers absolute buff wrapper so we run afl-fuzz minus k i from input out from output the intro number exactly but I think this definitely will work if we will get the wrapper itself okay from the other side this running on gdb so we have connected debugger because this is just a virtual machine and this is the debugger that is debugging the kernel so we can stop in every time we can see 
the trace, continue, and so on — you can see that when we stopped, the virtual machine just froze, and when we continue, the virtual machine is alive again. OK, let's run this thing... oh, we already got something. This was the dry run: at the very beginning AFL does a dry run over the inputs, doing some simple modifications. Usually it would also show you the AFL console, but because we hit something immediately — probably because I was testing this before, so we have some old history and it re-ran something that was found before — we can go here, and it is already retrieving the core dump from the kernel. We can see the operations that crashed: we have a VFS reclaim that was called, and we know it was reclaimed from the vcache. That is the moment when you start debugging the kernel, and I won't do that right now, because it requires some time and I'm not in a great position as a presenter to start debugging. But this is something that broke our kernel; it might be related to the data from the vnodes. We can take this as a core dump and start debugging it in gdb.

OK, so, conclusions. Fuzzing the kernel is another way — and as I said earlier, we need a lot of different ways to test our kernel — to make sure the quality of our drivers and our file systems gets better. If you would like to start searching for bugs, this is also a very easy way to start, and recently there has been a lot of good work improving the things that allow us to run fuzzing: for example gathering coverage, in NetBSD and the other BSDs in general, and the sanitizers — in particular in NetBSD they have become much better over time. It's very easy to start — as you saw, it doesn't require much knowledge, so I think in this half hour I gave you almost everything you need to start your adventure
with fuzzing. And you should also run the sanitizers — by the way, this demo was not run with a sanitizer, which is why we didn't get any sanitizer output.

As for future work, I would like to have some discussion with other people. I saw Andrew's presentation from FreeBSD — he is also doing something with AFL — so maybe we can come up with a common way to fuzz the BSDs. I saw that on Linux they previously introduced another device, /dev/afl, which the community didn't like, and it was not upstreamed in the end. So I think if we try to converge on a common interface, that will be beneficial for everyone, because then people from security projects can fuzz our kernels in the same way. I also need to spend more time on improving remote fuzzing, because it doesn't scale very well if you want to find many bugs and just continue fuzzing and fixing the bugs.

The resources: our NetBSD blog; there is also the Oracle paper from 2016; my collection of mount wrappers; and the Clang documentation was also a very useful source of information. I will be happy to take any questions, if there are any. Thank you very much.

[Q&A] The question was: can you use AFL for remote fuzzing, not the local fuzzing that was presented? The answer is yes, but you need to do additional setup, because, as I said, AFL uses the same machine's input and output directories, so you need to mount them remotely, run AFL as a process there, and you also need some way to send the information back to the fork server that a crash happened, because your target process just stopped. The thing I'm currently looking at is how to get the kernel dump efficiently, because then you have the dumping process, so the dump location also needs to be mounted over something like NFS again, or some other protocol — that is the simplest one, but you can potentially come up with other good ideas. And I don't know how hard it will be to get the AFL console to analyze those crashes, because when you're
fuzzing a user-space library or binary, you see the unique crashes: AFL goes through the core dump of the process and says, OK, this one occurred already, so it's not unique, it's similar to an earlier one — or this one is unique, because it's something different — and AFL also adapts its strategies based on those crashes. So I think the part about analyzing a core dump from the kernel is still an unknown for me; the other things can be done easily. Go ahead — OK, maybe someone wants to leave the room before the second question.