the right talk, which I've already announced before. So I'm really happy to have here a talk which goes into more technical detail than the one you had in the morning. It's about "Emulate fast and break kernels". Fuzzing is something you have perhaps known for quite some time, but now it's about fuzzing kernels, which is something I'm really curious about. And I'm really happy to have Dominik here. I hope I pronounced that right. You can explain later how it's pronounced correctly. And if you have any questions during the talk, please wait until the end. Then go to the mic and ask them there, because if you just shout them in, that's OK for here, but everybody on the stream and in the video will have to play a game of Jeopardy, guessing which question it was when they just hear the answer. So please go to the mic at the end. And for now, please give a big round of applause.
Yeah. Good morning, everybody. I'm Dominik from TU Berlin, and I'm going to talk a bit about kernel fuzzing. We created a little tool. We called it Unicorefuzz. We, that's me, Bastian and Benedikt. It's based on a lot of other tools. Right, what does it do? Fuzzing, I think everybody kind of knows. It'll help you fuzz the kernel using afl-unicorn, which is a cool tool by Nathan Voss, and avatar2, which is a nice tool from EURECOM. And it finds bugs. It works on anything that you can attach a GDB target to, like get a GDB stub in. And it's open source.
So yeah, kernels, you all know what kernels are. Kernels are these things. Fuzzing, I think you also all know what fuzzing is. Fuzzing is basically shooting at things until they break. It's not that interesting. But still, AFL, which is the best-known coverage-guided fuzzer, has been around for quite some years, and we still find loads of bugs. That doesn't make any sense. We shouldn't have any bugs anymore. Why do we still have bugs, even though we have fuzzing now? Well, lots of code is legacy code.
Legacy code is not written to be tested. It's pretty hard to test. It depends on all kinds of things, like globals and uninitialized values that are hard to fuzz, and all those things. And also, it's sometimes really hard to get the right input into the right position. So yeah, it's pretty hard to set up a fuzzing harness.
And then the kernel is this weird beast that interacts with everything around it. It's this monolithic thing, at least Linux, and if it crashes, it's just gone. You have to restart it or find a way to respawn it quickly. And maybe you overwrite some memory and it doesn't crash. That's even worse, because you found a bug, but you will never know about it, because it just runs on. And obviously there's getting the right state in a kernel: it takes 30 seconds to boot, or however long, depending on your kernel.
But people are already fuzzing kernels, so there must be a way, right? Most of the prior fuzzers here, Trinity, DIFUZE, TriforceAFL, syzkaller and all sorts, they all use syscalls. So they use this API from userland to talk to the kernel. kAFL is a bit different. That's actually pretty good, and it can also do similar things to us. But for example TriforceAFL, which is a fuzzer for kernels that uses QEMU to emulate, is really shaky. If you tried it out and something forks first, well, QEMU is an emulator, it emulates the kernel and so on and so forth, and if something had already forked at this stage, then it would be weird, and you would be really hard pressed to find the same bug again. And nobody really cares about the project anymore, sadly. The other ones have different problems. You don't get coverage in the earlier fuzzers. syzkaller is pretty good, for example, but you have to write harnesses for it, and they can only use syscalls and so on and so forth. And there's this one guy, Brandon Falk, who's doing crazy things, but that's just above my head. No, so.
OK, I had this idea: let's rip out parsers from the kernel and fuzz them somewhere else. That sounds like a fun idea, right? Why parsers? Well, parsers are the main things. They take something, they interpret it, and they output something. And that is really hard to do right, apparently, so that stuff breaks all the time. I have not yet seen a bug-free ASN.1 parser, for example. It's just all really hard to do. And they also almost never interact with anything apart from the little buffer that they operate on. So it's really easy for us to fuzz. We don't have any hardware that needs to interact and everything. So yeah, just copy-paste the whole parser to userland and fuzz it there, right? That would be pretty perfect.
Yeah, it would be. But then we come back to the initial slide, where I said it's really hard to set up the correct state for everything. The parser sometimes relies on weird globals in the kernel or on some space that, you know, the kernel is just this land where nobody has gone. You may not even be on Linux or something where you have the source, or you may be on Linux, but you have a weird blob from some vendor that is not open source. So what do you do? Yeah, as I said before, we emulate it. So we take the parser from the kernel and fuzz it inside an emulator.
Along comes afl-unicorn by Nathan Voss. It takes input, drops it into a test harness, and then fuzzes it. It uses Unicorn underneath. Unicorn is a fork of QEMU that is just there to be used from a Python script or some random script. And it can emulate a multitude of architectures, basically almost everything that QEMU can do, more or less. And then afl-unicorn just adds instrumentation on top of Unicorn. QEMU does block-by-block translation, and every time a new block gets translated, the jump will be part of the coverage. So it says, oh, I found a new block here. A new block is interesting for the fuzzer. Let's fuzz from there next time.
Or let's take this into our seeds, or harness, whatever.
So what we do: we have a GDB stub that we can connect to using avatar2, which is a Python library that can orchestrate your runs. It can also interact with other things than a GDB stub, but so far we've only implemented the GDB stub. And then on the other side, we have afl-unicorn. Every time Unicorn finds some memory that it has not seen before, it'll go ahead and drop a request to the probe wrapper, as we called it, which will then go on to fetch the memory from the stub. And if it cannot fetch the memory, that means we found a page that cannot be mapped. So yay, we found a bug.
So the probe wrapper, the thing that wraps avatar2, will set a breakpoint in the target the first time you launch it, and then once the breakpoint is hit, it dumps everything that it can already know about. It doesn't dump the whole kernel, because that would be huge. It just dumps the current page and the breakpoints that we have. No, not breakpoints, sorry, all the registers that we have. And then once the harness wants something more, the target still has its little state and its breakpoint, and the probe wrapper will just go there and fetch more memory as we go on.
On the other side, we have the afl-unicorn harness. It'll just fork, and fork in Linux is a copy-on-write thing, so every page that we touch that is not changed will just stay the same. So it's pretty quick. It's still a syscall, so it's still slower than if you would do something in-process, but it's actually pretty quick. I'll show you. There's a demo.
So what would you do if you use our tool? You go to GitHub and download it, obviously, and then you run the setup scripts that we have. There's one optional one for debugging and one optional one that sets your kernel up so that fuzzing is quicker. And then there are also some helper scripts that we use internally to spawn QEMU VMs to fuzz kernels. Then you find a parser.
That's the biggest manual task. There's lots of manual analysis involved, because you don't know where a parser is. It's this huge blob, so you drop this huge kernel into something and look for something that looks like a parser: it takes input, you know where the input goes, and it produces an output. Right.
And then there's a config for our tool. The config needs to know where to break and where to stop fuzzing, basically where the parser ends. And then there's a Python function that you have to implement that drops the data coming from AFL into its right position inside the kernel memory. For example, if you would fuzz Open vSwitch, you would take the input and then drop it in. This is a really weird setup, because Open vSwitch doesn't take just a buffer. It takes an sk_buff, which is an internal Linux kernel structure for socket buffers. If we would just drop the input anywhere inside the sk_buff, we would overwrite all the pointers, and they need to be aligned. If we overwrite a pointer with 0, of course it would crash. But in the real world, Linux would never put a 0 at this pointer address, so we would find loads of false positives. So instead, we just drop the data where the user could also put data, so that we have minimal false positives.
And then we start the probe wrapper. It'll set the breakpoint and wait. Once the breakpoint is hit, we can fuzz. Then we start AFL. And afterwards, we have a nice debugging setup where you can see what happened.
So let's see what happens. Since I just rebooted my machine, let's see if it actually happens. So there's the virtual machine here. This is inside the virtual machine. It says, oh, wow. It's super small. It's a lot better, right? I guess. Do you want to change colors? Sure. Sorry for that. Yep. Why is this here? I think white background maybe, this one? Sorry. Sorry for that. Thanks. Right, so this is the virtual machine.
And then this is still small. This is inside our Unicorn thingy. We can look at the config that we set up. We set the architecture. We set where the GDB port lives. We just need some scratch addresses, you don't have to care about these. Then we set the break address. This is where the probe wrapper will break. And then we set where it should stop fuzzing. And then we have an init function that's called before fuzzing, so everything you do here happens before. What I did here was I just knocked out the function at this address, which is, I can show you in... OK, probably that takes too long now. Well, you have to believe me that this would be kprintf. It just takes forever to print something. I don't think it would ever return, because it needs some console syncing, and we don't have console syncing in an emulator. There are no interrupts, nothing.
And then this is the place-input function. This one is a lot easier. It just takes up to 512 bytes. If it's more, we just immediately exit. And then we read RDI, which is the parameter that we want. That's just a buffer. And then we map the page there, so we make sure that there's actually something that we can write to. And then we write the input. The input comes from here, and this is just called by afl-unicorn magic. And then this is the length that we actually have to set, right? So the parser knows how much it has to read from that. Oh, it worked pretty well, actually. Did I copy this address? No, OK. Well, it's kprintf. I think you believe me.
OK, so let's do the probe wrapper. This will set a breakpoint. And then, inside this machine, we should trigger this function. And now it just stops doing things. It's paused now. It halted the machine. And this guy says, OK, the initial dump is complete, I'm listening for requests from harness.py. So let's spawn harness.py. OK, let's not spawn harness.py. Let's spawn AFL directly. Let's see what happens.
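The place-input step just described (reject oversized input, read the buffer pointer from RDI, make sure the page is mapped, write the input, set the length) can be sketched roughly as follows. This is a minimal illustration, not the harness shown in the talk: `FakeEmulator` is a hypothetical stand-in for the emulator handle rather than the Unicorn API, and storing the length in `rsi` is an assumption, since the talk only says that the length has to be set.

```python
PAGE_SIZE = 0x1000
MAX_INPUT = 512  # the talk's harness rejects anything larger than 512 bytes


class FakeEmulator:
    """Hypothetical stand-in for an emulator handle (not the Unicorn API)."""

    def __init__(self, regs):
        self.regs = dict(regs)
        self.pages = {}  # page base address -> bytearray(PAGE_SIZE)

    def map_page(self, addr):
        # Map the page containing addr if it is not mapped yet.
        base = addr & ~(PAGE_SIZE - 1)
        self.pages.setdefault(base, bytearray(PAGE_SIZE))

    def write(self, addr, data):
        for i, byte in enumerate(data):
            a = addr + i
            base = a & ~(PAGE_SIZE - 1)
            self.pages[base][a - base] = byte  # KeyError if the page is unmapped

    def read(self, addr, size):
        return bytes(
            self.pages[(addr + i) & ~(PAGE_SIZE - 1)][(addr + i) % PAGE_SIZE]
            for i in range(size)
        )


def place_input(emu, data):
    """Copy fuzzer input into the buffer the parser reads from.

    Returns False if the input is rejected, True otherwise.
    """
    if len(data) > MAX_INPUT:
        return False
    buf = emu.regs["rdi"]  # first argument: pointer to the input buffer
    # Map every page the input touches before writing to it.
    first_page = buf & ~(PAGE_SIZE - 1)
    for addr in range(first_page, buf + max(len(data), 1), PAGE_SIZE):
        emu.map_page(addr)
    emu.write(buf, data)
    emu.regs["rsi"] = len(data)  # assumption: the length is the second argument
    return True
```

Writing only into the buffer the parser would normally receive, and nowhere else, follows the same reasoning as the sk_buff example earlier: it keeps the emulated state close to what the real kernel could see.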
So this starts AFL, and then it starts fuzzing. I mean, some people may have seen AFL before. Who has seen AFL before? Sweet. So you see that it finds paths, which means that it's actually instrumented. The emulator reports back if it finds new basic blocks, or jumps between basic blocks. You see that it's not super fast. Before, I had 100 per second. I think we broke something. But it's a really slow laptop that we fuzz on now, and it's probably super hot out in the sun. It's fine. So it starts crawling. It's looking for new paths. And let's see if it finds something. I think there should be something to find. It's like an Easter egg hunt. But of course, it's not deterministic, so it might happen any time. Still waiting for new paths. You're waiting for new paths. OK. New paths. Fingers crossed. One minute. Damn it. Oh, well, that's fuzzing. Sometimes you don't find something. OK, let's continue as if we had found something. Too bad. Well, I found something earlier, so that's good. That's live demos.
OK, so anyway, say we had found something at this point. Then we could do harness.py -t on the AFL outputs. Let's see if, oh, should have reached. No, there are no crashes. So let's say we already found something before at the same place. Then we can run it with -t, which will trace every single instruction. And at the end, we find that it tries to read address 0. And address 0, of course, cannot be read. So it doesn't work. It crashes at this point, which is good, which is what we want. That means the kernel would also crash at this place. And then we can run it with -d, which starts the debugger, which is uDdbg. It's specifically written for Unicorn. It's pretty nice. It looks a bit like your famous favorite GDB shell. You can just step through every single function and you see all the registers.
So if we continue here, at some point it'll show us an invalid memory read. You see "read unmapped". So we can't map this thing. And we see here the function is at this place, 8.6. And at 8.6, we see there is a move from the address in rix to edx. And the address in rix at this place is 0. So there is a null pointer that it tries to deref. So that's a nice bug. That usually, if I would have found by now... oh, it did. Nice. Oh yeah, so we just didn't wait long enough. After two minutes, it found the bug again. So that's good. Cool.
And then one more thing. Just to see that I'm not bullshitting, let's kill this. That's the debugger. So we're back in our little thing here. And now we do this one. And then it'll just reboot and, "kernel offset". It just panicked. So it crashed. That is the real kernel now. This is not an emulator. This is just taking the input from AFL and rerunning it as an actual network input to the kernel. And then the kernel crashes at this place. Cool. So this is your little bug, and it works. Let's see what else we have.
Cool. So speed, as you've seen, actually it could be faster after a restart, because then the pages are already there. Oh, it found a new bug. No, it's still only three unique ones. This time it should be faster, actually, because the next time around it'll already pre-map all the pages before the fork. And then, yeah, exec speed should be a lot better. OK, now it's more than four times as quick. So now we should find the bug in less time. Anyway, we found the ASN.1 parser bug. That was an ASN.1 parser bug. Of course, Linux has ASN.1 parsers also. And of course, ASN.1 is never working, so that broke. And yeah, the fastest we got with a proper machine was like four times as fast, because my machine is quick, but not that quick. And it's still like 10 times slower than proper AFL, if you had it ported.
And even slower compared with something like libFuzzer, which never respawns the process. But it's OK. The problem is you won't find state-dependent bugs. So something that you need to set up before, some syscalls that need to happen before, you won't find these. But on the plus side, there are no timers, no race conditions, nothing. And yeah, the speed could be better in Unicorn, and there's lots of manual analysis.
So Unicorn: where is GS base stored on x86-64? Just a pop quiz to the crowd. Anybody? Cool. So yeah, nobody. OK, so GS base is a register, and Unicorn also doesn't know, or they know, but you can't write it directly. Usually you can interact pretty well with Unicorn, just not in this case. That's why we need the scratch addresses. You need to do a workaround for everything. So there are many things that Unicorn can't do right. That's also why the ARM target is not done yet, because there are also bugs. But it'll happen at some point.
The nice things, though: we can now fuzz many things that are not easily fuzzable, and we can reproduce the bugs perfectly. Because yeah, that's something that's kind of hard in syzkaller, and they've got it down now. But if you have many syscalls, and races, and everything, it's usually hard to replay those. And that just works now. And then future work: ARM target, MIPS target, and so on and so forth. Also one thing, oh yeah, it did refind the bug, obviously. One other thing would be angr. OK, it's not instant. Anyway, I'm working on symbolic execution in that thing. That would also be nice to find more paths, and so on and so forth.
So it finds bugs. I think I'm already past my time. We did find this denial of service, and Samba stuff, kernel things. Speed could be better, could be worse. And it's now open source, and you can download it and run it and try to find your own kernel bugs. So.
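As a rough illustration of the fetch-on-miss idea from the talk (the emulator touches memory it has not seen, the page is fetched from the target through the GDB stub, and a page that cannot be fetched counts as a crash), here is a minimal Python sketch. It is not Unicorefuzz's code: `LazyMemory`, `fetch_page`, `UnmappedPage`, and the toy `TARGET` dictionary are hypothetical names, and the callback only stands in for the real round trip to the probe wrapper.

```python
PAGE_SIZE = 0x1000


class UnmappedPage(Exception):
    """The target could not provide the page; the harness treats this as a crash."""


class LazyMemory:
    """Cache of target pages that is filled on first access."""

    def __init__(self, fetch_page):
        # fetch_page(base) returns PAGE_SIZE bytes, or None if the page is unmapped.
        # Hypothetical callback standing in for the GDB-stub request.
        self.fetch_page = fetch_page
        self.pages = {}
        self.fetches = 0  # number of requests sent to the target

    def _page(self, addr):
        base = addr & ~(PAGE_SIZE - 1)
        if base not in self.pages:
            self.fetches += 1
            data = self.fetch_page(base)
            if data is None:
                raise UnmappedPage(hex(addr))
            self.pages[base] = bytearray(data)
        return base, self.pages[base]

    def read(self, addr, size):
        out = bytearray()
        for a in range(addr, addr + size):
            base, page = self._page(a)
            out.append(page[a - base])
        return bytes(out)


# Toy target: a single mapped page; every other address is unmapped.
TARGET = {0xFFFF0000: bytes(range(256)) * 16}
mem = LazyMemory(TARGET.get)
```

With this toy target, repeated reads inside the mapped page cost one fetch in total, while `mem.read(0, 1)` raises `UnmappedPage`, mirroring the null-pointer read from the demo. Caching fetched pages is also what makes a restarted run faster, since pages that are already known do not have to be requested again.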