about high-speed binary fuzzing. We have two researchers who will be presenting their latest work, which is a framework for static binary rewriting. Our speakers are, first, a computer science master's student at EPFL, and second, a security researcher and assistant professor at EPFL. Please give a big round of applause to Nspace and Gannimo.

Thanks for the introduction. It's a pleasure to be here, as always. We're going to talk about different ways to speed up your fuzzing and to find different kinds of vulnerabilities, or to tweak your binaries in somewhat unintended ways. I'm Mathias Payer, and I go by gannimo on Twitter. I'm an assistant professor at EPFL, working on different forms of software security: fuzzing, sanitization, but also different kinds of mitigations. And Matteo over there is working on his master's thesis on different forms of binary rewriting for the kernel. Today we're going to take you on a journey on how to develop very fast and very efficient binary rewriting mechanisms that allow you to make unintended modifications to binaries and to explore different kinds of unintended features in binaries.

So, about this talk. What we discovered, or the reason why we set out on this journey, was that fuzzing binaries is really, really hard. There are very few tools in user space. It's extremely hard to set up, and it's extremely hard to set up in a performant way. The setup is complex: you have to compile different tools, you have to modify them, and the results are not really that satisfactory. As soon as you move to the kernel, fuzzing binaries is even harder. There's no tooling whatsoever, there are very few people actually working with or modifying binary code in the kernel, and it's just a nightmare to work with.
So what we are presenting today is a new approach that allows you to instrument any form of modern binary code, based on static rewriting, which gives you full native performance. You only pay for the instrumentation that you add, and you can do very heavyweight transformations on top of it.

To paint the picture: let's say you're looking at a modern setup. You're looking at cat pictures in your browser. Chrome, plus the kernel, plus the libc, plus the graphical user interface quickly clock in at about 100 million lines of code. Instrumenting all of this for some form of security analysis is a nightmare. Across this large stack of software there are quite a few different compilers involved, and different linkers; parts may be compiled on a different system with different settings, and so on. Getting your instrumentation across all of this is pretty much impossible and extremely hard to work with. We want to enable you to select the parts that you're actually interested in, modify those, and then focus your fuzzing or analysis on those small subsets of the code, giving you a much better and stronger capability to test the parts of the system that you're really, really interested in.

Who's worked on fuzzing before? Quick show of hands. Wow, that's a bunch of you. Do you use AFL? Yeah, most of you AFL. Libfuzzer, cool. About 10 to 15% libfuzzer, maybe 30% AFL. So there's quite good knowledge of fuzzing, and I'm not going to spend too much time on it. But for those that haven't run their own fuzzing campaigns yet: it's a very simple software testing technique. You're effectively taking a binary, let's say Chrome, as a target, and you're running it in some form of execution environment.
Fuzzing then consists of some form of input generation that creates new test cases, throws them at your program, and checks what happens. Either everything is OK, your code is executed, the program terminates and everything is fine, or you have a bug report. If you have a bug report, you can use it to find the vulnerability, maybe develop a PoC, and then come up with some form of exploit or patch or anything else. So this is pretty much fuzzing in a nutshell.

How do you get fuzzing to be effective? How can you cover large code bases, complex code, and complex environments? Well, there are a couple of simple steps that you can take, so let's walk quickly through effective fuzzing 101. First, you want to be able to create test cases that actually trigger bugs, and this is a very, very complicated part. We need some notion of the inputs the program accepts, and some notion of how we can explore different parts of the program, different parts of its functionality. On one hand, we could have a developer write all the test cases by hand, but this would be kind of boring, and it would also require a lot of human effort to create all these different inputs. So coverage-guided fuzzing has evolved as a very simple way to guide the fuzzing process, leveraging information about which parts of the code have been executed by simply tracing the individual paths through the program. The fuzzer can use this feedback to modify the inputs that are being thrown at the target.

The second step is that the fuzzer must be able to detect bugs. If you've ever looked at a memory corruption: if you're just writing one byte after the end of a buffer, it's highly likely that your software is not going to crash. But it's still a bug, and it may still be exploitable depending on the underlying conditions.
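As a quick aside, the loop just described can be sketched in a few lines. This is a toy model of our own, not any real fuzzer; `toy_target` and `mutate` are hypothetical stand-ins for an instrumented binary and a real mutation engine.

```python
import random

def mutate(data: bytes) -> bytes:
    """Replace one random byte; real fuzzers use many more strategies."""
    if not data:
        return bytes([random.randrange(256)])
    i = random.randrange(len(data))
    return data[:i] + bytes([random.randrange(256)]) + data[i + 1:]

def fuzz(target, seeds, iterations=1000):
    """Minimal coverage-guided fuzzing loop (illustrative sketch only).
    `target` runs one input and returns (coverage_set, crashed_flag)."""
    corpus = list(seeds)        # inputs that discovered new coverage
    seen = set()                # all coverage points observed so far
    crashes = []
    for _ in range(iterations):
        child = mutate(random.choice(corpus))
        coverage, crashed = target(child)
        if crashed:
            crashes.append(child)          # keep the crashing input
        elif not coverage <= seen:         # found a new path?
            corpus.append(child)           # interesting: add to corpus
        seen |= coverage
    return corpus, crashes
```

The key design point is the feedback edge: an input is only kept if it exercised coverage the fuzzer has not seen before, which is exactly what lets the process climb toward deeper program paths.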
So we want to be able to detect violations as soon as they happen, for example based on some form of sanitization, some instrumentation that we add to the binary that tells us, hey, there's a violation of the memory safety property, and we terminate the application right away as feedback to the fuzzer.

And last but not least, speed is key. If you're running a fuzzing campaign, you have a fixed resource budget: a couple of cores, and you want to run for 24 hours, 48 hours, a couple of days. Whatever your constraints are, you have a fixed amount of instructions that you can execute, and you have to decide: am I spending those instructions on generating new inputs, tracking constraints, finding bugs, running sanitization, or executing the program? You need to find a balance between all of them. It is a zero-sum game: you have a fixed amount of resources and you're trying to make the best of them. Any overhead is slowing you down, and in the end this becomes an optimization problem: how can you most effectively use the resources that you have available?

When we are fuzzing with source code, it's quite easy to leverage existing mechanisms, and we add all that instrumentation at compile time. We take source code, we pipe it through the compiler, and modern compiler platforms allow you to add little code snippets during the compilation process that carry out all these tasks that are useful for fuzzing. For example, modern compilers can add short snippets of code for coverage tracking that record which parts of the code you have executed, or for sanitization, which record and check every single memory access to see whether it is safe or not. Then, when you're running the instrumented binary, you can detect policy violations as you go along.
Now, if we had source code for everything, this would be amazing, but that's often not the case. On Linux we may be able to cover a large part of the software stack by focusing only on source-based approaches, but there may be applications where no source code is available. If we move to Android or other mobile systems, there are many drivers that are not open source, only available as binary blobs. Or the full software stack may be closed source and we only get the binaries. And we still want to find vulnerabilities in these complex software stacks that span hundreds of millions of lines of code, in a very efficient way. The only way to cover this part of the massive code base is to rewrite and work directly on binaries.

A very simple approach would be black-box fuzzing, but this doesn't really get you anywhere, because you don't get any feedback, you don't get any information on whether you're triggering bugs. One simple approach, and this is the approach that is most commonly used today, is to rewrite the program dynamically. You take the binary and, during execution, use some form of dynamic binary instrumentation, based on Pin, angr, or some other binary rewriting tool, and translate the target at runtime, adding the instrumentation on top of it as you're executing it. It's simple, it's straightforward, but it comes at a terrible performance cost of 10 to 100x slowdown, which is not really effective: you're spending all your cores and your cycles on just executing the binary instrumentation. So we don't really want to do this; we want something more effective.

What we are focusing on instead is static rewriting. It involves a much more complex analysis, as we are rewriting the binary before it is executed and we have to recover all of the control flow and all the different mechanisms, but it results in much better performance, and we get more bang for our buck.
So why is static rewriting so challenging? Well, first, simply adding code will break the target. If you disassemble this piece of code here, it's a simple loop that loads data, decrements a register, and then jumps if you're not yet at the end of the array, iterating through the array. Now look at the jump-not-zero instruction, the last instruction of the snippet: it uses a relative offset. It jumps backwards seven bytes, which is fine if you just execute the code as is. But as soon as you insert new code, you change the offsets in the program, and simply adding new code somewhere in between will break the target. So a core property that we need to enforce is that we must find all the references and properly adjust them, both relative offsets and absolute offsets. Getting a single one wrong breaks everything.

What makes this problem really, really hard is that in a binary, a byte is a byte. There's no way for us to distinguish between scalars and references; in fact, they are indistinguishable. Getting a single reference wrong breaks the target and introduces arbitrary crashes. So we have to come up with ways to distinguish between the two. For example, take this code here: it takes a value and stores it somewhere on the stack. This could come from two different kinds of high-level constructs. On one hand, it could be taking the address of a function and storing that function address in a stack variable, or it could be storing a scalar in a stack variable. The two are indistinguishable, but as soon as we add new code, the offsets will change: if it is a function address, we have to modify the value; if it is a scalar, we have to keep it the same.
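To make the offset problem concrete, here is a toy helper, our own illustration rather than anything from a real rewriter, that recomputes an 8-bit relative jump displacement after bytes are inserted into the code stream:

```python
def adjust_rel8(jump_end: int, target: int, insert_at: int, count: int) -> int:
    """Return the new rel8 displacement after `count` bytes are inserted
    at offset `insert_at`. x86 relative jumps are measured from the end
    of the jump instruction (`jump_end`); `target` is the jump target."""
    # An endpoint moves only if the insertion happens at or before it.
    new_end = jump_end + (count if insert_at <= jump_end else 0)
    new_target = target + (count if insert_at <= target else 0)
    disp = new_target - new_end
    if not -128 <= disp <= 127:
        # A real rewriter would upgrade this to a rel32 jump instead.
        raise ValueError("displacement no longer fits in rel8")
    return disp
```

For the loop from the talk: a jnz that jumped backwards seven bytes (displacement -7) must become -12 if five bytes of instrumentation are inserted between it and its target. Getting a single one of these wrong breaks the rewritten binary, which is exactly the point the talk makes.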
So how can we come up with a way to distinguish between the two and rewrite binaries by recovering this missing information? Let us take you on a journey towards instrumenting binaries in the kernel, which is what we aim for. We'll start with the simple case of instrumenting binaries in user land, talk about coverage-guided fuzzing, what kind of instrumentation and sanitization we can add, and then take it all together, apply it to kernel binaries, and see what falls out of it.

Let's start with instrumenting binaries. I will now talk a little bit about RetroWrite, our mechanism and our tool that enables static binary instrumentation by symbolizing existing binaries. We recover the information and translate relative and absolute offsets into actual labels that are added to the assembly file. The instrumentation can then work on the recovered assembly file, which can be reassembled into a binary that can be executed for fuzzing. We implement coverage tracking and a binary AddressSanitizer on top of this, leveraging the abstraction as we go forward.

The key to enabling this kind of binary rewriting is position-independent code. Position-independent code has become the de facto standard for any code that is executed on a modern system: it is code that can be loaded at any arbitrary address in your address space. It is an essential requirement if you want address space layout randomization, or if you want to use shared libraries, which de facto you want on all these systems. So for several years now, all the code that you're executing on your phones, desktops, and laptops has been position-independent code.
The idea behind position-independent code is that you can load it anywhere in your address space. You therefore cannot use hard-coded static addresses, and you have to inform the system, through relocations or PC-relative addresses, how it can relocate these different mechanisms. On x86-64, position-independent code leverages addressing that is relative to the instruction pointer: for example, it uses the current instruction pointer plus a relative offset to reference global variables, other functions, and so on. And this gives us a very easy way to distinguish references from constants in PIE binaries: if it is RIP-relative, it is a reference; everything else is a constant. We can build our translation algorithm and our translation mechanism on this fundamental observation, removing any need for heuristics, by focusing on position-independent code. So we support position-independent code only, we don't support non-position-independent code, but in return we can rewrite essentially all the modern code that is out there.

Symbolization works as follows. Take the little bit of code on the lower right. Symbolization first replaces all the references with assembler labels. Look at the call instruction and the jump-not-zero instruction: the call references an absolute address, and the jump-not-zero jumps backwards, relative, 15 bytes. By focusing on these relative jumps and calls, we can replace them with actual labels and rewrite the binary: for the call, we replace the target with the actual label, and for the jump-not-zero, we insert a label into the assembly code and add a backward reference to it.
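The labeling step just described can be sketched in a drastically simplified form. This is a hypothetical data model of our own; a real symbolizer works on a proper disassembly and must also handle external targets, data references, relocations, and much more.

```python
def symbolize(instrs):
    """Replace numeric branch targets with assembler labels so that new
    code can be inserted and the file reassembled. `instrs` is a list of
    (address, mnemonic, branch_target_or_None) tuples."""
    targets = {t for _, _, t in instrs if t is not None}
    lines = []
    for addr, mnem, target in instrs:
        if addr in targets:
            lines.append(f".L{addr:x}:")            # a branch lands here
        if target is not None:
            lines.append(f"  {mnem} .L{target:x}")  # symbolic, not numeric
        else:
            lines.append(f"  {mnem}")
    return "\n".join(lines)
```

Once every branch refers to a label instead of a byte offset, the assembler recomputes all displacements for free when the file is reassembled, which is what makes inserting instrumentation safe.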
For PC-relative addresses, for example a data load, we can replace the operand with the name of the actual data that we have recovered, and we can add all the different relocations and use them as auxiliary information on top. After these three steps, we can insert any new code in between, add different forms of instrumentation or run higher-level analyses on top, and then reassemble the file for fuzzing, coverage tracking, address sanitization, or whatever else you want to do. I will now hand over to Matteo, who will cover coverage-guided fuzzing and sanitization, and then instrumenting binaries in the kernel. Go ahead.

So now that we have this really nice framework to rewrite binaries, one of the things that we want to add to actually get to fuzzing is coverage tracking instrumentation. Coverage-guided fuzzing is a method to let the fuzzer discover interesting inputs and interesting paths through the target by itself. The basic idea is that the fuzzer tracks the parts of the program that are covered by different inputs, by inserting some kind of instrumentation. For example, here we have a target program that checks if the input contains the string PNG at the beginning; if it does, then it does something interesting, otherwise it just bails out and fails. If we track the parts of the program that each input executes, the fuzzer can figure out that an input that contains P has discovered a different path through the program than an input that doesn't, and so on: one byte at a time, it can discover that this program expects the magic sequence PNG at the start of the input.
The way the fuzzer does this is that every time a new input discovers a new path through the target, it is considered interesting and added to a corpus of interesting inputs. Every time the fuzzer needs to generate a new input, it selects something from the corpus, mutates it randomly, and uses the result as the new input. This is conceptually pretty simple, but in practice it works really well and really lets the fuzzer discover the format that the target expects in an unsupervised way.

As an example, this is an experiment that was run by the author of AFL, the fuzzer that popularized this technique. He was fuzzing a JPEG parsing library, starting from a corpus that only contained the string "hello". Now, clearly, "hello" is not a valid JPEG image, but the fuzzer was still able to discover the correct format. After a while it started generating some grayscale images, on the top left, and as it generated more and more inputs it produced more interesting images, such as grayscale gradients, and later on even some color images. So as you can see, this really works, and it allows us to fuzz a program without ever teaching the fuzzer what an image should look like. That's it for coverage-guided fuzzing; now we'll talk a bit about sanitization.

As a reminder, the core idea behind sanitization is that just looking for crashes is likely to miss some of the bugs. For example, an out-of-bounds access of a single byte will probably not crash the target, but we would still like to catch it, because it could be used for an info leak, for example. One of the most popular sanitizers is AddressSanitizer, or ASan. ASan instruments all the memory accesses in your program and checks for memory corruption, a pretty dangerous class of bugs that unfortunately still plagues C and C++ programs, and unsafe languages in general.
ASan tries to catch it by instrumenting the target. It is very popular: it has been used to find thousands of bugs in complex software like Chrome and Linux. And even though it has a bit of a slowdown, about 2x, it is still really popular because it lets you find many, many more bugs.

So how does it work? The basic idea is that ASan inserts special regions of memory called red zones around every object in memory. We have a small example here where we declare a four-byte array on the stack. ASan will allocate the array, buf, and then add a red zone before it and a red zone after it. Whenever the program accesses a red zone, it is terminated with a security violation: the instrumentation prints a bug report and then crashes the target. This is very useful for detecting buffer overflows and underflows, as well as other kinds of bugs like use-after-free and so on. As an example, here we're trying to copy five bytes into a four-byte buffer. ASan checks each of the accesses one by one, and when it sees that the last byte hits a red zone, it detects the violation and crashes the program. This is good for us, because this bug might not have been found by simply looking for crashes, but it is definitely found if you use ASan. So this is something we want for fuzzing.

Now that we have briefly covered ASan, we can talk about instrumenting binaries in the kernel. Mathias left us with RetroWrite, and with RetroWrite we can add both coverage tracking and ASan to binaries. It's a really simple idea: now that we can rewrite a binary and add instructions wherever we want, we can implement both coverage tracking and ASan. To implement coverage tracking, we simply identify the start of every basic block and add a little piece of instrumentation at the start of the basic block that tells the fuzzer: hey, we've reached this part of the program; hey, we've reached this other part of the program.
Then the fuzzer can figure out whether that's a new part or not. ASan can also be implemented this way, by finding all memory accesses and then linking with libasan. Libasan is a sort of runtime for ASan that takes care of inserting the red zones, keeping around all the metadata that ASan needs to know where the red zones are, and detecting whether a memory access is invalid.

So how can we apply all of this to the kernel? Well, first of all, fuzzing the kernel is not as easy as fuzzing a user-space program. There are some issues here. First, there's crash handling. When you're fuzzing a user-space program, you expect crashes, because that's what we're after. If a user-space program crashes, the OS simply terminates it gracefully, and the fuzzer can detect this, save the input as a crashing input, and so on. This is all fine. But when you're fuzzing the kernel, if you were fuzzing the kernel of the machine that you were using for fuzzing, after a while the machine would just go down. After all, the kernel runs the machine, and if it starts misbehaving, everything can go wrong. More importantly, you can lose your crashes: if the machine crashes, the state of the fuzzer is lost, and you have no idea what your crashing input was. So what most kernel fuzzers do is resort to some kind of VM to keep the system stable: they fuzz the kernel in a VM and run the fuzzing agent outside the VM.

On top of that, there's tooling. If you want to fuzz a user-space program, you can just download AFL or use libfuzzer. There are plenty of tutorials online, and it's really easy to set up: you just compile your program, you start fuzzing, and you're good to go. If you want to fuzz the kernel, it's already much more complicated. For example, if you want to fuzz Linux with, say, syzkaller, which is a popular kernel fuzzer, you have to compile the kernel.
You have to use a special config that supports syzkaller. You have way fewer guides available than for user-space fuzzing, and in general it's much more complex and less intuitive than fuzzing user space.

Lastly, we have the issue of determinism. In general, a single-threaded user-space program, unless it uses some kind of random number generator, is more or less deterministic: there's nothing else that affects the execution of the program. And this is really nice if you want to reproduce a test case, because if you have a non-deterministic test case, it's really hard to know whether it's really a crash or just something you should ignore. In the kernel, this is much harder, because you don't only have concurrency, like multi-processing, you also have interrupts. Interrupts can happen at any time, and if one time you got an interrupt while executing your test case and the second time you didn't, then maybe it only crashes one of the times. You don't really know; it's not pretty.

So again, we have several approaches to fuzzing binaries in the kernel. The first is black-box fuzzing. We don't really like this because it doesn't find much, especially in something as complex as a kernel. The next approach is dynamic translation, so something like QEMU. This works, and people have used it successfully. The problem is that it is really, really, really slow: we're talking about 10x-plus overhead. And as we said before, the more test cases you can execute in the same amount of time, the better, because you find more bugs. On top of that, there's currently no available sanitizer for kernel binaries based on this approach. In user space you have something like Valgrind; in the kernel you don't have anything, at least that we know of. Another approach is to use Intel Processor Trace.
There have been some research papers on this recently. It's nice because it allows you to collect coverage at nearly zero overhead; it's really fast. But the problem is that it requires hardware support, a fairly new x86 CPU. If you want to fuzz something on ARM, say an Android driver, or if you want to use an older CPU, you're out of luck. What's worse, you cannot really use it for sanitization, at least not the kind of sanitization that ASan does, because it just traces the execution and doesn't let you do checks on memory accesses.

The last approach, which is what we will use, is static rewriting. We had this very nice framework for rewriting user-space binaries, and we asked ourselves: can we make this work in the kernel? So we took the original RetroWrite, modified it, implemented support for Linux modules, and it works. We have used it to fuzz some kernel modules, and it really shows that this approach doesn't only work for user space; it can also be applied to the kernel.

As for the implementation: the nice thing about kernel modules is that they're always position-independent. You cannot have fixed-position kernel modules, because Linux just doesn't allow it, so we get that for free, which is nice. Linux modules are also a special class of ELF files, which means that even though the format is not the same as user-space binaries, it's still somewhat similar, so we didn't have to change the symbolizer that much, which is also nice. We implemented symbolization for modules and used it to implement both code coverage and binary ASan for binary kernel modules.

For coverage, the idea behind the whole RetroWrite project was that we wanted to integrate with existing tools, existing fuzzing tools. We didn't want to force our users to write their own fuzzer just to be compatible with RetroWrite.
So in user space, we had AFL-style coverage tracking and a binary ASan that is compatible with source-based ASan, and we wanted to follow the same principle in the kernel. It turns out that Linux has a built-in coverage tracking framework called KCov that is used by several popular kernel fuzzers like syzkaller, and we wanted to reuse it ourselves. So we designed our coverage instrumentation to integrate with KCov. The downside is that you need to compile the kernel with KCov, but then again, Linux is open source, so you can pretty much always do that. It's usually not the kernel that is a binary blob; it's usually only the modules, so that's still fine.

The way you implement KCov support for binary modules is that you find the start of every basic block and add a call to a function that stores the collected coverage. Here's an example: we have a short snippet of code with three basic blocks, and all we have to do is add a call to trace_pc at the start of each basic block. trace_pc is a function that is part of the main kernel image; it collects this coverage and makes it available to a user-space fuzzing agent. So this is all really easy, and it works.

Let's now see how we implemented binary ASan. As I mentioned before, when we instrument a program with binary ASan in user space, we link with libasan, which takes care of setting up the metadata, putting the red zones around allocations, and so on. We have to do something similar in the kernel. Of course, you cannot link with libasan in the kernel, because that doesn't work. But what we can do instead is, again, compile the kernel with KASan support. This instruments the allocator, kmalloc, to add red zones; it allocates space for the metadata and keeps this metadata around. It does all of this for us, which is really nice.
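To make this concrete, here is a crude model of the shadow-metadata bookkeeping and the check that gets inserted before every memory access. This is our own sketch under simplifying assumptions: real KASan maps each 8-byte granule of kernel memory to one shadow byte, encodes partially valid granules, and uses several distinct red-zone marker values.

```python
SHADOW_SCALE = 8      # one shadow byte describes 8 bytes of memory
REDZONE = 0xFC        # red-zone marker (simplified)
VALID = 0x00          # whole granule is addressable

shadow = {}           # toy shadow memory: granule index -> marker byte

def kmalloc_model(base, size):
    """Model of a KASan-instrumented kmalloc: mark the object valid in
    shadow memory and surround it with red zones."""
    first, last = base // SHADOW_SCALE, (base + size - 1) // SHADOW_SCALE
    shadow[first - 1] = REDZONE
    for granule in range(first, last + 1):
        shadow[granule] = VALID
    shadow[last + 1] = REDZONE

def check_access(addr):
    """The kind of check the rewriter inserts before each load/store."""
    if shadow.get(addr // SHADOW_SCALE, REDZONE) != VALID:
        raise RuntimeError(f"KASan: invalid access at {addr:#x}")
```

The real inserted instrumentation is a handful of assembly instructions that compute the shadow address and branch to a report function on mismatch, but the logic is the same shape as this check.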
And again, the big advantage of this approach is that we can integrate seamlessly with a KASan-instrumented kernel and with fuzzers that rely on KASan, such as syzkaller. So we see this as more of a plus than a limitation. And how do you implement the ASan checks? Well, you find every memory access and instrument it to check whether it is accessing a red zone. If it is, you call a bug report function that produces a stack trace and a bug report and crashes the kernel, so that the fuzzer can detect it. Again, this is compatible with source-based KASan, so we're happy. And we can simply load a rewritten module with the added instrumentation into a kernel, as long as the kernel was compiled with the right flags, and we can use a standard kernel fuzzer. For our evaluation here, we used syzkaller, a popular kernel fuzzer by some folks at Google, and it works really well.

So we've finally reached the end of our journey, and now we want to present some experiments to see if this really works. For user space, we wanted to compare the performance of our binary ASan with source-based ASan and with existing solutions that also work on binaries. In user space, you can use Valgrind memcheck, a memory sanitizer that is based on dynamic binary translation and works on binaries. We compared Valgrind memcheck, source-based ASan, and our binary ASan on the SPEC CPU benchmarks and measured how fast each one was. And for the kernel, we decided to fuzz some file systems and some drivers with syzkaller, using both source-based KASan and KCov and kRetroWrite-based KASan and KCov.

These are our results for user space. The red bar is Valgrind. We can see that the execution time with Valgrind is the highest: it is really, really slow, like 3, 10, 30x overhead, way too slow for fuzzing. Then in green, we have our binary ASan, which is already a large improvement. In orange, we have source-based ASan.
And finally, in blue, we have the original code without any instrumentation whatsoever. We can see that source-based ASan has some 2x or 3x overhead, and binary ASan is a bit higher, a bit less efficient, but still somewhat close. So that's it for user space.

For the kernel, these are some preliminary results. I'm doing this work as part of my master's thesis, so I'm still running the evaluation. Here we can see that the overhead is already a bit lower. The reason is that SPEC is a pure CPU benchmark: it doesn't interact with the system much, so any instrumentation you add is going to considerably slow down the execution. By contrast, when you fuzz a file system with syzkaller, not only does every test case have to go from the host to the guest and perform multiple system calls, but every system call also has to go through several layers of abstraction before it gets to the actual file system. All of this takes a lot of time, and so in practice the overhead of our instrumentation seems to be pretty reasonable.

Since we know that you like demos, we've prepared a small demo of kRetroWrite. Let's see. All right, we prepared a small kernel module. This module is really simple and contains a vulnerability. What it does is create a character device. If you're not familiar with this: a character device is like a fake file that is exposed by a kernel driver and that you can read from and write to. Instead of going to a file, the data that you write to this fake file goes to the driver and is handled by this demo write function. As we can see, this function allocates a 16-byte buffer on the heap, copies the data into it, and then checks if the data contains the string 1337. If it does, it accesses the buffer out of bounds: you can see it logs buf[16] while the buffer is 16 bytes. This is an out-of-bounds read of one byte.
And if it doesn't, then it just accesses the buffer in bounds, which is fine and not a vulnerability. So we can compile this driver. Okay, and then we have our module, and we will instrument it using kRetroWrite. So, instrument... yes, please, okay. kRetroWrite does some processing, and it produces an instrumented module with ASan, or rather KASAN, and a symbolized assembly file. We can actually have a look at the symbolized assembly file to see what it looks like. Yes, okay, so it's big enough. Yeah, as you can see, here is the ASan instrumentation. The original code loads some data from this address, and as you can see, the ASan instrumentation first computes the actual address and then does some checking. Basically, this is checking some metadata that ASan stores, to see whether the address is in a red zone or not. And if the check fails, then it calls this ASan report function, which produces a stack trace and crashes the kernel. So this is fine. We can even look at the disassembly of both modules, the object and then the demo. Okay, so on the left we have the original module without any instrumentation. On the right, we have the module instrumented with ASan. As you can see, the original module has a push r13 and then this memory load here. On the right, in the instrumented module, kRetroWrite inserted the ASan instrumentation. The original load is still down here, but between the first instruction and this instruction, we now have the KASAN instrumentation that does our check. So this is all fine. Now we can actually test it and see what it does. We will boot a very minimal Linux system and try to trigger the vulnerability, first with the non-instrumented module and then with the instrumented module. And we will see that with the non-instrumented module, the kernel will not crash, but with the instrumented module, it will crash and produce a bug report.
So let's see. Yeah, this is a QEMU VM. I have no idea why it's taking so long to boot. I'll blame the demo gods; they're not being kind to us. Yeah, I guess we just have to wait. Okay, all right. So we loaded the module, and we can see that it has created the fake file, the character device, at /dev/demo. Yep, we can write to this file. Yep, so this accesses the array in bounds, so this is fine. Then what we can also do is write 1337 to it, so it will access the array out of bounds. This is the non-instrumented module, so it will not crash; it will just print some garbage value. Okay, that's it. Now we can load the instrumented module instead and do the same experiment again. All right, we can see that /dev/demo is still here, so the module still works. Let's try to write one, two, three, four into it. This again doesn't crash, but when we write 1337, it produces a bug report. This has quite a lot of information. We can see where the memory was allocated; there's a stack trace for that. It wasn't freed, so there's no stack trace for a free. We see the size of the allocation: it was a 16-byte allocation. And we can see the shape of the memory: these two zeros mean that there are two eight-byte chunks of valid memory, and then these FC FC FC bytes are the red zones that I was talking about before. All right, so that's it for the demo; we will switch back to our presentation now. Hope you enjoyed it. Cool, so after applying this to a demo module, we also wanted to see what happens if we apply it to a real file system. After a couple of hours, when we came back and checked on the results, we saw a couple of issues popping up, including a nice set of use-after-free reads and a set of use-after-free writes, and when we checked the bug reports, we saw a whole bunch of Linux kernel issues popping up one after the other in the module that we fuzzed. We're in the process of reporting them, and it will take some time until they are fixed.
That's why you see the blurry lines. But as you see, there's still quite a bit of opportunity in the Linux kernel, where you can apply different forms of targeted fuzzing to different modules, run these modules on top of a KASAN-instrumented kernel, and then leverage this as part of your fuzzing toolchain to find interesting kernel 0-days that you can then develop further, report, or do whatever you want with. Now, we've shown you how you can take existing binary-only modules (think of different binary-only drivers), or even existing modules where you don't want to instrument the full Linux kernel but only focus fuzzing and exploration on a small, limited piece of code, and then run security tests on those. We've shown you how you can do coverage tracking and address sanitization, but what other instrumentation you add is up to you. This is just a tool, a framework that allows you to do arbitrary forms of instrumentation. So we've taken you on a journey from instrumenting binaries, through coverage-guided fuzzing and sanitization, to instrumenting modules in the kernel and finding crashes there. Let me wrap up the talk. This is one of the fun pieces of work that we do in the HexHive lab at EPFL. If you're looking for post-doc opportunities, or if you're thinking about a PhD, come talk to us; we're always hiring. The tools will be released as open source, and a large chunk of the user-space work is already open source. We are working on a set of additional demos and so on, so that you can get started faster leveraging the different instrumentation that is already out there. The user-space work is already available; the kernel work will be available in a couple of weeks.
This allows you to instrument real-world binaries for fuzzing, leveraging existing transformations: coverage tracking to enable fast and effective fuzzing, and memory checking to detect the actual bugs that exist there. The key takeaway from this talk is that RetroWrite and kRetroWrite enable static binary rewriting where you pay only for the instrumentation you add. We accept the limitation of focusing only on position-independent code, which is not a real limitation in practice, but in exchange we can symbolize without relying on heuristics, so we can symbolize even large, complex applications, effectively rewrite them, and then you can focus fuzzing on the parts you care about. Another point I want to mention is that this enables you to reuse existing tooling. You can take a binary blob, instrument it, and then reuse, for example, AddressSanitizer or existing fuzzing tools, as it integrates really, really nicely. As I said, all the code is open source. Check it out, try it, and let us know if it breaks; we're happy to fix it. We are committed to open source. And let us know if there are any questions. Thank you. So thanks, guys, for an interesting talk. We have some time for questions, and we have microphones along the aisles. We'll start with a question from microphone number two. Hi, thanks for your talk and for the demo. I'm not sure about the use case you showed for kernel RetroWrite, because you are usually interested in fuzzing binaries in kernel space when you don't have source code for the kernel, for example on IoT or Android. But you reuse the KCOV and KASAN support in the kernel, and on IoT or Android you never have a kernel which is compiled with those. So do you have any plans to binary-instrument the kernel itself, not just the modules? So, we thought about that. I think there are some additional problems that we would have to solve in order to be able to instrument the full kernel.
So, other than the fact that it gives us compatibility with existing tools, the reason why we decided to require a kernel compiled with KASAN is that otherwise you have to do everything yourself. Think about it: you have to instrument the memory allocator to add red zones, which is already somewhat complex. You have to instrument the exception handlers to catch any faults that the instrumentation detects. You would have to set up some memory for the ASan shadow. So I think you should be able to do it, but it would require a lot of additional work, and this was a four-month thesis. So we decided to start small, prove that it works in the kernel for modules, and leave extending it to the full kernel to future work. Also, I think for Android, so in the case of Linux the kernel is GPL, right? So if the manufacturer ships a custom kernel, they have to release the source code, right? They never do. They never, well, that's a different issue, right? That's why I ask, because I don't see how it can be used in the real world. No, let me try to put this into perspective a little bit, right? What we did so far is leverage existing tools like KASAN or KCOV and integrate with them. Now, hooking heap allocations and replacing them with versions that add red zones is fairly simple; that instrumentation you can carry out fairly well by focusing on the different allocators. Second, simply oops-ing the kernel and printing a stack trace is also fairly straightforward. So it's not a lot of additional effort. It involves some engineering effort to port this to non-KASAN-compiled kernels, but we think it is very feasible. In the interest of time, we focused on KASAN-enabled kernels, where some form of ASan support is already enabled. But yeah, this is additional engineering effort, and there's also a community out there that can help us with these kinds of changes.
So kRetroWrite and RetroWrite themselves are the binary rewriting platform that allows you to turn a binary into an assembly file that you can then instrument and run different passes on. Another pass would be a full ASan or KASAN pass that somebody could add and then contribute back to the community. Yeah, that would be really useful. Thanks. Cool, next question from the internet. Yes, there is a question regarding the slide on the SPEC CPU benchmark: the second or third graph from the right had an instrumented version that was faster than the original program. Why is that? Cache effects. Thank you. Microphone number one. Thank you for your presentation. I have a question: which architectures do you actually support, and do you plan to support more? x86-64. Okay, so no plans for ARM or MIPS? No, there are plans. Okay. Right, so again, there's a finite amount of time. We focused on the technology. ARM is high up on the list; if somebody's interested in working on it and contributing, we're happy to hear from them. Our list of targets is ARM first, and then maybe something else. But I think with x86-64 and ARM, we cover the majority of the interesting platforms. And a second question: did you try to fuzz any real closed-source programs? Because, as I understand from the presentation, you fuzzed just file systems that you can compile and fuzz with syzkaller. So, for the evaluation we wanted to be able to compare between the source-based instrumentation and the binary-based instrumentation, so we focused mostly on open-source file systems and drivers, because then we could instrument them with the compiler. We haven't yet tried, but this is also pretty high up on the list: we want to find some closed-source drivers. There are lots of them, for GPUs or anything, and we'll give it a try and perhaps find some 0-days. Yes, but with syzkaller you still have a problem: you have to write rules or dictionaries.
I mean, you have to understand the format of how to communicate with the driver. Yeah, right. But there are, for example, closed-source file systems that we are looking at. Okay, thank you. Number two. Hi, thank you for your talk. I don't know if there are any KCOV or KASAN equivalent solutions for Windows, but I was wondering if you tried, or are planning, to bring the framework to Windows. I know it might be challenging because of driver signature enforcement and PatchGuard, but I wondered if you tried or thought about it. Yes, we thought about it, and we decided against it. Windows is incredibly hard, and we are academics. The research we do in my lab focuses predominantly on open-source software and empowering open-source software. Doing full support for Microsoft Windows is somewhat out of scope. If somebody wants to port these tools, we are happy to hear about it and work with those people, but it's a lot of additional engineering effort with very low additional research value. So we'll have to find some form of compromise. And if you would be willing to fund us, we would go ahead, but it's a cost question. And you're referring both to kernel and user space, right? Yeah. Okay, thank you. Number five. Hi, thanks for the talk. This seems most interesting if you're looking for vulnerabilities in closed-source kernel modules, but, without having given it too much thought, it seems really trivial to prevent this if you're writing a closed-source module. Well, how would you prevent it? Well, for starters, you would just take the difference between the addresses of two functions. That's not going to be IP-relative, so...
Right, so we explicitly, even in the original RetroWrite paper, decided not to try to deal with obfuscated code, or code that is purposefully trying to defeat this kind of rewriting, because, first of all, there are techniques to deobfuscate code or remove these checks in some way, but that is orthogonal work. And at the same time, I guess most drivers are not really compiled with this sort of obfuscation; they're just compiled with a regular compiler. But yeah, of course, this is a limitation. They're likely stripped, but not necessarily obfuscated, at least from what we've seen when we looked at binary-only drivers. Microphone number two. How do you decide where to place the red zones? From what I heard, you talked about instrumenting the allocators, but there are a lot of variables on the stack, so how do you deal with those? Oh yeah, that's actually super cool. I refer you to some extent to the paper that is on the GitHub repo as well. If you think about it, modern compilers use canaries to protect buffers. Are you aware of stack canaries and how they work? If the compiler sees that there's a buffer that may be overflowed, it places a stack canary between the buffer and any other data. As part of our analysis, we find these stack canaries, remove the code that handles the canary, and use that space to place our red zones. So we actually hijack the stack canaries, remove their code, and put ASan red zones into the now-empty canary slots. It's a super cool optimization, because we piggyback on the work the compiler already did for us, and we can leverage that to gain additional benefits and protect the stack as well. Thanks. Another question from the internet. Yes: did you consider lifting the binary code to LLVM IR instead of generating assembler source? Yes. But, a little bit longer answer: yes, we did consider that.
It would be super nice to lift to LLVM IR. We've actually looked into this. It's incredibly hard, incredibly complex. There's no direct mapping between the machine code and its LLVM IR equivalent; you would still need to recover all the types. So it's this magic dream that you recover full LLVM IR and then do heavyweight transformations on top of it. But it is incredibly hard, because when you compile down from LLVM IR to machine code, you lose a massive amount of information, and you would have to find a way to recover all of that information, which is pretty much impossible, and undecidable in many cases. For example, just as a note: we only recover control flow, and we only symbolize control-flow references. We don't support instrumentation of data references yet, because there's still an undecidable problem that we are facing there. I can talk more about this offline, and there's a note in the paper as well. And this is just a small problem if you're lifting to assembly files; if you're lifting to LLVM IR, you would have to do full end-to-end type recovery, which is massively more complicated. Yes, it would be super nice. Unfortunately, it is undecidable and really, really hard. You can come up with some heuristics, but there is no solution that will be correct 100% of the time. I'll take one more question from microphone number six. Thank you for your talk. What kind of disassembler did you use for RetroWrite? Did you have problems with wrong disassembly, and if so, how did you handle it? So, for RetroWrite we used Capstone for the disassembly. An amazing tool, by the way. The idea is that we need some information about where the functions are. For the kernel modules, this is actually fine, because kernel modules come with this sort of information: the kernel needs it to build stack traces, for example.
For user-space binaries, this is somewhat less common, but you can use another tool to do function identification first, and then we disassemble the entire function. We did run into some issues with AT&T syntax, because we wanted to use gas, the GNU assembler, for reassembly. Some instructions, for example the five-byte NOP and the six-byte NOP, can be expressed using the exact same string of text, the same mnemonic and operand string. The problem is that the kernel doesn't like that and crashes; this took me like two days to debug. The kernel does dynamic binary patching at runtime, and it uses fixed offsets. So if you replace a five-byte NOP with a six-byte NOP or vice versa, your offsets change and your kernel just blows up in your face. So it was handled on a case-by-case basis: you saw the errors coming out of the disassembly and you had to fix them, right? Sorry, can you repeat the question? For example, if some instruction was not supported by the disassembler, you saw that it crashed or something went wrong, and then you fixed it by hand? Yeah, well, if we saw that there was a problem with the disassembly... actually, I don't recall having any unknown instructions in the disassembler; I don't think I've ever had a problem with that. But yeah, this was a lot of engineering work. So let me repeat: the problem was not a bug in the disassembler, but an issue with the instruction format, where the same mnemonic can be translated into two different instructions, one of which was five bytes long and the other six bytes long. Both used the exact same mnemonic, right? So this is an issue with assembly encoding. But you had no problems with unsupported instructions which couldn't be disassembled? No, not as far as I know, at least. We have one more minute, so a very short question from microphone number two. Does it work?
Is your binary instrumentation as powerful as the kernel address sanitizer, I mean, KASAN? So does it detect all the memory corruptions on stack, heap, and globals? No globals, but on the heap it detects all of them. There's some slight variation on the stack, because we have to piggyback on the canaries. As I mentioned briefly before, there's no reflowing and no full recovery of data layouts, so to get anything on the stack, we have to piggyback on existing compiler additions like stack canaries. So we don't support intra-object overflows on the stack, but we do leverage the stack canaries to get some stack protection, which gets us, I don't know, 90 or 95% of the way there, because the stack canaries are pretty good. For the heap, we get the same precision. For globals, we have very limited support. Thanks. So that's all the time we have for this talk. You can find the speakers afterwards offline. Please give them a big round of applause for an interesting talk. Thank you.