So, thanks everyone for coming. It's the last slot of the day and I know the headphone thing is a bit weird, so let's get started. I want to talk to you today about making fuzzing part of your software development life cycle. Before we do that, a bit about myself, just for context. I've been at Google for five years, and that entire time I've been working on fuzzing — and by that I mean infrastructure for fuzzing, rather than writing individual fuzzers. The infrastructure I work on is for ordinary developers, not security engineers, to fuzz their own code, and I've been helping developers do that for five years: the first three on Chrome, and the last two on the newly created open source security team. That said, the usual disclaimers apply — I'm only here on behalf of myself today. The key takeaway I want to share is that there are tools these days, such as OSS-Fuzz and ClusterFuzzLite, that make it very easy to incorporate fuzzing into your software development life cycle almost like an integration test — maybe a particularly long-running one — and I'll show how those tools do that in this talk. To start, let's define what fuzzing is and why you should care about it. So who cares about it? Well, a lot of people do. Obviously, as I mentioned, Google does: we fuzz on hundreds of thousands of cores around the clock, and we fuzz pretty much every major product Google has — Android, Chrome as I mentioned, Chrome OS, the internal code we work on — and many other companies use fuzzing as well.
Microsoft actually develops its own fuzzing infrastructure, similar to ours. The reason fuzzing is used so widely is how good it is at discovering vulnerabilities: every year it finds thousands of bugs in software, many of them security problems. Fuzzing is a fairly unique way to find bugs by spending compute time rather than developer time. Another reason you should fuzz your code is that it's a technique bad guys can use too — as the saying goes, fuzz your code or someone else will. Here's an example of that: syzkaller, a fuzzing tool released by Google, found a bug in the Linux kernel. I believe the bug was still unpatched when syzkaller found it, and in any case it was later found that the NSO Group had been using it to target Android users. The vulnerabilities fuzzing can find are low-hanging fruit for bad actors, and in many cases they can be weaponized. So what is fuzzing? The way I think of it, fuzzing is an automated process that creates new randomized inputs and feeds them to a program in order to get that program to crash. It works like the pseudocode on this slide: you've got an infinite loop, and on each iteration you create a new input to feed to the code you want to test. Why an infinite loop? Because we don't really know when to stop fuzzing — fuzzing can show that your code has bugs, but it can't prove that it doesn't have bugs — so fuzzers frequently run for an unbounded amount of time. Obviously, to incorporate it into your workflow you don't want that, and we have solutions for it that I'll get to. And so how did this come about — how did fuzzing get started?
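The loop I just described can be sketched in a few lines of Python. This is a toy sketch: the `target` function and its crash condition are invented for illustration, standing in for whatever program you'd really be fuzzing.

```python
import random

def target(data: bytes) -> None:
    """Toy stand-in for the program under test: 'crashes' on a NUL byte."""
    if b"\x00" in data:
        raise RuntimeError("crash")

def create_random_input(max_len: int = 16) -> bytes:
    """Create a fresh randomized input on every iteration."""
    return bytes(random.randrange(256) for _ in range(random.randrange(1, max_len)))

def fuzz() -> bytes:
    """The slide's pseudocode: loop, feeding new random inputs to the
    target. A real fuzzer would never stop; here we return the first
    crashing input to keep the sketch finite."""
    while True:
        data = create_random_input()
        try:
            target(data)
        except Exception:
            return data  # a real fuzzer would save this and keep going

crasher = fuzz()
```

In practice the loop body also includes saving the crashing input and deduplicating it against crashes already seen, which is exactly the bookkeeping the infrastructure later in this talk automates.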
I think it was in the early 90s, or maybe the late 80s — quite a while ago. At least, that's when fuzzing was given the name fuzzing; variants of the idea may have existed earlier. What happened was that an academic was connecting to his university computer over dial-up on a dark and stormy night, and the rain was causing the dial-up connection to send spurious characters to the programs he was interacting with on the university server — and those spurious characters crashed the programs. The academics, being as smart as they are, figured: we don't need rain to send spurious characters that crash our programs; we can write other programs to send them. That's basically how fuzzing was born. The way these original fuzzers worked was pretty much copying data from /dev/urandom and feeding it to a target program. Say we were fuzzing Chrome's JavaScript engine, V8: it would look something like taking bytes from /dev/urandom, putting them in a file, and running V8 on that file. Now, you're very unlikely to find a bug with this technique, at least in V8. The input looks nothing like JavaScript — it's unprintable characters — so you'll hit a parse error very early on, and there aren't many interesting bugs in V8's parser. Most of the bugs are deeper down, in the just-in-time compiler, having to do with things like memory management, so you really won't find anything this way. On the other hand, you have test cases like this — relatively simple snippets of code — which can crash V8.
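The dumb-fuzzing setup just described — bytes from /dev/urandom into a file, then run the target on it — fits in a few lines. Here `d8` (V8's developer shell) is an assumption; substitute any program that takes a file argument.

```python
import os
import shutil
import subprocess
import tempfile

# First-generation "dumb" fuzzing: copy random bytes into a file and
# run the target on it. "d8" is V8's shell binary, assumed to be on PATH.
data = os.urandom(1024)
with tempfile.NamedTemporaryFile(suffix=".js", delete=False) as f:
    f.write(data)

if shutil.which("d8"):  # only attempt the run if a V8 shell is installed
    subprocess.run(["d8", f.name])
```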
So we want fuzzers that can produce things like this snippet on my left, which was actually produced by a fuzzer and, I believe, does crash V8. The way the second generation of fuzzers addressed this problem was by being format-aware. If you're fuzzing a program like a JavaScript engine, you write another program that can either mutate JavaScript or create JavaScript from scratch — very weird JavaScript, like the example I have here — that can get the target program into an interesting state and, from there, hopefully crash it or find a bug in it. You can see here a simple snippet of how it creates for-loops: it literally concatenates "for", an open paren, an init statement, a condition, and an increment statement. That's all well and good — it's still a very effective technique, used by security researchers, and by bad people, today. But how do you get fuzzing to a state where it can actually be used by developers? The snippet I showed is just one line, but these format-aware fuzzers end up very long, and writing one is basically a full-time job. Software engineers want to write their own software; they don't want to spend that much effort on testing. Before we get there, let's think about what we want out of fuzzing by looking at what makes other sorts of tests good. One property is that tests are written by the developers themselves. Some places have QA doing testing separately; that's never been the case anywhere I've worked, and I think it's generally accepted today that developers should be testing their own code. So that's one thing we want out of fuzzing. Another — and this should probably come first — is that you want tests that actually find things, right?
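A format-aware for-loop generator like the one on the slide can be as simple as string concatenation. This sketch invents a tiny pool of canned expressions; a real generative fuzzer encodes far more of the JavaScript grammar.

```python
import random

def gen_expr() -> str:
    """Tiny stand-in for a real grammar: a handful of canned expressions."""
    return random.choice(["i", "i + 1", "a[i]", "i * 2", "0"])

def gen_for_loop() -> str:
    # Literal concatenation, as on the slide: "for (" + init + "; "
    # + condition + "; " + increment + ")" plus a small body.
    init = "var i = " + gen_expr()
    cond = "i < " + str(random.randrange(100))
    step = "i = " + gen_expr()
    return "for (" + init + "; " + cond + "; " + step + ") { " + gen_expr() + "; }"
```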
That's kind of obvious, but it's worth stating. Next, you want your tests to be run continuously — that's the whole CI practice, continuous integration. It's based on the realization that it's much easier to fix your code if you test every time before you merge than if you wait, say, a month and test everything at once. And finally, speed is another thing that helps with tests. If your tests are really slow to run, it's painful for developers to fix the issues the tests find: the test points out an issue, the developer tries a fix and runs the test again, and if they have to wait an hour each time, that's too burdensome to deal with. So let's look at how fuzzing can be done by developers themselves. The technique that really allowed this is called coverage-guided fuzzing. Coverage-guided fuzzing is based on the insight that instead of teaching the fuzzer about the format it's fuzzing, the fuzzer can learn to produce inputs that look like that format. That lets you write fuzzers where you only point at the code you're targeting; you don't have to invest in the mutator, because it's generic — the fuzzer learns to produce interesting inputs on its own, as I'll show right next. Say we're fuzzing something like a PDF reader. You start with a corpus — basically a fancy Latin word for a folder — containing different PDF files that you think exercise different parts of your program. You'll have one with images, for instance.
You'll have one with a form, maybe — interesting inputs, basically. The fuzzer picks one of these inputs at random to mutate, and it also picks random mutations to apply to it. These mutations are totally generic: things like bit flips, erasing bytes, inserting bytes. When a developer writes a fuzzer using coverage-guided fuzzing, they typically don't touch the mutations at all. The mutated test case then gets fed to the target code. As with older generations of fuzzing, if we find a crash — bingo, that's our goal, and we can stop there. But the magic of coverage-guided fuzzing is what happens when there's no crash: the fuzzer looks at which parts of the target code were actually executed, and if the test case exercises new parts of the target — if it produces new behavior — it adds the test case to the corpus for possible further mutation. You can see how, after a few rounds of this, you might evolve inputs that get deeper and deeper into the target code. Say you're fuzzing ELF binaries, which start with a magic string, "ELF". The first time around you might randomly produce a test case starting with "E"; that gets added to the corpus. In a later iteration you might happen to produce one starting with "EL", and so on, until you're actually producing interesting inputs. It's not really learning how to mutate the format — it's being sort of stupid but trying very many times. Typically this needs to run thousands of executions per second to work well, but it can produce genuinely interesting test cases.
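Here's a minimal sketch of that loop in Python, using the "ELF" magic-string example. The `coverage` function is a stand-in for the compile-time instrumentation a real coverage-guided fuzzer relies on, and the mutations are the generic ones just mentioned.

```python
import random

def mutate(data: bytes) -> bytes:
    """Generic mutations only: bit flip, byte erase, byte insert."""
    data = bytearray(data or b"\x00")
    op = random.choice(["flip", "erase", "insert"])
    i = random.randrange(len(data))
    if op == "flip":
        data[i] ^= 1 << random.randrange(8)
    elif op == "erase" and len(data) > 1:
        del data[i]
    else:
        data.insert(i, random.randrange(256))
    return bytes(data)

def coverage(data: bytes) -> int:
    """Stand-in coverage signal: how many leading bytes of the b"ELF"
    magic the input matches. Real fuzzers get this from instrumentation."""
    n = 0
    for got, want in zip(data, b"ELF"):
        if got != want:
            break
        n += 1
    return n

corpus = [b"AAAA"]
best = 0
while best < 3:  # evolve until the full magic string is matched
    candidate = mutate(random.choice(corpus))
    cov = coverage(candidate)
    if cov > best:  # new coverage => keep the input for further mutation
        corpus.append(candidate)
        best = cov
```

Each round is dumb, but because rounds are cheap and the corpus keeps every step of progress, the inputs "evolve" toward the magic string — the same dynamic that carries a real fuzzer past format checks.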
This process is why it's sometimes called evolutionary coverage-guided fuzzing: it evolves ever more interesting test cases. Another tool that made it easy for developers to write their own fuzzers was libFuzzer. The original coverage-guided fuzzer, AFL, was initially meant to run against an entire program: you would take a binary like SQLite, and standard AFL would pass it data through standard input. What libFuzzer introduced was the concept of a fuzzing harness, or fuzz target: you define a function called LLVMFuzzerTestOneInput, link the fuzzer against it, and the fuzzer calls that function in a loop, passing it test cases. This lets you write fuzzers that are almost like unit tests — you can fuzz specific parts of your code, with a workflow much like your unit-testing workflow. This is an actual example from OpenSSL, I think. Although LLVMFuzzerTestOneInput was created by libFuzzer, it has basically become the standard interface, a sort of lowest common denominator: alternative fuzzing engines such as AFL++ and Honggfuzz support the same function, though we won't really be discussing those engines today. Now, as I mentioned, you obviously want to find bugs. How do you do that? Let me describe the problem in a bit more detail first. With a unit test, it's easy to determine how the program should behave for a single given input. But a fuzzer gives the target code arbitrary inputs, and it's much harder to define how the code should behave in the arbitrary case.
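To make the harness idea concrete: real libFuzzer harnesses are C/C++, but the shape is easy to mimic. In this Python sketch, `engine` plays libFuzzer's role of calling the entry point in a loop; the toy parser and its planted bug are invented for illustration.

```python
import random

def toy_parse(data: bytes) -> None:
    """Hypothetical code under test, with a planted bug."""
    if data[:2] == b"{}":
        raise ValueError("crash: empty object unsupported")

def LLVMFuzzerTestOneInput(data: bytes) -> None:
    """The harness: like a unit test, it exercises just the API you
    care about, with bytes supplied by the fuzzing engine."""
    toy_parse(data)

def engine(harness, rounds: int = 10000):
    """What the fuzzing engine does conceptually: call the harness in a
    loop with generated test cases, stopping at the first crash."""
    for _ in range(rounds):
        data = bytes(random.choice(b"{}[]x") for _ in range(4))
        try:
            harness(data)
        except Exception:
            return data
    return None
```

The point of the harness contract is exactly this separation: the developer writes only the entry-point function, and the engine owns input generation, mutation, and the loop.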
So the most common way you determine that the program has a bug is when it exhibits a generic class of bug, one that isn't application-specific. For C and C++ programs, the most common of these is memory corruption. C and C++ are not memory-safe, so you can do things like dereference pointers that don't exist or use memory after freeing it. That will sometimes result in a segmentation fault, but when it does isn't really deterministic. So you want to make it more deterministic, and also more actionable — a bare "segmentation fault" message doesn't tell you where the bug is or how to fix it. For that we use tools called sanitizers. The original sanitizers were all written for C and C++ issues: memory corruption, overflowing integers — basically bugs that wouldn't necessarily crash the program right away. The sanitizers make the program crash, and that's how we signal to the fuzzer that it has found a bug. Now, that's really just for C and C++. If you're fuzzing C or C++ code, I promise you'll find tons of interesting stuff. If you're not writing C or C++, it's a trickier question what kind of bugs you want fuzzing to find. The most obvious answer is crashes that cause denial-of-service issues — crashes in non-C/C++ code are not really indicative of memory corruption. And I don't know if I said this, but memory corruption doesn't just cause crashes: done properly by an attacker, it can be used for remote code execution — executing arbitrary code.
But crashes in memory-safe languages still cause denial-of-service bugs, and some of those can be pretty important. My favorite is a bug OSS-Fuzz found in Geth, the Ethereum client written in Go, which is a memory-safe language. It was just a simple denial-of-service bug, but it could have been used to take down every single Geth node running at the time — and that would have been most of the Ethereum network. So denial of service is one reason to fuzz non-C/C++ code. Another thing you can do with non-C/C++ code is test for correctness. There may be ways to use assertions or other means to determine whether your program behaves correctly. But another technique I really like is called differential fuzzing: you take two different implementations of the same thing and compare their results, and any difference is a sign that one of them is incorrect. There's a fuzzer in OSS-Fuzz that does this called Cryptofuzz, which takes the math parts of different cryptography libraries and performs the same operations in each. It might do an exponentiation in OpenSSL and compare the result to the same operation in mbed TLS; if the two libraries give different results, you know one of them is wrong. It has found tons of issues with this technique. Finally, for non-C/C++ code, there's also the possibility of writing sanitizers for generic, non-application-specific classes of bugs. It's typically harder to do for languages other than C and C++, because those languages are just better designed, I think.
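The differential idea is easy to demonstrate without any crypto libraries. Below, Python's built-in `pow` plays the role of one library and a hand-written square-and-multiply plays the other; in Cryptofuzz the two sides would be real bignum implementations such as OpenSSL and mbed TLS. A mismatch on any input would mean one side has a bug.

```python
import random

def square_and_multiply(base: int, exp: int, mod: int) -> int:
    """A second, independent implementation of modular exponentiation,
    standing in for 'the other library' in a differential fuzzer."""
    result = 1
    base %= mod
    while exp:
        if exp & 1:
            result = result * base % mod
        base = base * base % mod
        exp >>= 1
    return result

# Differential fuzzing loop: run the same random operation through both
# implementations; any disagreement signals a bug in one of them.
for _ in range(1000):
    b, e, m = (random.randrange(1, 1 << 64) for _ in range(3))
    assert pow(b, e, m) == square_and_multiply(b, e, m), (b, e, m)
```

Note that no crash is needed to detect the bug: the oracle is the other implementation, which is what makes this technique so useful in memory-safe languages.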
The language just won't have as many footguns built into it. But one such class of vulnerability is command injection. It's not application-specific — it could be exhibited by any application — and it's actually not even specific to any language. We wrote a sanitizer for command injection — we just published a blog post about this — and it managed to find a remote code execution bug that was trivial to exploit in an image library, TinyGLTF. All you needed to do was put a shell command in backticks and it would be executed. So there's hope for writing sanitizers that detect non-C/C++ bugs that are also not application-specific. Now let's take a look at the fuzzing infrastructure that makes it possible to integrate fuzzing into your software development process. The key tool we developed at Google that allows this is called ClusterFuzz. ClusterFuzz basically aims to automate everything in fuzzing except the parts where humans are absolutely necessary — and I'd say those parts are really just writing fuzzers, fixing bugs, and, of course, writing the code and introducing the bugs in the first place. The way ClusterFuzz works is that it does continuous builds of your code and of the fuzzers testing that code, and then fuzzes those builds continuously. It handles things like corpus management, which is a bit difficult when you're running fuzzers in parallel. The most interesting thing it does, though, is what happens when it finds a crash. At that point it does a lot of things you would otherwise have to do on your own — if you've ever run a fuzzer on your desktop, you've probably hit the same bug thousands of times.
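To make the command-injection sanitizer concrete, here's a toy Python version of the idea. Real sanitizers hook process-spawning at a much lower level and watch for a sentinel command planted in fuzzer inputs; everything here, including `load_image` and its planted bug, is invented for illustration.

```python
SENTINEL = "/injected/sentinel"

def guarded_system(cmd: str) -> None:
    """Toy 'sanitizer': abort loudly if fuzzer-controlled data reaches a
    shell command line, instead of silently executing it."""
    if SENTINEL in cmd:
        raise RuntimeError("command injection: untrusted data reached the shell")
    # os.system(cmd) would run here in the real program

def load_image(path: str) -> None:
    # A planted bug in the spirit of the TinyGLTF one: an attacker-
    # controlled filename is pasted straight into a shell command.
    guarded_system("convert " + path + " out.png")
```

When the fuzzer happens to put the sentinel into the filename, the guard raises, the process "crashes", and the fuzzer reports the input — turning a silent injection into a detectable bug.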
We can't report thousands of issues for the same underlying bug — developers would kill us. So ClusterFuzz places a big emphasis on deduplicating crashes, making sure each reported bug is actually unique, so we're not reporting the same crash over and over. Another thing it does is minimize the test case causing a crash, to make it easier for the developer to debug and figure out what's going on. After all that, it bisects to find when the bug was introduced, files an issue in our issue tracker, and in some cases — when bisection can identify the commit that introduced the bug — assigns it to a developer. Once the bug has been assigned, it periodically tests whether the bug has been fixed in a new build, and closes the issue when it has. So this slide just summarizes some of the automation ClusterFuzz does: triaging crashes, deduplication, and reporting. This quote is feedback from the curl developer, who mentions that false positives are very low with fuzzing. Maybe he's not talking about duplicate issues, but in general that's another property of fuzzing: whenever you get a crash, it's typically because of a real issue. Here's what it looks like when ClusterFuzz files a bug: it provides a summary of the issue and a few stack frames, and here you can see it actually assigned an owner, based on the fact that that person authored the commit which introduced the bug. Another nice feature I want to share is coverage reports. ClusterFuzz will show you which parts of your code are being fuzzed, which lets developers alter their fuzzers, or write new ones, to cover previously untested code. Here's what that looks like at the file level: you can see part of this function is called, but not all of it. So how can you get this workflow at home?
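As a sketch of what deduplication can look like: group crashes by a hash of the top few stack frames. This is a crude version of the idea; ClusterFuzz's real heuristics are considerably more involved.

```python
import hashlib

def dedup_key(stack_frames: list[str], depth: int = 3) -> str:
    """Collapse crashes whose top frames match into one bucket — a crude
    version of how fuzzing infrastructure turns thousands of raw crashes
    into a single reported bug."""
    top = "\n".join(stack_frames[:depth])
    return hashlib.sha256(top.encode()).hexdigest()
```

Two crashes that differ only deep in the stack (say, different callers of the same broken function) map to the same key and get filed once.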
The obvious solution would be to run ClusterFuzz yourself. ClusterFuzz is open source, and in theory anyone can run it, and there are actually similar tools released by other companies — I mentioned Microsoft has its own infrastructure, called OneFuzz. But these systems can take a fair bit of maintenance to run. Another potential problem is that they tend to be tailored toward the needs of the company that develops them. ClusterFuzz runs on hundreds of thousands of cores; you may not need a tool built for that use case. OneFuzz only runs in Azure; you may not use Azure — or Google Cloud, for that matter. So the next best solution — really, probably the best solution if it's available to you — is to use ClusterFuzz as a service. The only offering I know of that allows this is OSS-Fuzz, which is basically us providing ClusterFuzz as a service to open source projects. It's a free service; we're not trying to make money with this talk. And the last solution I'll go over is ClusterFuzzLite, which is meant to provide a lot of the same features as ClusterFuzz but to be very easy to set up and maintain. It runs in your CI, so it can run pretty much anywhere you want. As I said, on running your own full-scale infrastructure: there are other companies that use ClusterFuzz, but we're really its main developer, so I think it probably is more tailored to our needs, and it might be more maintenance than you'd want to spend on fuzzing. It depends on how big you are — if you're a five-person company, it's probably not worth it, unless your code is very security-critical.
So OSS-Fuzz is basically this free, hosted version of ClusterFuzz, and we'll actually pay open source projects to use it. That's because, as part of my team's mission — the open source security team — we want open source to be as safe as possible. We even fuzz things like Firefox, which is not a product Google uses directly; we have Chrome, but we fuzz Firefox because we care about the ecosystem as a whole. I think there are over 700 projects using OSS-Fuzz, and it has found over 20,000 bugs, of which I think about 8,000 have been possibly security-related — use-after-frees, that sort of thing. The others were issues more like null dereferences, or integer overflows that don't lead to memory corruption. We won't accept toy projects, and you do need to be open source, but if you qualify, it's probably the easiest way to get fuzzing into your workflow. And it's pretty easy to set up once you've applied. You write a pretty short YAML configuration file — which can be generated by our tooling; I didn't put that on the slide. Other than that, you write a Dockerfile that grabs your program's source code and installs any dependencies you need to build it, and then a bash script that builds your project using our CXXFLAGS and our compiler. The reason we need you to use our flags is so that we can pick whichever sanitizer we want — I don't think I mentioned it, but sanitizers and the coverage feedback are all done through compile-time instrumentation. If you're not open source, though, or for whatever reason you can't use OSS-Fuzz or ClusterFuzz, we released another tool that can help, and that's called ClusterFuzzLite.
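For a sense of scale, the YAML configuration really is only a few lines. This is a sketch from memory with placeholder values, so check the OSS-Fuzz documentation for the current field names:

```yaml
# project.yaml — placeholder values throughout
homepage: "https://example.org/myproject"
main_repo: "https://github.com/example/myproject"
language: c++
primary_contact: "maintainer@example.org"   # gets access to bug reports
sanitizers:
  - address
  - undefined
```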
ClusterFuzzLite actually reuses some of ClusterFuzz's code, and it's meant to run in CI. In some ways it has advantages over ClusterFuzz beyond being easy to set up: the feedback loop is much tighter. Rather than committing your code, waiting for a build, then waiting for a crash to be found and a bug to be filed — which is how ClusterFuzz and OSS-Fuzz work — ClusterFuzzLite tests your pull request while it's being reviewed and gives you the crash right in CI. It's much easier to set up than ClusterFuzz, and it provides many of the same features you'd want, like coverage reports. To get this working, we had to adapt fuzzing a bit to CI. We try to make it more deterministic, and we picked ten minutes as a somewhat arbitrary amount of time to fuzz for. We think this has the potential to find maybe 30 to 50% of the vulnerabilities that fuzzing for days would find — and if you're spending 1% of the time to find 50% of the bugs, that's a pretty good trade-off, I'd say, if you can't afford full-scale fuzzing. ClusterFuzzLite was born from OSS-Fuzz — a lot of OSS-Fuzz users also use it because they like the tighter feedback loop of getting crashes in CI — and the integration is largely the same. First you integrate your build with ClusterFuzzLite, which again involves a Dockerfile, a one-line config file, and a bash script that builds with our flags. Then, to actually run ClusterFuzzLite, you configure your CI system to use it. On GitHub, this is really, really trivial.
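For reference, a minimal GitHub Actions workflow looks roughly like this. It's reconstructed from memory, so treat the action paths and parameters as assumptions and check the ClusterFuzzLite documentation for the authoritative version:

```yaml
name: ClusterFuzzLite PR fuzzing
on:
  pull_request:
jobs:
  pr-fuzzing:
    runs-on: ubuntu-latest
    steps:
      - name: Build fuzzers
        uses: google/clusterfuzzlite/actions/build_fuzzers@v1
        with:
          language: c++
          sanitizer: address
      - name: Run fuzzers
        uses: google/clusterfuzzlite/actions/run_fuzzers@v1
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          fuzz-seconds: 600   # the ten-minute budget mentioned above
          mode: 'code-change'
          sanitizer: address
```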
You could literally just copy this file, and it'll run ClusterFuzzLite for you in GitHub Actions. If you do that, you get something like this: on your pull requests, ClusterFuzzLite runs your fuzzers, turns red if it finds any crashes, and lets you download the test cases that caused those crashes so you can debug locally. You'll also see a stack trace showing where the crash occurred — this is a toy example, so the crash happens right in LLVMFuzzerTestOneInput. Another ClusterFuzzLite feature I want to share, just to give you an idea of how much thought we put into adapting fuzzing to CI, is fuzzer selection. Many projects that use fuzzing go all in and write tons and tons of fuzzers. systemd is one OSS-Fuzz user, and I counted at least four of its fuzzers beginning with the letter B — so, extrapolating, it probably has somewhere in the high tens of fuzzers. We can't run all of them: if we're only fuzzing for ten minutes and we ran, say, a hundred fuzzers, we wouldn't spend enough time on each fuzzer to meaningfully find any bugs, except super shallow ones. So instead, ClusterFuzzLite looks at the diff of the pull request it's testing, and at the coverage report of each fuzzer. If a fuzzer covers code that the diff changes, ClusterFuzzLite knows that fuzzer can test the changed code, and it runs that fuzzer while testing the pull request. This makes ClusterFuzzLite much more effective than a more naive approach — which ClusterFuzz itself actually uses, because it can afford to: it's basically running non-interactively.
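The selection logic just described reduces to a set intersection. Here's a simplified sketch; the fuzzer names and file names are made up for illustration.

```python
def select_fuzzers(changed_files, coverage_by_fuzzer):
    """Run only fuzzers whose covered files intersect the PR's diff — a
    simplified version of the ClusterFuzzLite strategy described above.

    changed_files: set of file paths touched by the pull request.
    coverage_by_fuzzer: fuzzer name -> set of files its corpus covers.
    """
    return [fuzzer
            for fuzzer, covered in coverage_by_fuzzer.items()
            if covered & changed_files]
```

With a ten-minute budget, spending it all on the two or three fuzzers that can actually reach the changed code beats splitting it across a hundred.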
So to conclude my talk: fuzz your code or someone else will. Fuzzing is easy and effective, and we've got great tools to help you do it — and I'd be happy to help anyone who wants to reach out afterwards. I think we can take questions now. Let me get my headphones on. Thank you, thank you.

Is it possible to run ClusterFuzz on a kernel driver, and do you recommend that?

So, my talk was mostly about userland fuzzing. It is possible to use ClusterFuzz with syzkaller, the Linux kernel fuzzer. syzkaller also has its own infrastructure, called syzbot, and to be honest I don't really know the advantages of running syzkaller on one versus the other. The one advantage I am familiar with — and at Google we run syzkaller on both — is that certain projects within Google use ClusterFuzz for fuzzing both userland and kernel code, so developers get a single, similar process no matter which side they're working on. So, good question: you can use ClusterFuzz for fuzzing kernel code, but you might want to just use syzbot. Any other questions?

Thank you very much for your presentation. Just a quick question: in terms of Chaos Monkey and fuzzing, what is the relationship there?

Yeah — I qualified my history-of-fuzzing part by saying there may have been forms of fuzzing earlier that weren't called fuzzing, and one of those was a program called "the monkey", which I think is a sort of predecessor. Chaos Monkey — sorry, that's Netflix, I think. Netflix uses it, but I don't know who originated it, really.
Well, I know Netflix does a lot of chaos engineering, which is maybe a similar concept, but it's not really something I know much about, so I don't know that specific tool. But there are tools called "the monkey" meant for sending random clicks to GUI programs — I think one of these existed back in the 70s, and there's one for Android as well; I can't remember the exact name. It's a similar idea, but if you're simulating user clicks, it's more of a stability concern than a security issue, right? A user clicking on your app probably already has the privileges to do whatever they want in the app anyway. Sorry that I don't know enough to answer that question fully.

Is there any way to keep state while fuzzing? All the fuzzers I've seen — AFL, for example — only let you provide one input to a harness. What if the application I'm fuzzing is the server side of a protocol, and I'm trying to find a race-condition issue that needs two different inputs to trigger? How can I handle those kinds of cases?

Yeah, I think the solution to that is snapshot fuzzing, although to be honest I'm not super familiar with it, as we don't really do it yet. There are pretty well-developed, decently user-friendly tools for it that I think can handle racy conditions, or full-system fuzzing generally. libFuzzer, at least, would not handle that well at all: if your code is stateful, it will really mess things up — ideally everything should be reset to the same state every time LLVMFuzzerTestOneInput gets executed.
Okay, so that means we're not yet mature enough to do stateful fuzzing?

Well, I think snapshot fuzzing probably is mature enough, but I can't say for sure since I don't do it myself. If you're trying to do that, my recommendation would be to look into it — I don't know for certain that it does what you're trying to do. Okay, thanks. Okay, thanks for coming, everyone.