Hello, my name is Charlie Miller. I'm going to be talking today about different kinds of fuzzing, and trying to take a stab at figuring out exactly how much better some kinds are than others, along with some other fundamental questions about fuzzing. People have been fuzzing for a number of years and using it to find bugs, and there's a lot of experience in the field, but no one's really tried to quantify how effective it is or not, and that's what I wanted to try to do. So I'll take a little time first to talk about what fuzzing is, real briefly, and then I'll talk about one big case study I did. It involves Portable Network Graphics, which are just images, and in particular libpng, which is a library that parses PNGs. Then I'll talk about what the whole talk is really about, which is the two kinds of fuzzing, mutation-based and generation-based, and I'm going to show exactly what happened when I looked to see how effective they were.

One of the founding fathers of fuzzing has this quote in his paper: intelligent fuzzing usually gives more results. That totally makes sense, and everyone knows it's pretty much true, but the question is, how much more? Because doing intelligent fuzzing is an order of magnitude harder to do than dumb fuzzing, the question becomes: is it worth your time to do intelligent fuzzing or not? That's what I wanted to find out. The other thing I wanted to find out was, if you're going to use mutation-based fuzzing, which starts from a particular file or test case and makes changes to it, how important is your choice of the initial test case? Does it matter, or can you just pick something at random? And if you do have to think about it, how important is it?

Okay, so what's fuzzing? I've seen at least a couple of talks that covered fuzzing already, so this might be a rehash, but it's quick. Fuzzing basically starts with generating test cases, and test cases are just inputs to the target program. They might be files, so if you were fuzzing Microsoft Office you would generate Word documents or Excel spreadsheets, and mutated versions of those would be your test cases. It could be network traffic, so if you're fuzzing a web server or something, the network traffic would be your test cases. It could be command line arguments, so maybe very long strings on the command line, or environment variables, whatever. Whatever input is going into the program, that's what's called the test case. These test cases should be similar to real valid data, but they obviously have to have problems too, so they contain anomalies as well. The reason they have to be at least somewhat close to valid is that if you just send complete random garbage to the program, it's immediately going to kick it out and know something's wrong, so you're not going to get very deep into the program. You want to get deep enough into it but still have some problems.

So fuzzing is basically two things: you generate the test cases, and then you monitor the program to see if anything went wrong. Things that might go wrong in the program: maybe it starts eating up 100% of CPU, which shows something's gone wrong; maybe the thing crashes; maybe it starts to open files that it's not supposed to.
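To make the monitoring half concrete, here is a minimal sketch of a crash-watching harness in Python. The pngtest binary name, the testcases directory, and the five-second timeout are all made-up choices for illustration, not details from the talk.

```python
import glob
import subprocess

TARGET = "./pngtest"   # hypothetical PNG-parsing binary, not from the talk

for path in glob.glob("testcases/*.png"):
    try:
        proc = subprocess.run([TARGET, path], capture_output=True, timeout=5)
        # On Unix, a negative return code means the process died on a signal
        # (e.g. -11 is SIGSEGV), which is exactly the kind of "something went
        # wrong" a fuzzer is watching for.
        if proc.returncode < 0:
            print(f"crash on {path} (signal {-proc.returncode})")
    except subprocess.TimeoutExpired:
        # A hang or a process spinning at 100% CPU is also interesting.
        print(f"timeout on {path}")
```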
So anyway, you need something to monitor the program and see what's going on. The point of sending in these anomalous test cases is that you're doing things that don't usually happen with a normal client or with the normal behavior of the program. Hopefully you'll find some assumptions the program has made, you'll violate those, and good things will happen.

So the big question is, I said fuzzing is two things, getting test cases and then monitoring the program. On the side of getting test cases, how do you get them? You need hundreds of thousands of these things, so how do you make them? There are basically three ways right now to do it, and I'm only going to talk about two. Jared, who was in here before this talk, talked about the third one.

So first there's dumb fuzzing, also called protocol-unaware fuzzing, meaning you don't know anything about the protocol. You just take a valid input, a valid test case, maybe like I said a spreadsheet or some network traffic or whatever, something valid, and then you just start making little changes to it. For example, there's a program called FileFuzz, and what it does, among other things, is take a file, start changing every byte one at a time, and then send each version through the program. That's one example of dumb fuzzing. There are other heuristics you can use: instead of just flipping a byte randomly, you can find something that looks like a string and add a bunch of A's, so long strings; you can look for things that look like integers and put lots of different integer values in; percent n's to look for format string bugs; whatever. The point about dumb fuzzing is that it's super easy to set up and do. If you give me a program and you give me a valid input, I can basically write a fuzzer and get it going in about 10 minutes. I don't need to know anything about how the program works. I don't need to know anything about the file format or the network format or anything. I can just turn the thing on, start flipping bits, and I'm off to the races. So you can get into it really fast.

And obviously it might depend on the initial test case. You can imagine, again, back to my example of fuzzing Microsoft Office, say PowerPoint: does it matter if I take a one kilobyte PowerPoint slide or a one megabyte PowerPoint slide? If I start from those two and I start flipping bits, is that going to make a difference? Well, probably. And I'll quantify that later. So that was dumb fuzzing: you just take something you already have and start making changes to it.
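As a rough illustration of that FileFuzz-style approach, here is a small sketch that takes one valid seed file and writes out a test case for every byte position, flipping that byte. The seed filename and output directory are arbitrary placeholders, not details from the talk.

```python
import os

SEED = "valid.png"        # any valid input file you already have
OUTDIR = "testcases"
os.makedirs(OUTDIR, exist_ok=True)

with open(SEED, "rb") as f:
    seed = f.read()

# One test case per byte position: copy the seed and flip every bit in
# that single byte, leaving the rest of the file untouched.
for pos in range(len(seed)):
    mutated = bytearray(seed)
    mutated[pos] ^= 0xFF
    path = os.path.join(OUTDIR, f"case_{pos:06d}.png")
    with open(path, "wb") as out:
        out.write(bytes(mutated))
```

Changing only one byte per test case keeps each anomaly isolated, which makes it easy to tell which change was responsible when something crashes.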
The alternative is intelligent fuzzing, or protocol-aware fuzzing, or I think I call it generation-based too. The point of this is you don't start from something that you already have; you build it up from scratch. You take some description of what the program expects. Maybe it's an RFC, or maybe it's just documentation that comes with the program, or you could reverse engineer the binary or look at the source code or whatever. Somehow you get a description of what the program expects, and then you start building up the test cases from this description, adding in the anomalies as you go. The more or less famous intelligent fuzzer is SPIKE. For that, you write a description of what the protocol is, and SPIKE goes in and changes each of the fields one at a time, adding anomalies. So again, you can do the same sorts of things: you add in numbers at the ends of their ranges, plus or minus one; you can add in negative numbers, percent n's, long strings, whatever you want. The point of this, and I mentioned this earlier, is that it's very much harder to do than dumb fuzzing. It takes a long time to sit down and read an entire RFC and then write a program that basically emulates every single part of the RFC, and then you have to run the whole thing through the whole program. So you can imagine that this is going to take days, weeks, even months for very complicated protocols. But it should make sense that if you're guaranteed you're hitting every single thing that's in the specification, you'd think you have a better chance of finding bugs this way. And what I wanted to find out in this whole talk is exactly how much better it is. Because if I'm going to spend a month doing this, and I could have just turned on my dumb fuzzer in five minutes, it had better be a heck of a lot better.

And I should just mention, the third type of fuzzing that is around, besides dumb and smart, is evolutionary fuzzing. Like I said, Jared already talked about that, and that's a whole different way to generate inputs: you use evolutionary algorithms. Anyway, I wanted to answer these questions that I had, and to do it, I needed to actually fuzz something.

So what I decided to fuzz was PNG. It's an image format, and it's used all over the place. Just to describe to you what a PNG looks like: it starts out with eight bytes that are always the same, and then it has these things called chunks. Each chunk consists of four bytes to tell you how long the chunk is, four bytes to tell you the type of the chunk, then some data, and you don't even have to have data if you don't want to, and then a four-byte CRC checksum, just to make sure that it hasn't been corrupted. You just get all these chunks one after another, and that's all a PNG looks like if you tear it open. The original PNG RFC specifies 18 chunk types, three of which have to show up in every file. There are other extensions to the RFC that talk about other types, and you can make up your own type or whatever. I ended up looking at 21 because that's what libpng knows about. And I don't know if you can see it, but this is a little program I have that can analyze files and break them apart by their specification. You can see, maybe on the bottom there, the different chunk types that show up in this particular PNG and what it would look like. So that's just what it looks like, and that's the only hex I think you'll see in this presentation.
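Here is a small sketch of walking that chunk layout in Python: check the eight-byte signature, then read each chunk's four-byte length, four-byte type, data, and four-byte CRC until IEND. It's only an illustration of the format, not the actual analysis program from the talk, and the example filename is made up.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"   # the eight fixed bytes at the start

def chunk_types(path):
    """Return the list of chunk types in a PNG, in order."""
    types = []
    with open(path, "rb") as f:
        if f.read(8) != PNG_SIGNATURE:
            raise ValueError("not a PNG")
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            length, ctype = struct.unpack(">I4s", header)
            f.seek(length + 4, 1)          # skip the chunk data and its CRC
            types.append(ctype.decode("ascii", "replace"))
            if ctype == b"IEND":
                break
    return types

print(chunk_types("example.png"))   # e.g. ['IHDR', 'gAMA', 'IDAT', 'IEND']
```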
So the first thing I wanted to do was think about this: if you're going to do dumb fuzzing, you have to start with a PNG and start flipping bits or adding strings or doing something. What kind of PNGs are out there? You've got to start with something, so you're probably going to go to the internet and find some file and start with that. So what kind of files are out there? What are you going to stumble upon? I downloaded every PNG I could find, all unique, and I wrote a program which processed each one and recorded the types of chunks that were in that file and the number of chunks. And it kind of surprised me that most of these files didn't have very many different chunk types. On average they had about five different chunk types of the possible 21, and there wasn't much standard deviation. I found one file that had nine chunk types, and I found a bunch of them that only had the three that are required.

So you have to think about that. Assuming there's different code in the target program to process the different chunk types, if you start with a PNG that doesn't have a particular chunk type, you're never going to fuzz the code that processes that type. It's really important that you understand that: if you grab a file off the internet and it only has five chunk types, and you start fuzzing dumbly with that file, you're going to end up fuzzing only five of the possible 21 functions, if that's how the code is organized.

So this is a distribution of the different chunk types and what percentage of the files contained each of those types. You can see the three that are mandatory; obviously every file had those. But otherwise there's a bunch, over towards the right, that didn't show up at all, and there's a lot of other ones that only show up in five percent or less of the files. So that's what average PNGs look like: they don't have a lot of different types. And this is just restating that. Basically, if you pick a random file off the internet, it's probably going to have around five chunk types. Nine of the 21 types occur in less than five percent of files, and four of them never showed up at all. Again, the important thing is that if you use such a file as the basis for your mutation-based fuzzer, you're never going to fuzz any of the code for the chunks that aren't present in your file. And of course I know this because I looked at it, but if you didn't know anything about PNGs, you wouldn't know that, and you would think you had done a great job because you started with a couple of PNGs and fuzzed them. But really you wouldn't have fuzzed a lot of the code.

Now, I said earlier that I was assuming there's different code for the different chunk types; maybe they just process all the chunk types the same way, and then it wouldn't really matter. So to answer some of my questions, I took one particular library and analyzed the heck out of it. I looked at libpng, which is an open source package used in a lot of the browsers, and I wanted to make sure that each of the chunk types had separate code to handle it. Obviously with libpng I could have just looked at the source code, but originally I was planning on using three or four different PNG processors, so I chose a generic way to check it. Anyway, trust me, it really does have different code for each one. What I did was take PNGs that contained just the three mandatory chunks and then one more each, so each file basically had four chunk types, the three mandatory plus one extra, and I fuzzed each one (there's a sketch of building a file like that below). And this graph is just supposed to show you that different amounts of code were used to process each of the different chunk types. The sort of odd measure that I used was the number of lines of code required to process each one, as a percentage of the number of lines required to process a minimal PNG. It's a mouthful, and if you want to know why I chose that, please see me afterwards, but I'm running out of time. So some chunk types require more code than others for processing. The four chunk types that never showed up in the wild actually require 76% more code than a minimal PNG does. So that's 76% more code that you'll never fuzz with dumb fuzzing.
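For a sense of how test files like that could be put together, here is a rough sketch that builds a minimal PNG (IHDR, IDAT, IEND) plus one extra chunk with a correct CRC. The specific chunk contents, the gAMA example, and the filename are illustrative choices, not details from the talk.

```python
import struct
import zlib

def chunk(ctype: bytes, data: bytes) -> bytes:
    """Assemble one chunk: length, type, data, CRC over type+data."""
    return (struct.pack(">I", len(data)) + ctype + data +
            struct.pack(">I", zlib.crc32(ctype + data) & 0xffffffff))

def minimal_png(extra: bytes = b"") -> bytes:
    """A 1x1 grayscale PNG with an optional extra chunk before IDAT."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)   # width, height, depth, color, ...
    raw = b"\x00\x00"                                      # filter byte + one pixel
    idat = zlib.compress(raw)
    return (b"\x89PNG\r\n\x1a\n" +
            chunk(b"IHDR", ihdr) +
            extra +
            chunk(b"IDAT", idat) +
            chunk(b"IEND", b""))

# e.g. a file whose only optional chunk is gAMA (gamma = 1/2.2)
png = minimal_png(chunk(b"gAMA", struct.pack(">I", 45455)))
with open("ihdr_gama.png", "wb") as f:
    f.write(png)
```

Each file keeps the three mandatory chunks and adds exactly one optional chunk, so whatever extra code gets exercised can be attributed to that one chunk type.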
So again, the point of that was that it matters which file you start with. Now I want to see how much better generation-based is than mutation-based, smart versus dumb. I ran three experiments. The first was to measure the dumb fuzzing, so I started with three different files: one that had five chunk types, which is sort of what you would typically find if you just randomly picked one; one that had seven, which you'd have to be pretty lucky to find; and then the one that had nine, which you'd have to play the lottery to win. For each of these, I used something basically like FileFuzz to generate 100,000 test cases. Experiment two was to make sure the program wasn't just spitting these things out right away because of the CRCs: I took the 300,000 files from the first experiment, fixed up all the CRCs, and re-ran the experiment, just to make sure the results weren't an artifact of the CRC check (there's a rough sketch of that fix-up below). Finally, for the third experiment, I used SPIKEfile and the PNG specification, and I generated files that contained every single chunk type and fuzzed every single field in every chunk: the CRCs, lengths, chunk names, the data, everything. Doing that, I ended up with 29,000 files, which is quite a bit less than the other experiments.

So here's basically the whole talk in one slide. If we look at the left, the first three bars are from dumb fuzzing on the files that had five, seven, and nine chunk types. You can see, if you can read the numbers: one was 60% more, and the green one is 140% more. So you basically get twice as much code coverage by starting with the nine-chunk PNG versus the five-chunk PNG. You can get twice as much code coverage just by choosing the right file to start with. You get the same sort of phenomenon on the ones where I fixed the CRCs; that didn't seem to make much difference. And then you look at the one all the way on the right, and that's the one I got from using generation-based fuzzing. For that one, the number says 289, so you do get quite a bit more. One caveat, and I didn't mention code coverage until just now, but that's how I was measuring all of this. The caveat is that getting code coverage doesn't necessarily mean you're going to find bugs, but if you don't cover a certain line, you're definitely not going to find a bug there.

Anyway, to wrap things up: mutation-based fuzzing is very dependent on the inputs, and choosing the right input in this case got up to twice as good code coverage. Generation-based fuzzing is much better than mutation-based in this case: it's two to five times better, depending on which file you started with. And this is only one file type, so it might not generalize, but it kind of might.
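Going back to experiment two for a second, here is a rough sketch of the kind of CRC fix-up it relied on: walk a possibly mutated PNG and recompute every chunk's CRC so the parser can't reject a test case on the checksum alone. It assumes the length fields still describe the file's layout, and the filenames are placeholders.

```python
import struct
import zlib

def fix_crcs(data: bytes) -> bytes:
    """Rewrite each chunk's CRC over its type and data bytes."""
    out = bytearray(data[:8])            # keep the 8-byte signature
    pos = 8
    while pos + 8 <= len(data):
        length = struct.unpack(">I", data[pos:pos + 4])[0]
        end = pos + 8 + length
        if end + 4 > len(data):
            out += data[pos:]            # truncated or garbled tail: copy as-is
            break
        type_and_data = data[pos + 4:end]
        crc = zlib.crc32(type_and_data) & 0xffffffff
        out += data[pos:end] + struct.pack(">I", crc)
        pos = end + 4
    return bytes(out)

with open("mutated.png", "rb") as f:
    fixed = fix_crcs(f.read())
with open("mutated_fixed.png", "wb") as f:
    f.write(fixed)
```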
The only other data I've ever seen that talks about this is in the book Fuzzing, which just came out, a very good book. They examined 10,000 SWF files, and this is the distribution that they found for the version numbers. Sure enough, if you just wanted to do blind, dumb fuzzing of Flash and you just picked some files off the internet, you're not going to be fuzzing anything that's Flash 7 or 8 specific, because only 3% of the files are going to cover that. So again, this is an important case. If you want to fuzz Flash and you want to use the dumb fuzzing technique, you need to make sure you get a file that actually uses Flash 8, and you'll do much better that way. So that's it. I guess I'm out of time, so if you have questions, please follow me to the question-and-answer room. Thanks.