Do we have a mic? You guys can hear me. Excellent. Well, I'm going to go ahead and get started, if that's OK with everybody. I imagine we'll have a few people trickling in in the meantime, but in the interest of time, I'll just dive into it. I imagine we'll have a few minutes left for questions at the end of the talk here, but as you guys probably all know already, they also have a question and answer room set up to the side. So I'm going to hustle off when the time slot is over and go over there. I'm Jacob West. I run security research at Fortify Software. We're a software security company; we make tools for helping people build more secure software. And I'm really happy to be here today to talk to you a little bit about static analysis and fuzzing, and what the right distribution of time between the two technologies might be from a software security practitioner's standpoint. So I'm going to start off with a little introduction to fuzzing. Some of this will be a review for a lot of you, I imagine, as well as some challenges that I see with fuzzing in terms of what you want to get out of it and how you're going to reach those goals. Then I'll talk a little bit about static analysis, some of the benefits and challenges with that, and how energy can best be put into static analysis to get the most out of it. And then I'll conclude with some experimental results from running the two technologies against a small program. So what is fuzzing? Really, it's anything that encompasses runtime testing of the application with random or semi-random input. It was originally documented by Barton Miller at the University of Wisconsin around 1990. He's got a series of papers from that point on where they've taken fairly basic fuzzing tools and applied them to a variety of software, ranging from Unix distributions to Windows applications and, most recently, I think, OS X.
So just to clarify what I'm talking about when I say fuzzing, because I think it can be a little bit vague these days; it's not necessarily a well-defined field. I'm talking about tools that are really much more on the random side of input rather than the known-attack side of input. So I'm talking about tools like SPIKE, Peach, and PROTOS, not necessarily black box testing tools from the likes of Sanctum, SPI Dynamics, and Watchfire. So a couple of thoughts from Barton Miller just to set the stage for what I'm going to say today. In many of the papers he's published on fuzzing since the first one, he's reiterated the point that he never intended fuzzing to be a replacement for more formal methods, so program verification or more formal testing mechanisms. Rather, it's a useful way to find low-hanging fruit, so put a little bit of effort in and get easy bugs back out. Unfortunately, these days we hear a lot of people making would-a, could-a, should-a comments about fuzzing. So after you find a bug through whatever mechanism, either manual testing, somebody in the field that you didn't want to find it, static analysis, or however else you go about it, it's often easy to come back and say, oh, well, we could have found that now that we know what the bug looks like. I think this is a dangerous approach because it leads you into a false sense of confidence that you're going to be able to find bugs more and more easily just because you've found bugs in the past. This is referring to, you know, I don't actually remember what program this was in, but it's on the secure coding list, and Ken van Wyk came out saying this buffer overflow would have been really difficult to find using fuzzing or manual testing because it involved a lot of program context and fuzzing a very specific type of data.
Steve Christey came back and said, well, now that we know what the type of data was and what bug that data tweaked, it'd be very easy to generate a fuzzing test that would find it. Which is a great idea, but don't let that lead you into believing that the next time there's such a bug, you'll find it too. If it's the same scenario, you will, but you need all that context in order to decide what the bug was that you're looking for. The same thing goes for the animated cursor bug at Microsoft earlier. Mike Howard came back and basically said, yeah, that was a test case that was missing from our fuzzing scripts; now we can add it and we'll be able to find them in the future. But that doesn't mean there isn't another very similar, parallel bug that's not quite the same that they're gonna miss. So at a high level, fuzzing works in a couple of steps. It's about identifying what the sources of input you want to fuzz in the application are, permuting existing files or input or generating new input entirely randomly, and then trying to monitor the application to identify when an error has occurred. And that could be as simple as the program crashing, or an error message being displayed, or a change in behavior that's unanticipated. So when you find one of these with an oracle, you record the state that led to that error, and now you've got a reproducible test case to come back to and document what the problem was and figure out what the real issue was so you can fix it. So fuzzing really is tied to specific types of input. We've got file fuzzers, we've got network protocol fuzzers, you've got all kinds of things. On the file side, just to give a couple of examples, you might fuzz JPEG files, TIFF files, PDFs and so on. You collect a library of those valid files because you need to know what the structure is going to look like. You could set up a program that would generate them from scratch as well, but you really want some existing structure.
And then you start modifying them in various ways. And we'll talk about kind of smart fuzzing and dumb fuzzing, but the simplest is you just start changing bytes around in the file and see how the program consumes it. More advanced techniques are going to look for specific areas of the file so that the file isn't instantly malformed in terms of the structure, but also maybe tweaks problems with individual fields within the file. Then you run the program, have it consume the file, see what happens, look for problems. Network protocols, it's very much the same way. You can either try to generate network traffic arbitrarily or you can record some existing network traffic and then start tweaking fields within it. So I mentioned before, dumb fuzzing and smart fuzzing. So dumb fuzzing really is the idea that you're going to start with some rough idea of, okay, I'm going to send a network packet or I'm going to create a file, but you don't worry about the structure of that file. You just start arbitrarily changing bits around. Some of those will be valid. Some of them will get, most of them in fact will get thrown out right away because the file just won't parse to begin with or the network packet will just be rejected right away. This works pretty well for kind of basic test cases because handling those malformed files, you may have failures there as well, but you're not going to get to kind of deeper level bugs that involve specific fields being malformed and how the program handles that data. It can take a long time to enumerate all the test cases because you have a large file, you're changing one byte at a time. You can imagine there are now a lot of permutations of these and very few of those are going to get past kind of the first level of validation. Smart fuzzing, the idea is you start altering content. So you're aware of the data structures that are involved in the network protocol, the file, whatever you're working with and you start changing individual fields. 
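To make the dumb-fuzzing loop described here concrete, here's a minimal sketch in Python. The seed bytes and the number of flips are arbitrary placeholders; a real harness would write each mutant to disk, run the target parser on it, and use an oracle such as a crash or bad exit code to decide whether a bug was triggered.

```python
import random

def dumb_fuzz(seed: bytes, n_flips: int = 8) -> bytes:
    """Return a mutated copy of a valid seed file: overwrite a few random bytes."""
    data = bytearray(seed)
    for _ in range(n_flips):
        pos = random.randrange(len(data))
        data[pos] = random.randrange(256)  # replace one byte with a random value
    return bytes(data)

# A real harness would loop: mutate, run the target on the mutant, check the
# oracle, and record the mutant whenever the oracle reports an error.
seed = b"\x89PNG\r\n\x1a\n" + b"\x00" * 64  # any valid sample file works as a seed
mutant = dumb_fuzz(seed)
```

As the talk notes, most mutants produced this way will be rejected by the very first level of parsing, which is exactly why dumb fuzzing struggles to reach deeper bugs.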
So you might change a null-terminated string to remove the null character. You might alter numeric values: you might flip signs, you might try a large integer or a small integer. You can be a little more intelligent here about the kind of input the program expects in a given field and the kind you're going to give it; not necessarily that you want to give it exactly what it expects, but enumerate the cases at least. This can also involve changing the structure of the data itself. So adding additional headers, duplicating specific fields. What happens if there are two of a field that the program only expects one of? Is it going to use the first one, the second one? Is it deterministic? So the real challenge that I see with fuzzing is related to assurance, and that's kind of an overloaded word in security, so I don't use it in any of the overloaded ways. But how do you tell how many of the input sources you tested? How do you tell how much of the program underneath those input sources you reached? How do you tell when an error occurs? That's all about the quality of the oracle. And as you increase all of these things, you're gonna have the side effect that the runtime of your test is actually going to increase as well, because in order to exercise more of the code and more of the input sources, you're running the program more times or subjecting it to more input. So the Microsoft SDL has a pretty concrete policy about this: it says you need to run 100,000 iterations per file format per parser. When you find a bug, you have to start at zero, start with a new seed, and run up to another 100,000. Why is that? I'm sure they have empirical evidence that says they find a lot of bugs that way, which is great, but what does that really mean in terms of the quality of your testing? These are some of the points we're gonna try to talk about today. So next I wanna talk about some challenges tied to those questions I had on the last slide.
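The field-aware mutations just described can be sketched as follows. The choice of a 32-bit little-endian length field is an assumption for illustration; a smart fuzzer would apply the same kinds of variants to whatever fields the real format defines.

```python
import struct

def smart_mutations(value: int):
    """Enumerate interesting variants of a 32-bit length field, the way a
    smart fuzzer would: boundary values, sign flips, off-by-ones."""
    cases = [0, 1, value - 1, value + 1, -value, -1, 2**31 - 1, 2**32 - 1]
    # Pack each candidate as an unsigned little-endian 32-bit field.
    return [struct.pack("<I", c & 0xFFFFFFFF) for c in cases]

def strip_null(s: bytes) -> bytes:
    """Remove the terminator from a null-terminated string field."""
    return s.rstrip(b"\x00")
```

Unlike the dumb byte-flipping approach, every one of these mutants still parses as a structurally valid field, so they exercise the code behind the first level of validation.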
So how do you identify the input sources in your program? Sometimes this is really easy. So a standard web application and Ajax and JavaScript and dynamic stuff on the client is starting to change this a little bit but think of a traditional web application, you've got HTTP requests and responses going back and forth, you can record those, they've got a very concrete structure, you can start fuzzing fields and then this is pretty straightforward and fairly generalizable. So I'm starting with the assumption that most of us are not in the business of creating our own fuzzing tools for every application we want to work on that could be a lot of work. So it's desirable to have some generalization between programs, between protocols and so on. So when you've got a standard protocol like HTTP web traffic, it's fairly easy to generalize. When you have proprietary web service APIs or a specific network server that accepts packets with some structure in them or the client software with proprietary communication protocol, you start to lose some of that generalizability and you've got to start doing things much more specifically to your software. It's very difficult to, when you're dealing with new protocols, determine what would constitute valid input versus invalid input. So you see a couple of examples. You may very well get stuck in that no man's land of producing input that isn't valid and isn't gonna make it past the very first level of validation. Really what this is all saying is to get past these in more complex types of software or more proprietary types of software, you're gonna require customization to tune it to these protocols and you're gonna lose that generalization. So tied to this is once you've exercised that attack surface, those inputs, how do you tell how much of the program you've exercised or how do you increase the amount of the program that you've exercised? 
This is kind of a simple example with a couple of string manipulation calls in C: you do a strcmp on input one, and then you do a copy operation on input two, and we're gonna assume both of those are coming from the outside. Now, in order to ever exercise this buffer overflow, which, depending on how big input two is, is very likely an overflow at the strcpy call, the fuzzer would need to know that input one needs to match this static string value. Obviously you could tune a tool to provide that input, but now you're in the business of manually determining what all the bugs in your program might be. So the tool finding that randomly can be quite hard, and you end up with permutations of input to reach all the given states. You need, in this case, the static string to have a particular value, input one to match the static string, and a large value in input two. So now you're dependent on the values you're providing for both of these inputs. To go into that a little bit more, here's another example with a couple of conditionals. If X equals three, then we have one buffer overflow. If Y equals five, nested in that same if block, we have another buffer overflow. And finally, if Z equals seven and those previous conditions are matched, then we have another buffer overflow. Now, what's the likelihood a fuzzing tool is going to find each one of these? If it's fuzzing each of these values independently, the likelihood that it's going to get X equals three is one over two to the 32nd, all the values in an integer, let's say. Now for each of those subsequent permutations, you have to get every combination between each of the values in the conditionals, right? Because now you need X equal three and Y equal five to get the second buffer overflow. So you can see the runtime and the number of inputs you have to provide to the program really can explode here. In fact, in the worst case, it explodes exponentially.
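The arithmetic behind that explosion can be written down directly. This assumes each guarded value is an independent, uniformly random 32-bit integer, as in the speaker's made-up example.

```python
def hit_probability(depth: int, bits: int = 32) -> float:
    """Probability that independent uniform random values satisfy `depth`
    nested equality checks on 32-bit integers (e.g. x == 3, y == 5, z == 7)."""
    return (1 / 2**bits) ** depth

def expected_trials(depth: int, bits: int = 32) -> float:
    """Expected number of random inputs before all the guards hold at once."""
    return 1 / hit_probability(depth, bits)

# One guarded overflow: about 1 in 2^32 random inputs hits it.
# Three nested guards: 1 in 2^96. The search space grows exponentially
# with conditional depth, which is the explosion described above.
```

Even at a depth of two, the expected number of trials (2^64) is far beyond any practical fuzzing budget, which is why purely random input rarely reaches deeply guarded bugs.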
And that is painful for you as a person running or developing the fuzzing tool, because you have to come up with all those inputs and maybe try to do some optimizations to narrow the field a little bit. But most of all, it's painful from a runtime perspective. How long is it going to take you to run these tests if you really want to get very solid code coverage? Just to give you an example: we were looking at a conditional depth of three in that made-up example with the gets calls before. I picked a couple of bugs that I've been thinking about lately. There's a buffer overflow and a format string bug from WU-FTPD that have been around forever; those are nested below three and four conditionals each. And there's one in OFBiz from Apache, a cross-site scripting bug, that's got four conditionals in front of it. So the idea of two, three, four, five levels of conditionals in front of a bug: not unlikely. So the second piece of this: those were cases where, if you've got a solid understanding of the conditionals, then it's fairly easy to reach them, and through purely random input you could reach all of them given a sufficient amount of time. A second class of hard-to-reach states has to do with program semantics and program context. So imagine an airline booking system that accepts reservations for flights, and there's a bug on a page that's only displayed when a flight is full. The fuzzing tool now needs to know that it needs to reserve 297 seats, which will all behave in no interesting way, one after another, exactly the same, and then the 298th time it's gonna behave entirely differently. So these are cases where, again, you're customizing the tool, if you want to reach that with a fuzzing tool, to handle the program logic, and you're really losing that ability to generalize across multiple programs because you're tuning it to your very, very specific environment.
So that can be fairly costly and the benefits are hard to judge because they're gonna find very specific bugs that you almost had to understand for example in this case before you did the customization to reach them. The final challenge I wanna talk about is identifying errors. So this has to do with the oracle that the fuzzing tool uses to tell when some input that it's provided has actually identified an error, produced an error in the application. So this is great for traditional system programs where any kind of bad input and it's gonna crash or return with the bad error code, great. You can tell when that happens. Modern applications for the most part are being architected to hide errors in a lot of ways or at least not reveal a lot of details about what the error was. So web applications often will take you to a single kind of a discrete error page. It doesn't give you any details about what happened because they don't want you to know about what's going on behind the scenes. So when you're using fuzzing, you're basically dropping yourself back to more or less the level of an attacker because you're coming at it from a black box perspective. You may put some knowledge into the tool but when the tool's actually running it doesn't have any visibility beyond whatever the interface that it's communicating with. So how do you tell when an error has occurred? You need better oracles. You can gain some ground here by looking inside the code, so instrumenting it with checkpoints to see when the fuzzing tool is exercising in this way, here's the behavior, here's what signifies an error in a given case. But again, you're talking about putting a lot more energy beyond that kind of simple run it to find low hanging fruit bugs that Bart Miller described when he came up with this technique. So really what I'm saying with some of those examples and in general today is that fuzzing bugs that you might find with fuzzing really fall across the spectrum. 
So at the very easy end are things in web applications like a shallow cross-site scripting bug, the shallowest case being one that never even leaves the client, or, a little worse, goes up to the server and comes straight back. That's pretty easy to identify, and fuzzing is a great way to go about doing this. And just to clarify for everyone, when I say stop fuzzing, I'm in no way advocating that you never start fuzzing. I just mean know how much energy it's worth putting into the process, what you're gonna get back out of it, and where that sweet spot in terms of ROI for your time is. At the hard end you have these things where you need to run millions or billions of inputs in order to reach the bug, or you need to put so much understanding of the application's logic into the fuzzing tool that you've put in more effort than the bug is really worth, because you need to customize it not even to an application but almost to the case of a specific bug. So to summarize the advantages of fuzzing: it's really one of the easiest testing techniques you can go about. If you've got an application that runs, there's probably a fuzzing tool you can let loose on it in one way or another and very likely find some bugs. I mean, looking at Miller's research, paper after paper, in every case he found bugs in a large percentage of the programs he was testing with very unintelligent tools. He wasn't tuning them very carefully to the environment. The other advantage of any sort of runtime testing is you have a reproducible test case at the end. So you're fairly certain that what you found actually is a bug, depending on how good your oracle is, because you had some input and you know what the output that came back was, and that tells you there was a bug. There can be false positives there, but in contrast with static analysis, it's much closer to being a proven bug at that point.
And finally, there is the potential to generalize and to scale across programs where you have standardization on protocols and input sources. The web is a great example: doing fuzzing for certain kinds of vulnerabilities in web applications absolutely makes sense, and you will find them more easily than with probably any other technique. Disadvantages of fuzzing: it can be very costly to achieve completeness, so reaching all of those conditionals, reaching those hard-to-reach states, can be quite hard and can involve a lot of effort from the tool developer's standpoint to make it work in their environment. If you succeed in that, which is hard enough on its own, then the runtime costs of the tool are going to grow with your success. So the more of the application you're testing, the more inputs you're gonna provide and the longer it's gonna take to run your tests, almost invariably. And finally, you may have a great test and a bad oracle. An example of this: we were running a fuzz test on an application in our office, looking for path manipulation bugs. So you provide some nasty input and it ends up in a file system call. We ran this test and the fuzzing tool we were using came back and said, nope, didn't find anything. The web application was still running just fine. Everything appeared to have no problems whatsoever. Until we rebooted the machine, because the test had overwritten one of the critical operating system files. One of the attacks had succeeded. It wasn't evidenced back through the web interface, but our machine was hosed and we had to recover it. So having a good oracle really matters here, right? If that file hadn't been a critical system file, we never would have known about the vulnerability. It would have written something sitting in temp somewhere, who knows? And the tool never would have told us there was a problem, even though it in fact succeeded in attacking the application just as it intended to.
So I wanna switch over to talking about static analysis a little bit. And prehistoric may be a stretch; we haven't got quite that many years between these tools and where we are today. But depending on where your familiarity with static analysis comes in, you may have heard about tools like RATS, ITS4, and Flawfinder, some of the early open source solutions in the space. And these really are dinosaurs compared to what we're dealing with today. They were useful in a lot of ways. For a security professional, these tools were great. They gave them the ability to really focus their manual audit process, and they gave them a repository of information about security. So they gave them a list of functions that you shouldn't use, and they could go through and verify whether there were bugs there or not. The downside is they're not bug finders. The output of these tools was close to a glorified grep. It gave you a list of things that you should go inspect, not a list of bugs to file with your QA department. You really needed to have security expertise to use them. And the difference between those and tools today is that we're trying to make tools that are a little more useful for developers, people who are not security professionals, but for security professionals too. So the biggest piece of that change from the prehistoric tools to where we are today is about prioritization. A tool today should still warn you about every call to strcpy in your code, because you just shouldn't be calling strcpy. But it should also be able to give you some prioritization among those and tell you which ones are important. Because if you've got 10,000 calls to strcpy in your code, you need the ability to prioritize which ones you should look at. Unfortunately, this misconception about the prehistoric tools lives on today. I was really excited about the new fuzzing book, and I still am in a lot of ways.
But on page four, they've got these examples where they're trying to refute the applicability of static analysis in finding security bugs. And they give two main functions, each with a char buffer, each with a strcpy. The only difference is what you're copying into that stack buffer. In one case, you're copying a short static string that will fit just fine. In the other case, you're copying a command line argument. So this could be exploitable, depending on the environment the program is running in, right? RATS, ITS4: no difference, right? They're gonna say, there's a call to strcpy on line X, you should really look at it. Modern tools are gonna tell you, in one case, there's a strcpy, low priority. In the other case, here's a command line argument being accepted into main and eventually being used in a strcpy. This is accessible from outside the program, potentially exploitable; this goes near the top of your list. So why is static analysis good for security? It's fast compared to manual review and testing. It's complete and consistent in its coverage. So you get to consider all the possible states, all the possible inputs, basically for free. It makes the security review process easier for non-experts. Some of that knowledge about what's interesting, even from the early tools, RATS and ITS4, what functions are interesting, what we need to look for, can be encompassed in the tool and provided to the user, who might not be a security expert. And they're useful on all kinds of code. So although the rule sets, and we'll talk more about rules later, may be tuned a little bit to certain environments, the technique and the tool, the engine, are typically applicable across entire languages, and multiple languages in fact. So you don't get the kind of specialization that you need to have with fuzzing tools for a particular environment, a particular protocol. Not a silver bullet, though.
So despite how much I would love for static analysis to be the solution to everyone's problems, absolutely not. It's harder to get started with static analysis than it is with fuzzing. Again, if you have an application that runs, you can take a fuzzing tool, point it at the program, let 'er rip, and probably find some bugs in a lot of cases. Static analysis is more involved. You need to have access to the source code. You need to, in a lot of cases, be able to build the source code, understand how to build the source code, understand what the source code is doing. These are not skills that everyone who wants to find bugs in software necessarily has. And they do require a good bit more investment before you start getting bugs back out the other end. On the technical side, static analysis tools have some limitations too. They don't understand the program's architecture. They don't understand the semantics of the code. So this string: is it a credit card number or is it a random temporary variable? These two things could be quite different. Statically, we're not gonna have any idea, whereas with runtime testing you might actually know, because the data's there at that point, so you get a lot more context about it. And finally, they don't understand the social context of where these applications run and what implications the semantic values being passed around have. So I'm gonna talk a little bit more about static analysis: what some customization potentials are, different types of analysis algorithms, and so on. Just to give you an overview of what I'm talking about, most static analysis tools work roughly like this. They take some source code in, include some modeling rules to describe things that maybe they don't have source code for, and build a model of the program. Then they run one or more types of analysis against that model and use rules to describe what the problems they're looking for are in that case.
And eventually it's gonna have some sort of viewer to render results. This might be as simple as the command line for those early tools, or a graphical interface, or something built into an IDE. So what's critical on the front end of that? You need to be able to support whatever languages you're interested in looking at. That's pretty important. This is actually something where fuzzing tools have an advantage over static analysis, right? A fuzzing tool doesn't care whether your web application is written in PHP or Java or .NET. It's still talking HTTP, which is all it cares about. So having the context of the code is both helpful and hurtful in this case. You can get around some of the limitations of actually having to have the source code by looking at the binary. This might help people who are in a traditional testing environment and aren't used to thinking about source code test an application. The downside is you start to lose a lot of the context that source code analysis typically benefits from, because you're working from the byte code version of the program in some cases. And also, decompilation can be difficult. So getting back to something close to that original model: if you've got a Java jar file built in debug mode, this is pretty easy. If you're trying to decompile C and analyze that, it's a little more difficult. So static analysis tools are driven off of rules. What they know about the program outside of the immediate source code, so libraries that are linked in from the outside, as well as the kinds of security properties that they're looking for, are all provided by input from outside the analysis engine: rules, in this case. And this gets at one of the premises I started with: everybody ought to use fuzzing, everybody ought to use static analysis. Now, from that baseline, how much energy do you put into each one, and where is it best spent?
And customization on the static analysis side, I argue, can potentially buy you more than on the fuzzing side, because things can be more generalizable. So I'm gonna talk about some different kinds of analysis engines, or different types of analysis techniques, as well as different types of rules, and give you an idea of how you could go about customizing some of these tools. On the very easiest side of static analysis techniques is something called structural analysis. You're not interested in control flow through the program, you're not interested in data flowing through the program; you're just interested in the structure as the program is declared. Simplest case: those gets calls. You shouldn't ever have gets in your program, so how do you flag one of those? You match against every function call to a function named gets. About the easiest rule you can write. Going a little more complex, you can talk about structures of the program beyond a single function call; you can start talking about a little broader scope. So the line under the example there is a potential memory leak in the case where realloc fails. If realloc fails, it's gonna return null. If you overwrite the reference to that pointer before it's been freed, now you've lost the ability to free that memory. It doesn't matter how this is executed or what the flow leading into it is; that's a potential memory leak. The structural rule for that is fairly simple again. You're interested in a function call to realloc, and then you're interested in whether the first parameter passed to realloc is the same thing that the return value from realloc is assigned to. Dataflow is what a lot of people think about when they think about modern static analysis tools. This is often what gives us the ability to prioritize and put things that we think might be exploitable at the top of the list, as opposed to the traditional mechanism of looking just at the structure of the program.
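As a rough illustration of how simple these structural rules are, here's a toy text-based matcher for the two patterns just described: a call to gets, and the p = realloc(p, ...) self-assignment that leaks memory when realloc fails. Real tools match against a parsed representation of the code, not raw lines; the regexes here are only a sketch of the idea.

```python
import re

# Two toy "structural rules": a call to gets(), and the p = realloc(p, n)
# pattern. The backreference \1 requires the destination of the assignment
# to be the same identifier as realloc's first argument.
GETS_RULE = re.compile(r"\bgets\s*\(")
REALLOC_LEAK_RULE = re.compile(r"\b(\w+)\s*=\s*realloc\s*\(\s*\1\s*,")

def structural_scan(source: str):
    """Scan C source text line by line and report rule matches."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), 1):
        if GETS_RULE.search(line):
            findings.append((lineno, "call to gets()"))
        if REALLOC_LEAK_RULE.search(line):
            findings.append((lineno, "potential leak: p = realloc(p, ...)"))
    return findings

code = 'gets(buf);\nbuf = realloc(buf, n * 2);\nq = realloc(p, n);\n'
results = structural_scan(code)
```

Note that the third line, q = realloc(p, n), is correctly left alone: the result is assigned to a different pointer, so the original reference survives a failure.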
This is a fairly simple command injection problem. We get some input from outside, fictitious function names in these cases. We copy it from one buffer to another, and then we execute whatever that was. So, assuming we don't have source code for get input from network and copy buffer, we need rules to describe what the significance of those functions is to the program. The static analysis tool on its own isn't going to know that get input from network is a source of input, for example. So, you know, I've got a little problem on my slide with the sink rule bit at the bottom, so I'll save that for the end and talk about the rest first, sorry about that. We need a source rule, which is the piece that's missing from the bottom, to say that get input from network actually introduces input to the program. That's gonna tell the analysis engine that buff is now something interesting. So it's gonna start following buff, but it doesn't know what the significance of that's going to be yet. It follows on its own that, okay, now buff is used in another function call. But what's the significance of copy buffer? To a human reviewer, this makes a lot of sense; probably it's like strcpy, right? The analysis tool doesn't have any idea unless it's got a rule for this function. So it needs to know that data starting in the second parameter flows to the first parameter, and that if the second parameter is interesting, the first one is too after the function is called. And finally, we see new buff used in a call to exec, and the tool needs a rule to describe to it that that's a potential vulnerability, command injection in this case. That's the one at the bottom there. So we've talked about following data flowing through the program, but order of operations and control flow through the program can be interesting too, particularly, you can imagine, in cases like dealing with resources.
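The source, passthrough, and sink rules just described can be sketched as a toy taint tracker. The function names come from the speaker's fictitious example; the trace representation (a list of call records) is an invented simplification of the model a real engine builds from the code.

```python
# Rules mirror the example: a source rule, a passthrough rule, and a sink rule.
SOURCES = {"get_input_from_network"}       # return value is tainted
PASSTHROUGH = {"copy_buffer": (1, 0)}      # taint flows from arg 1 to arg 0
SINKS = {"exec": 0}                        # tainted arg 0 means command injection

def analyze(trace):
    """Walk a simplified sequence of calls: (function, args, result_var)."""
    tainted, findings = set(), []
    for func, args, result in trace:
        if func in SOURCES:
            tainted.add(result)            # source rule: result is tainted
        if func in PASSTHROUGH:
            src, dst = PASSTHROUGH[func]
            if args[src] in tainted:
                tainted.add(args[dst])     # passthrough rule propagates taint
        if func in SINKS and args[SINKS[func]] in tainted:
            findings.append(f"command injection: tainted data reaches {func}()")
    return findings

trace = [
    ("get_input_from_network", [], "buff"),
    ("copy_buffer", ["new_buff", "buff"], None),
    ("exec", ["new_buff"], None),
]
issues = analyze(trace)
```

Remove the passthrough rule for copy_buffer and the taint never reaches exec, which is exactly the missing-rules false negative discussed later in the talk.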
So here's an example of a potential double free vulnerability in the code on the side of the slide. We've got a while loop and an if block, and there are various control flow paths one could follow through this code. One is we don't enter the while loop and we don't enter the if block, so no memory is freed at all. Certainly not a double free vulnerability. The way you typically represent rules for control flow analysis is by modeling them as state machines. You have various states that represent potential states the program, or small components of the program, can be in. And then you have transitions between those states that describe how you move from one state to another. So the rule on the other side of the slide has three states, two transitions that are relevant, which are on calls to free, and then self loops from each state back to itself on any other operation. We only care about cases where we see a call to free. On this first path through the code, we stay in that initial state; we're just self looping on everything we see, because there's never a call to free. Another path through the code is we enter the while loop but don't enter the if block. Now we've seen a single call to free, we've transitioned from one state to another in our rule, and then we've looped a couple times there on the other instructions we weren't interested in. And finally, you can see where this is going: there's a third path through the code that involves going into the while loop and into the if block, hitting two calls to free, and we end up in an error state, and that's when the tool's gonna report a bug to you. Customization of rules like this can be really useful for managing proprietary resources. You've got a particular kind of resource that needs to be cleaned up in a particular way; the tool may not know about that, but you can model all of that for the tool in advance. So where do static analysis tools typically go wrong? 
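The three-state rule described above can be sketched as a tiny state machine in C. This is my own illustrative model, not any particular tool's rule format; the three test paths below correspond to the three paths walked through on the slide.

```c
#include <assert.h>

/* States of the double-free rule: initial, one free seen, error. */
typedef enum { ST_INITIAL, ST_FREED, ST_ERROR } FreeState;

/* Transition taken on a call to free(); every other operation
 * self-loops, i.e. leaves the state unchanged. */
FreeState on_free(FreeState s) {
    switch (s) {
    case ST_INITIAL: return ST_FREED;  /* first free: fine          */
    case ST_FREED:   return ST_ERROR;  /* second free: report a bug */
    default:         return ST_ERROR;  /* stay in the error state   */
    }
}

/* Walk one control flow path: 1 marks a call to free on the tracked
 * pointer, 0 marks any other statement. */
FreeState check_path(const int *ops, int n) {
    FreeState s = ST_INITIAL;
    for (int i = 0; i < n; i++)
        if (ops[i])
            s = on_free(s);
    return s;
}
```

A real engine enumerates the paths from the control flow graph rather than taking them as arrays, but the state-transition logic is the same idea.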
We talked about some of the upfront cost and some of the limitations in terms of what they can derive from the source code as opposed to the architecture and the context the program's gonna run in. But when it really comes down to it, what most people are concerned about are false positives in a lot of cases, and false negatives in some other cases. Is the tool reporting things that aren't really vulnerabilities, or is the tool missing things that really are? These come down to a couple of factors that contribute to the likelihood of false positives and false negatives. One is an incomplete or inaccurate model of the program. Say you link in an external library, like the get-input function in the case earlier. The tool may or may not know what that is unless it has a rule to describe it; it doesn't have source code for it, so it doesn't know what it does. You can imagine that leading to both false positives and false negatives; you can see it on both lists here. Another is how conservative or forgiving the analysis is. What's the burden of proof before the tool will report a vulnerability in a given context? Does it have to be able to prove that it thinks it's exploitable, or did it maybe follow a function pointer, which statically it couldn't quite resolve, but it's possible in a certain scenario at runtime that this might be a vulnerability? Depending on what the purpose of the tool is, tools optimize in one direction or another. There are tools that are really targeted at being bug finders, so they want to report things that are as accurate as possible. They tend to have pretty conservative analysis: they're not going to report something unless they really think it's a bug. And then there are tools that are much more forgiving in terms of their analysis, because avoiding false negatives is really the most important thing for them. And the last one is the one I really want to call attention to today, which is missing rules. 
That can affect the incomplete or inaccurate model of the program; these are kind of tied together. How much does the tool know about your environment beyond the source code? You can also use rules to convey some of the context and semantics of the program. You can write a rule that says this function always returns a credit card number. In the code it might just be a string, but you can give the tool a little more information about it and then derive more value from it as well. So, untapped potential: it's really about customization for static analysis. Too many people today are using static analysis tools out of the box. We see a lot of people now putting energy into customizing fuzzing tools to their environment, because it's clear that once you move beyond a lot of the simple cases, old system software, you really need to put that customization in. Static analysis tools will give you some value out of the box, so it's not as obvious that you need to take that next step and put energy into customizing them, but there's actually the potential to get quite a bit of value back out. You can use customization to, as I said before, model the behavior of third-party libraries that you're using, or internal libraries that you pre-compile and then link in. You can also use it to describe semantics, and you can use it to identify program-specific vulnerabilities. This is certainly getting back into the realm of the thing I was critiquing fuzzing on, which is doing too much customization specific to a given program. But the effort involved in one individual rule, or a handful of individual rules, may very well be worthwhile in terms of the classes of bugs you can find, because if they're applicable across a large code base that you work on, it can pay off many times over. I talked before about the implications of performance on runtime testing, which is that the more complete your tests are, the longer your runtime is going to be. 
Static analysis doesn't necessarily have the same problem. If you add a couple of those custom rules I talked about earlier, the runtime is basically going to be the same, because it's primarily bounded by the scope of the analysis: how much of the program the analysis engine is considering at any given time. You've got early tools like RATS that basically consider a line of code at a time, and then you go all the way up through tools that consider modules, entire programs, or even across process boundaries. You can make the right choice in your environment as to which of these is right, but no matter how much customization energy you put into it, you're not going to change that significantly. So the more energy you put into customizing the tool, your runtime is going to stay roughly static; you don't get dinged for the effort you're putting into it. To talk a little bit about the experiment we conducted: scale was obviously a problem here. We couldn't test five zillion input cases like the example I gave earlier; we needed something that was fairly scalable. So we chose qwik-smtpd (I can never say that right, particularly in public), which is a couple-of-years-old mail daemon. It's got some known vulnerabilities in it, it's not particularly large, and it's written in C. And we picked a couple of tools. I have to say, I work at Fortify, so that's the static analysis tool I chose, because I have it sitting on my desk already. You could use a different one too; I don't think it would have a significant impact on this study. And we chose the @stake SMTP fuzzing tool because it was really customized to this environment. A lot of energy has been put into customizing it to this particular protocol and this particular type of application, as opposed to, say, a general file format fuzzer or something like that. And we ran the tools, collected data, and tried to draw some conclusions from them. So what did we find? 
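The earlier point, that adding rules leaves the analysis time roughly flat, can be illustrated with a hypothetical banned-API table in C. Each entry stands in for one structural "rule", but the scan over the code remains a single pass per call site (and the lookup itself could be made constant-time with a hash), so the table can grow without the traversal cost changing much. The table contents here are my own examples.

```c
#include <stddef.h>
#include <string.h>
#include <assert.h>

/* One hypothetical structural "rule" per entry. */
struct rule {
    const char *func;
    const char *warning;
};

static const struct rule rules[] = {
    { "gets",    "never-safe input function" },
    { "strcpy",  "unbounded string copy"     },
    { "sprintf", "unbounded format write"    },
};

/* Called once per call site during a single traversal of the code;
 * growing the table adds rules without adding passes over the program. */
const char *lookup_rule(const char *callee) {
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++)
        if (strcmp(rules[i].func, callee) == 0)
            return rules[i].warning;
    return NULL;
}
```

Contrast this with fuzzing, where each additional test case adds another run of the program under test.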
Four remotely exploitable vulnerabilities: two buffer overflows and two format string vulnerabilities. These have been known for several years, so I'm not letting anything out of the bag today. And then a bunch of other stuff having to do with local file system attacks, environment variable attacks, and so on, which, since this is installed setuid root on most systems, actually could be interesting. But for the purpose of today, we're gonna put most of the significance on the remotely exploitable things, since that's what the fuzzing tool is really gonna be capable of finding in this case. Although it's a good example: you could actually pick three or four different fuzzing tools, fuzz those different sources of input, and maybe get a little more out of them. I didn't do that. So what were the results? We've got these four remotely exploitable bugs. The static analysis tool found all four of them, hooray. The fuzzing tool found two very easily: in the first couple of test cases it found the format string vulnerabilities, because that's one of the things it looks for. It's gonna fuzz every field in the SMTP traffic and insert format strings into them, because that's something interesting we've seen in a lot of mail clients over the years. The buffer overflows it didn't find. The reason is that where those stack variables were located, there was too much padding around them; there were a lot of other buffers. It's kind of like the path manipulation example I gave before: the tool did exercise the vulnerability in many cases and overwrote variables on the stack, but changing those variables wasn't sufficient to cause the program to crash or to cause it to return differently. Therefore the oracle was really insufficient. The tool couldn't tell that the vulnerability was there, and therefore the user, if they weren't already aware of the vulnerability, wouldn't be made aware of it by running this test. So what are our conclusions about that experiment? 
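Here's a small sketch of why the crash oracle missed the overflows. The layout and names are mine, not taken from the mail daemon under test, and the "stack frame" is modeled as one flat arena so the overrun itself is well-defined C: the overrun bytes land in neighboring padding rather than anything that makes the process crash, so a fuzzer watching only for crashes sees nothing.

```c
#include <string.h>
#include <assert.h>

/* A "stack frame" modeled as one flat arena: a 16-byte field followed
 * by 64 bytes standing in for the neighboring buffers the real
 * overflows spilled into. */
enum { FIELD = 16, PAD = 64 };

struct frame {
    char mem[FIELD + PAD];
};

/* Unbounded copy into the 16-byte field, like the vulnerable code:
 * nothing checks len against FIELD. */
void store_field(struct frame *f, const char *input, size_t len) {
    memcpy(f->mem, input, len);
}

/* Did the overrun reach past the field?  The process never crashed,
 * so a crash oracle alone reports nothing. */
int padding_clobbered(const struct frame *f) {
    for (int i = FIELD; i < FIELD + PAD; i++)
        if (f->mem[i] != 0)
            return 1;
    return 0;
}
```

A memory-error oracle (a debug allocator, or today a tool in the style of a memory sanitizer) would catch this where the crash oracle does not.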
One qualification of the whole thing is that the scale of the program is such that measuring performance really wasn't useful. It's not large enough to say it took an hour to run the tests or five hours to run the static analysis; everything was very quick. But in the fuzzer's defense, it found exploitable format string vulnerabilities in the program in a matter of seconds, as fast as you can hit enter on the command line. It also missed critical remotely exploitable vulnerabilities that were within its reach. I'm not gonna ding it for having missed things like the file system bugs or the environment variable bugs; that's not really its fault, since it only spoke network traffic. But it missed those buffer overflows, which are triggered by fields in the network traffic passed to the server. It should have found these; they're something the user would have expected it to find. And although it didn't affect any bugs in this program, because there weren't any masked by significantly complex conditionals, you can easily imagine a case where a particular bug requires two different header fields in a mail request to have value X and value Y before you see a certain behavior. This tool was fuzzing one input field at a time with a long list of variable inputs, and would never get to that combined case of what if this field had this value and that field had that value. So there are definitely some advantages. Fuzzing is a lot less involved than static analysis. I am not an expert user of fuzzing tools, and I got this up and running. Once I got the mail server running, which took me a couple of days, I very quickly compiled the fuzzer, ran it, and found bugs in half an hour, easily, start to finish. Not a hard thing to ask someone testing software to do, and an easy way to find low hanging fruit; a good idea. You also get a lot of context for free. 
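That two-header scenario can be sketched in a few lines of C. The check and the field values are entirely hypothetical: a fuzzer that mutates one field at a time, holding the other at a default, never reaches the guarded state, while one that tries combinations does.

```c
#include <string.h>
#include <assert.h>

/* Hypothetical server flaw: it only fires when header X and header Y
 * BOTH carry a magic value.  Nothing here is from the real daemon. */
int triggers_bug(const char *x, const char *y) {
    return strcmp(x, "EVIL") == 0 && strcmp(y, "EVIL") == 0;
}

/* One-field-at-a-time fuzzing: mutate X with Y at its default, then
 * mutate Y with X at its default.  Returns 1 if any trial hits the bug. */
int fuzz_single_field(const char *const *vals, int n) {
    for (int i = 0; i < n; i++) {
        if (triggers_bug(vals[i], "default")) return 1;
        if (triggers_bug("default", vals[i])) return 1;
    }
    return 0;
}

/* Combination fuzzing: try every pair of values across both fields.
 * Thorough, but the trial count grows quadratically per field pair. */
int fuzz_pairs(const char *const *vals, int n) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (triggers_bug(vals[i], vals[j])) return 1;
    return 0;
}
```

This is the runtime-versus-completeness trade-off again: combination testing finds the bug but multiplies the number of runs, while static analysis considers both conditions on the same pass.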
So the tool, depending on how specialized it is to the environment the program is going to be used in, is going to have a lot of context baked into it. The fuzzing tool I was using is a mail fuzzing tool, so it's sending things related to mail, headers you would expect to see. You get a lot of the context that you would expect from an actual use case for the software in a real environment, where static analysis isn't going to have any of that understanding. And finally, it's great because what I ended up with was: this string caused the program to crash. Now I can send that to whoever wants to fix the bug and say, very easily, there's clearly a bug, just run it this way. On the static analysis side, we get the benefit of thoroughness. We were able to identify those buffer overflows in the case we looked at because we're considering every path through the code, we're considering every potential problem, whether it evidences itself from the outside or not, regardless of what the oracle might be from a fuzzing standpoint. You get to handle nested conditionals and hard-to-reach states for free, because the static analysis tool is going to assume any possible input. That can certainly result in false positives in some cases, because some of those inputs may not actually be plausible. But if you're really going for assurance and want to find all the bugs, it's much better to come at it from the completeness side than to slowly inch toward it while the length of your tests keeps increasing. It doesn't require running the code, and it only requires however much time the analysis needs to consider all of these paths, so you get all of them at the same time for the same cost from a runtime standpoint. And as you put more energy into tuning it to your environment or your types of software, you're not going to increase that runtime. Finally, it doesn't depend on an oracle. 
Like the example I gave with the buffer overflow before: it's going to look for what it thinks are problems at the source code level, regardless of how they're actually gonna be evidenced from the outside. It's not dependent on what interfaces it has access to. Is it coming at it from the web or from the file system? Did it overwrite one of my system files, and so on? It's going to look at what the code is actually doing and ask, do I think that's a problem? So to summarize: when we're talking about finding bugs in general, it's not clear which technique you'd want to use. When you're talking about finding security problems, I think static analysis is spot on, because it's really going for: I want to have a complete view of the application, I want to guarantee that rather than inching towards it, and I want to report as many problems as I can. Important attributes when choosing a static analysis tool: you need to make sure it supports your languages. That's back to the earlier point; a fuzzing tool doesn't need to worry about that, so this is a new challenge when you're dealing with static analysis. You want to make sure you have the right analysis technique so you can do prioritization, as opposed to those early tools. You definitely want to have the right rule set, and this is where that last bullet comes in: you should be building on it, not just using what the tool comes with. Performance is important; if it takes two weeks to begin with, even if it always takes two weeks and never takes any longer, that's probably not acceptable. You need to find one that'll work for your process in terms of running reviews. And something we didn't get to talk about today, but that's really critical, is results management. The tool might report 100 issues. It might prioritize some of those coming out of the engine. How is it gonna present those to the user? How is it gonna let the user review those issues and give them as much information to understand what the code is doing as possible? 
This is really critical in terms of you guys actually looking at the code and figuring out what the bugs are. And the point I hope I've been harping on enough today is: customize, customize, customize. It's worth putting energy into it, because you're not shooting yourself in the foot by increasing the running time, and you're really likely to find more bugs, more bugs that are relevant to you, and more bugs that fill in for some of the original weaknesses of the technology, such as context and program semantics. I'll put a PDF of this up. The slides are on your CD, but I've changed them around a little bit, so if you wanna get the new version, here it is; if you don't care so much about the small changes, not a big deal. Feel free to send me an email if you want to disagree with anything I said today, or to talk about any of this stuff. I'm really eager; I enjoy talking to interesting people. So if you're not interesting, maybe don't talk to me. And I have to give a pitch: I was really excited that this gentleman in the front row has a copy of my book. It's one of the only ones I've seen here, because unfortunately it's a White Hat book. It's about how to write secure software, not how to break into software. So Black Hat and Def Con are not the prime audiences, but hopefully some of you will find some interest in it, or at least wanna know what the guys on the other side of the fence are doing. Thanks very much. I appreciate your time today. Thank you.