So, let me take this out, since we're allowed to do that. So, we're gonna talk about automated exploit detection. Fuck this manual looking-at-it shit, page by page, line by line; it's for fucking losers. Anyone who talks about that in their talk is leading you down the wrong path. What's that? See what? My friend Luis here gave a talk about making debuggers and static analyzers talk to each other as opposed to doing things manually. It's about doing things more optimally, but he admitted in his talk it is still a Band-Aid, is that correct? Yes, but it's a really good Band-Aid. It's like those Band-Aids, you know, the ones that have a cartoon on them? Yes, a very entertaining Band-Aid. Glow in the dark, scratch and sniff. G.I. Joe, Barbie, Teenage Mutant Ninja Turtles, et cetera, et cetera, just like my Underoos. Do you wanna see? No? All right.
So, we're gonna talk about automated exploit detection in binaries. I'm Matt and that's Luis, and those are our URLs and our email addresses. The project we're gonna be talking about is bugreport, and that's the SourceForge URL, if anybody is so inclined to go and start downloading the code right now so you can ask us extremely pertinent questions by the end of the talk.
All right, so like I said, I'm Matt. Let's see, the last time I spoke at DEF CON was 2000, and we just taught classes on the... It was yesterday? It was yesterday, my mistake. Yesterday's kind of a blur, but seven years ago, fuck yeah, that's rote memory, basically. So, before that it was in 2000. We just taught a two-day class at Black Hat on this subject. We're gonna give you a highly condensed version that is a lot more focused on the tool itself, as opposed to a lot of the concepts and things like that.
So in the class we basically taught people how to write an automated code analysis tool for binaries from scratch, so as to commoditize the entire code analysis market, which is making too much money for too little functionality. And so yeah, let's see: taught a rootkit class at Black Hat, and we did a talk together at DEF CON. Released several security advisories over the years for exploitable bugs. One of the major ones that people may remember is we found one packet that would crash basically any BSD box. It was an unaligned ICMP timestamp option thing, and NetBSD, OpenBSD, and FreeBSD on non-x86 architectures were crashing with that one packet.
I contribute a lot to open source. I fund a good deal of open source. Right now one of the things I'm funding is a PlayStation 2 emulator called PCSX2, which you can Google. What's that? How about fuck you? Thank you, I'm talking, all right. So anyways, PCSX2: I'm gonna fund the Xbox development on that because I wanna play Katamari on the Xbox. So anyways, I fund a lot of open source development and I work on a lot of product stuff, a lot of practical solutions, not academic jerking off like a lot of people talk about and then they never release anything. So one of the things that we've done is we're gonna release a tool with source code that actually works, as opposed to, geez, I wonder if I could make, I don't know, something that would translate every binary into the same kind of bytecode so then we can analyze the bytecode, and work on it for seven years with no results, for instance.
And then Luis. I do mostly hardware and software reversing, mostly for myself, and sometimes I do contracts for other people. I like reversing; I like to be able to see what's inside a binary, a protocol, an algorithm, or what have you. To me it's just really interesting. That's really about it for me. Sweet. And Luis and I have been doing stuff together; like I said, that first advisory was in '99, something like that.
And we released a couple of advisories over the years as well. So let's go to the next slide. So here's what we're gonna talk about: the definition of static analysis. We're gonna talk about some of the underlying concepts. We're gonna talk about challenges. Challenges are things that can be overcome easily if you just put your mind to it and don't listen to discouraging assholes. And obstacles are things that are a little more difficult to overcome in reality. And we're gonna talk about the solutions for these things that we've implemented in C#, and real-world exploit tests where we look at real programs that we've compiled and things like that. There are some limitations to the tool. We'll talk about those as we go along.
So let's see. Bugreport, the approach we're talking about, started out as a set of tests for analysis tools. I worked on a commercial product that did code analysis, and that product was removed from the market for various reasons that I can't go into. But anyways, I was kind of disappointed with the state of code analysis tools, and I was trying to come up with a way to move this forward. So I figured a really good way would be a suite of programs that would give a kind of objective evaluation of different tools. There are a couple of things out there, but they're all vendor related. A lot of these things are funded by companies that start with F and end in "ortify" and things like this. So you can't really trust it too much, in my personal opinion. I don't think they exercise code analysis tools very well.
So I started out with that kind of concept and I was like, you know what? I should do a Black Hat class or a DEF CON thing to put these ideas out there, to commoditize them. I don't like people who keep their ideas all to themselves like they're secret. We need to commoditize these things and move on to the next level and move on to the next, better ideas.
So it's a proof of concept kind of thing. Proof of concept in this particular context means that it actually works, just in a very narrow context, and it's very easy to extend. We're gonna talk about how easy it's going to be to extend. And after this talk, anyone who's interested can come pair program with Luis and me and Brian, Eric, maybe, Todd, Doug, or my husband Jeff. By the way, it's my husband's birthday today. He just turned 16 years old. We got married in Norway.
Anyways, we're releasing this under the GPLv3 draft 2 so people can't do things like embed it into an appliance and then circumvent the GPL that way, or do anything else like that. We were talking about using the Metasploit license, but I figured the GPLv3 would be a little more broadly known, perhaps. And we have a bunch of GPLv3 t-shirts here that the Free Software Foundation sent me, so we're gonna give those out for really awesome, kick-ass questions at the end of this talk. And once again, that's the URL there if you're so inclined to download the source code and ask pertinent questions.
So bugreport was developed in a test-driven way, as a clean-room re-implementation. I couldn't be involved with the code because of my personal involvement in a commercial tool. So it was done in a test-driven way where I would send Luis test programs, and he read some TDD, extreme programming, and C# books. And so basically this tool so far represents what the tests thus far have guided the code to do. Nothing more and nothing less. However, as you'll see, it basically just works and it's very easy to extend, which we're going to do this evening after this talk whether any of you guys participate or not.
So I know I'm gonna get this question: why C#, Microsoft sucks, blah, blah, blah. All right. So it's very similar to Java and C++. I like object-oriented languages. I know some people don't.
Some people prefer functional languages or procedural languages. Different strokes make the world go around. It's my personal preference. And so, in sort of funding this project, as it were, this is my personal preference that I chose to see it executed upon. It's an open ECMA standard, unlike Java, which is completely controlled by Sun under a bogus community process that took about, oh God, I don't know, five years to get generics into the language in a really shitty way. So I didn't really want to use Java. I didn't have any good personal experiences with it. And I wasn't ready to dive into scripting languages or interpreted languages anyway, because I didn't understand their underlying implementation. You have to know the language features and stuff like that. So I kind of stuck with what I knew.
There are three different open source implementations of C#. There's Mono, there's DotGNU, and there's the Microsoft shared source one. Pick one, any one, whichever license you like or don't like or whatever. There are three open source implementations. How many open source implementations of Java are there? Zero. Other than some things that are just starting right now and aren't very functional at all.
So it has specific features that we like: high-speed generics, nullable value types, strong typing, stuff like that. I like strongly typed languages. Some people like loosely typed languages. It's a matter of preference. There's no judgment there. And also there are high-quality and simple open source tools for unit testing. The IDE that we're gonna show you is completely open source: SharpDevelop. I really like it. It has its nooks and crannies and things that don't work, but it basically just works.
So, target of detection. What do we care about?
Now, a lot of people use a lot of colorful language and a lot of diagrams and things like this to make it seem very, very complicated what's going on, to make it seem like they are the wizards of assembly or the wizards of IDA, blah, blah, blah. Here's what it comes down to. The essence. And it's gonna seem like an oversimplification, but we're gonna go over code and assembly and the analysis and why it's not an oversimplification. What we are looking for is out-of-bounds memory writes. We don't care about memory leaks, necessarily. We don't care about double frees, necessarily. We don't care about that kind of stuff, or potential null dereferences and all this other kind of stuff. We care about out-of-bounds memory writes because every exploitable buffer overflow comes down to this, in essence. You have to overwrite a buffer of some kind or get your code on the machine somehow. There are some very minor corner cases, like protocols, for instance, that take and dereference a memory pointer that's specified in a packet by address. That's a little bit different, but it is effectively the same for our purposes.
So let's look at a simple example of an out-of-bounds memory write. Let's see here now. Yeah, so out-of-bounds memory writes. So there's loops, there's branches; people tell you all about dominator trees and lots of really important-sounding stuff and very academic stuff. And how would I ever learn about dominator trees and algorithms and crap like that? It really doesn't matter. This is what matters: out-of-bounds memory writes. We're gonna go over loops, by the way, and make all those people look like they're full of hot air, which they are, because you'll see how simple it is to understand at the basic level. So this is an out-of-bounds memory write at the simplest level. Where's one I can see? There's one I can see. So here's a simple thing.
We're mallocing 16 bytes and we're writing one byte out of bounds of the array with immediate data. So is this exploitable? No, it is not. Here's our definition of exploitable. It's really funny when you talk to some of these vendors and you say, oh well, do you detect exploitable bugs? They go, well, it depends on what you mean by "exploitable" and "bug". And it's like, what the fuck are you talking about? I'm sorry, I didn't come to the conference with a thesaurus under my arm to try to work out your market-speak. In this case our definition, or my definition anyway, of an exploitable bug is an out-of-bounds memory write with user-supplied data that is unconstrained, that has not been constrained properly. We'll get to constraints a little bit later. So in this case, this is not exploitable, but it's still a bug, right? You're still corrupting memory. You're still going one outside the bounds of the buffer, which may or may not fuck things up to the point where your program will do something unexpected.
Back to the slide real quick. Can you make the font and text bigger in there? Sure. We can do that. No worries. LASIK doctor. I don't know, did you hear what happened to Kathy Griffin? She had a really bad LASIK experience. You can look on her website, kathygryffin.net. See all the really gross pictures of her eye. All right, news of the day. Text patch sucks. Oh, maybe you have to deselect all, actually. Maybe they're right. Try selecting. This is the reference for the whole title. Okay. Yeah, just me about this. All right, two tears in a bucket, motherfuck it. Here we go. All right, so if we go back to the slides. That's small too, all right. Yeah, well, there's no winning. There really isn't.
So that's an out-of-bounds memory write. And so if we look at the x86 for that memory write, and this time we'll try to do it in a bigger font, I promise you. There we go. Now it obeys.
So, there's some preamble stuff where it's setting up the stack; that's the first lines, up through line 118. We generated this using objdump, which comes with binutils, which is free with anything. Objdump is really cool because if you build it yourself, you can say enable all platforms, and it will disassemble just about anything you throw at it. Mind you, it is not a replacement for IDA by any stretch of the imagination, especially not for malware-type stuff, but you can get a lot of mileage out of objdump. So one of the things that we did is we told objdump to intermix the source code with the assembly here for the purposes of illustration, making it easier to talk about and comprehend.
Let's see, so we got some preamble stuff, and you see the void *p malloc line, and so you see a mov DWORD PTR [esp], 0x10. That's putting 16 onto the top of the stack; it's not using a push. First time I've ever seen assembly like that. This is GCC 4.0.3, which comes with Ubuntu, or Kubuntu in my personal case. So it's putting 16 on top of the stack and then calling malloc, basically. The first thing on the stack is the argument. So that's the malloc of 16. And on the x86 architecture, the return value comes back in EAX. So what you can see on line 123 is that it is moving zero into the dereference of EAX plus 16. EAX contains the pointer returned by malloc. We're going 16 bytes into that buffer, which is one too many, and it's gonna put a zero there, one past the end. One of my German colleagues on the FreeSCI open source project calls these OB1 bugs. So that's what the assembly looks like. Out-of-bounds memory access. All right, cool.
So, out-of-bounds write tests. One thing that we should do is look at an exploitable case, and we're gonna show you how it fits into the automated testing framework that we have, which is a simple Python script. Although Mr.
Doug here actually has a patch on his laptop that's gonna make the Python script unnecessary. Sorry, Luis, I know that's your baby. You son of a bitch. So let's look at one of the exploitable ones, like with stack taint, or what have you. And this should come up in the larger font. Font, font, font. It's been a long week and a half in the desert on a horse with no name. That's fine. We can just look at the .dump, actually. Check it out. There you go. Yeah, the O2. Swizz it.
All right, so we got some stack setup. So this is an exploitable case. All of these involve main, because one of the limitations that we have right now is that we only scan one function, the main function. We'll talk about how easy it's going to be to fix that. So let's see. We get **argv, which is tainted data. It's user-supplied data. It comes in from the command line. And in the context of analyzing this program, we don't know if it's setuid root. We don't know anything about this program. So we're just gonna say this is exploitable, even though it might not actually matter, maybe. Who knows? We don't. So let's pretend like it is.
So, **argv. All right, so argv lives at ebp plus 12. We put that into eax, and that gives us argv. We dereference that once and we get *argv, and then we dereference that one more time and we get the first byte. Is that right? Yeah, the first byte out of argv, and we put that into ebx, all right? So then we go to our malloc thing, which we just went through, and then we go down to line 129 there, and we see that we are putting the lower bits of ebx out of bounds into that array. That's exploitable. That's what every bug basically comes down to. Anybody who says differently, well, you can use a lot of colorful language and a lot of really convoluted two-day classes, et cetera, et cetera, just to get down to this one thing. And this is what it looks like in the assembly.
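The distinction just described, an out-of-bounds write of immediate data versus an out-of-bounds write of user-supplied data, can be sketched in a few lines of Python. This is not bugreport's C# code: `AbstractValue` and `classify_write` are hypothetical names made up for this illustration, and the model assumes a buffer of constant size written at a constant offset, like the malloc(16) test programs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AbstractValue:
    """What the analysis knows about one register or memory cell."""
    value: Optional[int] = None        # known constant, or None if unknown
    buffer_size: Optional[int] = None  # set when the value points at a buffer
    tainted: bool = False              # True if derived from user input


def classify_write(dest: AbstractValue, offset: int, source: AbstractValue) -> str:
    """Classify a one-byte store of `source` to the address `dest + offset`."""
    if dest.buffer_size is None:
        return "unknown"  # not a tracked buffer, so nothing to report
    if 0 <= offset < dest.buffer_size:
        return "ok"
    # Out of bounds. Under the talk's definition it is only exploitable
    # when the data being written is user supplied.
    return "exploitable" if source.tainted else "out-of-bounds"


# eax after malloc(16): a pointer to a 16-byte buffer.
eax = AbstractValue(buffer_size=16)
zero = AbstractValue(value=0)            # immediate data
argv_byte = AbstractValue(tainted=True)  # first byte of argv, value unknown

result_immediate = classify_write(eax, 16, zero)
result_tainted = classify_write(eax, 16, argv_byte)
result_last_valid = classify_write(eax, 15, argv_byte)
print(result_immediate, result_tainted, result_last_valid)
```

Offset 16 into a 16-byte buffer is one past the end, so the first two calls report a bug, and only the tainted one is reported as exploitable. A real tool would also need offsets that are themselves unknown or tainted, which this sketch leaves out.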
It's really easy to find in a very small program like this, but it actually comes in all shapes and sizes. So how do we deal with that? Well, one of the ways we deal with that is we compile the program a bunch of different ways with a bunch of different compilers. We do -O0, -O2, -O1, -Os, and then we do that for several versions of GCC and Visual C++, and, God help you, Borland C++. And so that's basically it.
So if we go to the command line, the bugreport command line... Congratulations, here are a few tips to enhance your enjoyment of your new stainless steel drinkware. Do not overfill; hot liquids can scald the user. When filled with hot liquids, keep out of reach of children. This is a stainless steel product. It is not for microwave use. Fuck me, learn something new every day.
So here's what we're gonna look at. By the way, this is a command line tool, because that's all we really need in this particular case. So if you download the code and you are running SharpDevelop, or you've used Mono to build it, which is also relatively simple, that's it, you build it, and it's bugreport.exe. It's about 2,000 lines of code, and about 60% of that is unit tests. So I guess it's about 800 lines of code or something. And so, who's got their goddamn phone on? Give me a fucking break. Jesus H. Christ in a chicken basket. You born in a barn or something?
So let's see, we're gonna run that, and we have this command line option called --trace, which gives us debug output, basically, as it's going through and doing things. Right now, what we did to simplify things a little bit and make it easy to understand is that we don't actually parse ELF or PE headers, right? You don't need to do that. What we've done is we let objdump do the hard work for us, so we don't have to have a full-on disassembler in order to analyze the binary code. That's not to say that we are not gonna have that.
We were looking at the quickest way that we could start getting code to catch certain vulnerabilities. So we wanted to concentrate on the analysis part, and eventually we're gonna have to move on and decode or parse PE, ELF, and what have you. Yeah, so for anyone who saw the XP talk, the extreme programming talk that we did yesterday, this is the simplest thing that could possibly work, and this is all about the analysis part as opposed to boring PE and ELF bullshit. So that's why we use objdump to do this. We're not cheating. It's not difficult; it's a solved problem 100 times over. Therefore, it's not what we're talking about or focusing on.
So yeah, that .dump file is basically the objdump output for that particular binary, and all these tests are, by the way, on the SourceForge project, so you can just run these or look at them or whatever. So if I hit enter, wee. So it kind of looks like GDB output, and you're like, wait a minute, did you make a virtual machine or what? No, we didn't make a virtual machine. It kind of looks like it, but that's kind of on purpose, basically, because when you're looking at binary code, you kind of want to think of it from a how-does-this-run kind of perspective. --trace is there just to help us debug things. We rarely, rarely have to use it, because our unit tests usually catch all of our bugs for us when we do unit test them, unlike our main, which Douglas has fixed. Thank you, Douglas.
So the relevant output here, though, if we rerun it without --trace, that's the relevant output. So that's the bug in that particular program; we detected that it's exploitable. And if we run it on our other program that doesn't have the stack taint at the end of it, it just says out of bounds there.
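Letting objdump do the disassembly means the analysis only has to read text. A minimal sketch of that step follows, assuming Intel-syntax output with source intermixed (for example from `objdump -d -S -M intel`) where each instruction line is an address, a tab, the raw bytes, a tab, and then the mnemonic and operands. The `Instruction` type, `parse_line`, and the sample addresses are made up for illustration; this is not how bugreport's C# parser is written.

```python
import re
from typing import NamedTuple, Optional


class Instruction(NamedTuple):
    address: int
    raw_bytes: bytes
    mnemonic: str
    operands: str


# Assumed line shape: "<hex addr>:<TAB><hex bytes><TAB><mnemonic> <operands>"
LINE_RE = re.compile(
    r"^\s*(?P<addr>[0-9a-fA-F]+):\t"
    r"(?P<bytes>(?:[0-9a-fA-F]{2} )+)\s*\t"
    r"(?P<mnem>\S+)\s*(?P<ops>.*)$"
)


def parse_line(line: str) -> Optional[Instruction]:
    """Return an Instruction, or None for source lines, labels, and blanks."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return Instruction(
        address=int(match.group("addr"), 16),
        raw_bytes=bytes.fromhex(match.group("bytes")),
        mnemonic=match.group("mnem"),
        operands=match.group("ops").strip(),
    )


# Illustrative input only: the addresses are invented, the encodings are
# the standard ones for these two instructions.
SAMPLE_DUMP = (
    "int main(int argc, char **argv) {\n"
    " 8048394:\tc7 04 24 10 00 00 00 \tmov    DWORD PTR [esp],0x10\n"
    "  p[16] = 0;\n"
    " 80483a0:\tc6 40 10 00          \tmov    BYTE PTR [eax+0x10],0x0\n"
)

instructions = [ins for ins in map(parse_line, SAMPLE_DUMP.splitlines()) if ins]
for ins in instructions:
    print(hex(ins.address), ins.mnemonic, ins.operands)
```

The intermixed C source lines simply fail to match and are skipped, which is what keeps this approach so small compared with parsing ELF or PE directly.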
So we're able to distinguish between exploitable and not exploitable, which is amazing, because a lot of these tools don't do that up front for some reason. They're like, you've got a gazillion strcpys, do something! And that's not very useful. It's only useful if it's actually exploitable. So that's what the main focus is. So this example here doesn't have taint; it's just an overwrite, and it just shows OOB instead of exploitable. Yeah, it's just like the first one that we looked at, where it was writing out of bounds, but with zero, which is of course not user-supplied data. It's not that user-supplied data is always bad, of course. Most programs take user-supplied data, unless you're calculating the one millionth decimal place of pi or some bullshit like that. Otherwise you're taking user-supplied data. And so we're not saying don't take user-supplied data, that's horrible or bad or whatever. You just have to constrain it properly and not have out-of-bounds writes. And we'll get into constraints in a little bit.
So let's look at the next one. Yeah, let's go through this first and we'll look at some code. We'll go back and look at the t-shirt code in a little bit. So, dealing with branches. The programs we looked at so far have been straight-line programs, basically. No ifs, no whiles, no looping, no branches, et cetera, et cetera. That's not particularly real. So how do we deal with branches? Well, right now in the code, the way that we track things is that everything has an underlying value. It's kind of an immediate underlying value for everything, because so far in our test programs that's all we've seen. Malloc 16, do this, do that. There's a kind of constant number in there. And so what happens if you have... what the fuck did I say about cell phones? Jesus H. Christ. All right, so what the fuck was I talking about? Tracking values. Tracking values, that's right.
So coming into a program, let's say you get a buffer from recv, right? You get a buffer of 1,024 bytes. What do you know about the bytes in that buffer? What do you know about the values in that buffer? How can you make decisions in the program based upon that buffer? Can we look at a branch program real quick? And so the answer is you don't know. You don't know what's in there, right? So without any hint or anything, you have to kind of figure it out. You have to take a guess or say, oh well, I don't know what this is, so I guess I'll treat it as if it's the biggest number possible. Or zero, or the smallest number possible. Or null, which is what the current code does.
So here's an example. **argv is sort of a perfect example. What's in that buffer? What's the zeroth byte of argv? We don't know. So coming into this program, do we take that if? Do we do what's in the if, or do we not do what's in the if? How do we decide? Well, there's a bunch of ways you can go about this, but for the sake of brevity, we'll just pull something out of our ass. If we're gonna be arbitrary, let's be really arbitrary and just be honest with ourselves. We could flip a coin, take it or not, but the thing is, if we don't have any constraint, if we don't know anything about this variable, we want to constrain it as soon as possible in the program. So what we wanna do to help us along that path is take that branch, because that's gonna help us constrain it. So after that if statement is executed and we do count -= 2 on line 9, what do we know about the byte at **argv? We know that it is greater than or equal to a period, right? So how do we express that? Right, so go back to the slide. So we express that with constraints like this.
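As a rough illustration of a value that starts out unknown and gets narrowed by taking a branch, here is a minimal Python sketch. The `Range` class is a hypothetical name for this example, not bugreport's representation, and it assumes the byte is treated as unsigned, so "unknown" means 0 through 255. The comparison against a period follows the example above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Range:
    """Inclusive range of values one byte may hold (assumed unsigned)."""
    low: int = 0
    high: int = 255

    def at_least(self, bound: int) -> "Range":
        """Constraint learned on the path where `value >= bound` held."""
        return Range(max(self.low, bound), self.high)

    def at_most(self, bound: int) -> "Range":
        """Constraint learned on the path where `value <= bound` held."""
        return Range(self.low, min(self.high, bound))

    def is_empty(self) -> bool:
        """True when no value satisfies the constraints (infeasible path)."""
        return self.low > self.high

    def extremes(self) -> Tuple[int, int]:
        """Smallest and largest candidates to try against a buffer bound."""
        return (self.low, self.high)


unknown = Range()                           # nothing known about **argv yet
taken = unknown.at_least(ord("."))          # inside the if: byte >= '.'
not_taken = unknown.at_most(ord(".") - 1)   # the other path: byte < '.'

print(taken, not_taken, taken.extremes())
```

Each further branch on the same byte narrows the range again, and a range that becomes empty marks a path that no input can take.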
We say the value is less than or equal to y and greater than or equal to x. And so as we go through branch statements, we get more and more refined constraints, right? So when we go to evaluate the use of this variable, say in a dereference of a buffer or what have you, we basically have a range of values we can use, or several ranges of values we can use. And how do we decide which ones we use? We'll talk about that in some other talk that is less time constrained. But for the sake of our argument, we'll choose the largest number and the smallest number that we can, because accessing outside of the array over the bounds this way is as bad as underwriting the array like this. So this is kind of interesting.
And so we don't know anything about **argv, for instance, and we don't know anything about the buffer coming in from the network. So how do we express that in this vernacular of constraints? Can you go to the next slide? We basically use the exact same vernacular, but we say it's negative infinity to positive infinity. We don't know, and if we were to codify this, we'd probably use negative MAX_INT plus 10 and MAX_INT minus 10, only because many people use MAX_INT for values in their programs, or we'd have to choose some value that's kind of like a magic number. And yes, that has problems and it's a little bit bogus, but bear with me.
So one really interesting thing that's worth bringing up here, that Jeff brought up in Luis's talk, is: what if you wanted to know what input it took to get to a certain point in a program? Which is a wonderful question, and not a very difficult question to answer either if you've read any of the academic white papers whatsoever. So how would we do that? Luis, how would we do that? How would we figure out what input it would take to get to a certain part of a program, given constraints?
You could try to figure out what type of constraints would get you down to certain levels, certain functions and whatnot, and then go backwards from that. Yep, it's pretty much that simple. So once you're going through the program and you're interpreting the program, and you follow these branches or you don't follow them and you merge different constraints together, by the time you get down to the vulnerable code, you know what constraint had to be on every single byte in that network buffer to get you down to the point where the exploit is. And so you not only know one individual packet that might get you there, which you'd get by setting breakpoints and a bunch of manual nonsense that's gonna waste a lot of time, you know a range of packets you could use to get there. You know this byte could be between here and here. The next byte could be between here and here. And that's very useful and very practical in this particular context. Not a very hard problem at all. I'm not sure why anyone would think that was a hard problem, given constraints and the basics of static analysis.
So, next slide. So far we've been talking about... actually, can we go to loops? Loops is an interesting little topic. So let's open up IDA. This is to show I don't hate IDA with all my soul. And so loops are a kind of branch, basically. They're a branch that goes backwards instead of forwards, to oversimplify things a little bit. And so what we're gonna look at is a vulnerability that my commercial product actually found, the product that's no longer on the market and that I no longer work on or sell in any particular way. So this was a novel security vulnerability that the product found. And some people talk about loops in a very convoluted way, with dominator trees and here's this graph and here's that graph, et cetera, et cetera. I'm gonna give you a really simple algorithm.
You can code it into an IDA plugin in about 15 minutes or so, if you can get past IDA's API, that is. Or you could just code it into bugreport and not use IDA at all, which would be really nice too. So yeah, this is an exploit I found in Trillian, a client-side vulnerability. It is what I call an unbounded buffer iteration bug. These are my words; I'm making it up. Maybe someone uses similar words, I don't know. For the sake of argument, I'm making it up in my own words. So what I mean by unbounded, I don't mean like a while(1) loop. What I mean is that there is no counter based upon the size of the destination buffer, the buffer being written to, that will terminate this loop before that buffer is written out of bounds. That is a lot to swallow.
So if we go back to the slide real quick, here's what it comes down to, folks. We wanna find a control flow block that forms a closed loop on itself, a loop, if you're into the visual thing, which a lot of people are. We like to be trapped in our GUIs and our GUI metaphors. And it's one where a pointer is written to and incremented. Written to and incremented. So we're writing to a thing in this buffer and we're moving the pointer to the next one, and we're writing something and moving the pointer to the next one. And the exit from the loop, if we can find one, if it is not a while(1), for instance, is a tainted byte comparison, and said byte was written to the pointer in question, the buffer in question.
And so this is maybe a slightly narrow definition, but it fits the Trillian bug that we're looking at; the MSRPC bug that the Blaster worm exploited; let me see, the PGP key server bug in their LDAP protocol parsing that I found and never released. And what was the other one? Oh yeah, the Half-Life multiplayer server, which was the actual bug that I based all of my detection upon, which then found these other novel bugs. So this is kind of constrained.
It could be a little broader to find more bugs, et cetera, et cetera, but for the sake of brevity and time, we will go back to the Trillian code and assume that that is the algorithm that we are going to use. So what do we got here? We have a recv. So the call to recv is gonna get tainted data. It's gonna fill the buffer, which is EAX there. There's no laser pointer; a laser pointer would be nice. In the Black Hat class, the things are right here, so I can just go like this, and I try to do that here and it doesn't work. So after that call to recv, the buffer is gonna be filled with tainted data. In this particular case, they fill it with one byte of tainted data; they read byte by byte, which is kind of funny, it's a little strange, but whatever. EAX contains a tainted buffer. And it checks the return value of recv; we don't really care about that. And then it goes down into the block on the left.
The block on the left gets a byte out of the recv buffer and puts it into AL, which is tainted. So AL now contains tainted data. AL then gets copied into the dereference of ESI plus EDI, right? In this case, EDI is the base pointer, the base of the buffer, basically. ESI gets incremented, right? EBX gets incremented. Then we compare AL to 0Ah, which is a carriage return, I do believe. And if that's not true, if we haven't hit the carriage return, go back. This is an unbounded buffer iteration. This is exploitable, and by the way, it's a client-side thing. A lot of people focus on server-side stuff; I think client-side stuff is more interesting, personally. It's just my personal preference of what I care about. No judgment.
So that's it. We have all the pieces here. We have tainted data, we have it being written into a buffer, a component of that buffer's address is incremented, and we loop and loop and loop based upon the tainted data coming out from the network.
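The pieces just listed translate fairly directly into a check over one basic block. The sketch below is only an illustration of the idea: the `Block` type, the tuple-based instruction format, and `is_unbounded_buffer_iteration` are all invented for this example, and the "is there a bound?" test is deliberately crude (any comparison on a register the loop increments counts as a bound). It is neither an IDA plugin nor bugreport code.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

# Simplified, made-up instruction forms:
#   ("load_tainted", dst)          dst receives a byte of network data
#   ("store", base, index, src)    [base + index] = src
#   ("inc", reg)                   reg = reg + 1
#   ("cmp", reg, constant)         comparison that feeds the loop exit
Instr = Tuple[str, ...]


@dataclass
class Block:
    name: str
    instructions: List[Instr]
    successors: Set[str]


def is_unbounded_buffer_iteration(block: Block, tainted_in: Iterable[str] = ()) -> bool:
    """Apply the talk's heuristic to a single self-looping block."""
    if block.name not in block.successors:
        return False                     # not a closed loop on itself
    tainted = set(tainted_in)
    stores = set()                       # (index register, source register)
    incremented = set()
    compared = set()
    for ins in block.instructions:
        op = ins[0]
        if op == "load_tainted":
            tainted.add(ins[1])
        elif op == "store" and ins[3] in tainted:
            stores.add((ins[2], ins[3]))
        elif op == "inc":
            incremented.add(ins[1])
        elif op == "cmp":
            compared.add(ins[1])
    bounded = bool(incremented & compared)   # some counter is checked
    return any(
        index in incremented and src in compared and not bounded
        for index, src in stores
    )


loop_body = [
    ("load_tainted", "al"),
    ("store", "edi", "esi", "al"),
    ("inc", "esi"),
    ("inc", "ebx"),
    ("cmp", "al", "0x0a"),
]
trillian_like = Block("loop", loop_body, {"loop", "done"})
with_bound = Block("loop", loop_body + [("cmp", "esi", "0x100")], {"loop", "done"})
straight_line = Block("entry", loop_body, {"done"})

found_unbounded = is_unbounded_buffer_iteration(trillian_like)
found_bounded = is_unbounded_buffer_iteration(with_bound)
found_straight = is_unbounded_buffer_iteration(straight_line)
print(found_unbounded, found_bounded, found_straight)
```

The same check should flag the other parsers mentioned in the talk regardless of which delimiter byte ends the loop, since only the shape of the loop matters here. A real implementation would also have to follow loops that span several blocks and verify that a counter comparison actually relates to the destination buffer's size.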
Now this seems ad hoc and contrived and et cetera, et cetera, but I've probably found a dozen exploitable bugs in real commercial off-the-shelf software using this exact kind of algorithm, right? And so we can talk about dominator trees and graphs and lots of really complicated pictures and bullshit like that, but if it doesn't find any goddamn bugs and you don't release a tool, who gives a shit? It doesn't matter. If you can't practically apply the knowledge, it does not matter, don't care what anybody says. So that's an exploitable bug. Don't use Trillian, they didn't fix all their bugs. I tried reporting it to them, they were really responsive at first, and then they dropped off the face of the planet. Trillian 3.1, which was the last release they did, which I think was almost two years ago now, they fixed some of the bugs wrong. The thing is that they copied and pasted this exact same code, which is HTTP response header parsing, into almost every single DLL that they had, and they didn't really fix them correctly. And they didn't even fix all of them either, so I really wouldn't use Trillian, just based upon this particular bug. So that's unbounded loops, and that's that. That's all there is to it. I mean, that's really all there is to it. It sounds overly simple. You'll find so many bugs. It's crazy how many bugs you will find. And every protocol parser is gonna have loops like this. We've got to look out for this. In the case of the bug in MSRPC that the Blaster worm exploited, it was doing basically the same thing, except looking for a backslash in the UNC path. In the case of the PGP key server bug, which may still exist, I tried to report it to them and they threatened legal action, it was an asterisk, right? It's all the same kind of stuff.
So you don't need God's own academic static analysis algorithm that's, you know, the utopian solution, to find bugs. You have to be pragmatic about it. So go back to the slides. So that was the Trillian example. And it basically comes down to this. The tool has not codified this yet, but as you can see from talking about it, it wouldn't be that difficult to write this code. Again, it'd be probably a 50 line IDA plugin. Or in the case of bug report, it might be less code and be in C sharp and not have IDA's shitty plugin API. Shitty in my opinion, anyways. So we've got to talk about bounded. So bounded loops are, well, they're bounded. Basically, they're not unbounded by my definition. This is my definition of bounded and unbounded. So can we look at the vuln server? The .c file. Vuln server is a thing that was used in a Purdue secure programming class. The reason I like it is because a bunch of people from several software vendors helped write this based upon real bugs that they had in their commercial off-the-shelf software. Not gonna say which vendors it was. Other than, it's cool that vendors are trying to help take real world bugs that they had and teach people, teach students about these bugs, what they look like, why you shouldn't do that and how to fix them. And so this is a contrived program to some degree, but I didn't make it up. And it's based upon real world bugs. And if we're not looking at real world bugs, who cares? You don't have the .c? No? What can we do? What can we do? Oh, we can look at the assembly. Wait guys, you guys wanna see assembly? Yeah? Assembly in the house. Oh, you don't have it at all? Oh, that's balls. Let's see here now. Do you have any bounded anything? Is listener bounded? Listener is not bounded. That's a deep thing. That's a crunch. Yeah. Actually, can I see something? Well, we'll just talk about it in the slide.
So, it's the exact same thing except in this case there is a counter. There's a counter counting down in the loop saying I'm gonna terminate when this counter reaches zero or when it's greater than some value or et cetera, et cetera. And so, what is that counter based upon, basically? So what you need to look at is, each step through the loop, there's a pointer that's written to and incremented, right? So one of the things to keep in mind is that you don't have to go through the loop a thousand times and keep sort of executing it and get caught in some kind of loop k-hole. What you have to do is you just have to analyze it once and then try to figure out, okay, the reason, or the circumstance, under which we exit the loop is if the counter reaches zero, right? That's the reason that we leave. So if ECX is zero, for the sake of argument, we're gonna leave that loop. If we see that ECX is decremented each time through the loop, right? And we know that ECX comes in with a value of 15. We can extrapolate, we can do simple arithmetic and go, okay, it's decremented by one each time. It's gonna exit when it reaches zero, therefore it's gonna go through the loop 15 times. Therefore, if the buffer that we're writing to has fewer than 15 bytes or whatever left in it, that's an exploitable bug. That's an out-of-bounds memory write. Again, really complicated loop, all this bullshit, et cetera, et cetera, out-of-bounds memory write. I can't repeat this enough. Out-of-bounds memory write. What the fuck's going on in there? All right, let's hope wind doesn't dive a heart attack this year, everybody. Orserosa's a deliverer. Anyways, so without that code, do you wanna write something real quick in C? Let's just move on. We'll move on. So let's go back to interfunction. So we've been talking about one function at a time.
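The counter arithmetic for the bounded case can be sketched in a few lines. The function names here are invented, and the sketch assumes the counter is tested before each pass and that every pass writes a fixed number of bytes; under those assumptions a buffer with exactly as many bytes left as the loop writes still fits, and anything smaller is an out-of-bounds write.

```python
def loop_trip_count(start, decrement, exit_at=0):
    """How many passes a counter-controlled loop makes, analysed once
    instead of executing it. Returns None when the counter never equals
    exit_at, which the caller treats as unbounded."""
    if decrement <= 0:
        return None                      # counter never moves toward the exit value
    distance = start - exit_at
    if distance < 0 or distance % decrement != 0:
        return None                      # counter steps over exit_at without hitting it
    return distance // decrement

def writes_out_of_bounds(start, decrement, bytes_per_pass, bytes_left):
    """True if the loop writes more bytes than remain in the destination."""
    trips = loop_trip_count(start, decrement)
    if trips is None:
        return True                      # no usable bound: assume the worst
    return trips * bytes_per_pass > bytes_left
```

With the numbers from the talk, `loop_trip_count(15, 1)` is 15, so `writes_out_of_bounds(15, 1, 1, 14)` reports a bug while `writes_out_of_bounds(15, 1, 1, 15)` does not.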
So wow, real programs don't really have one function. How do you deal with something that has like 10,000 functions, like Oracle's 30 megabyte binary? Their main service binary is, yes, 30 megabytes, and you're gonna buy componentized things from a company that doesn't know how to componentize something that has 30 megabytes. I guess the GHB floated to the bottom. So top-down stuff. You're talking about top-down stuff? So top-down is kind of a classical way to think about things to some degree. Basically, you start at the top of the program and you go down. That's basically what you do. So we'd start at main and go down. So can we look at an example, can we just skip to the immediate parameter one? Is the author of TextPad here? Very wise not to raise your hand, sir. Anyways, so interfunction stuff. So here's a really simple case of an interfunction kind of value tracking thing. And that is, we have two functions. We have a main function and we have another function called bug. Main passes zero in as the parameter to bug, right? So bug then mallocs based upon that parameter, which is size, and then it writes a zero at array offset zero, right? If you malloc zero, there is no storage. Writing anywhere in the array is completely bogus. So that's a bug. Or rather, it's a bug because we passed in zero. If we had passed in one, it would be okay, right? So in order to not false positive, or to not have a false negative even, in this particular case, we would have to do interfunction value tracking. So how do we do that? How do we do that? We would trace the constraints down to the next function. That is correct. Basically, all we have to do is follow the call instruction, right? We have to do our stack emulation to make the malloc work, basically, or the malloc emulation work.
So we just have to push things on the stack, do the call, pop things off the stack like we're gonna see and use it just like we're interpreting the program, like we're emulating the program. So can we look at the assembly for this? It's just pissing in our face at this point, like the font is big, like that's just what we used to like it not working and all of a sudden works. That's awesome. Oh, is it? Actually, I think that deserves a teacher. What size are you? Medium? Medium size? Or the assembly? Yeah, so it just comes down to interpreting the instructions correctly. Once, once, once again, interfunction top down is not that difficult. So where is, is this the O2? Yep. So in the O2, it's inlined it completely because the compiler is basically said, oh, that function, your column's really small. I'm not gonna do anything. It's calling it. Oh, it is? It's calling it. Oh, that's weird. Is it O2? Oh, okay. So it's doing something clever that we won't get into. But anyways, so what it's doing is, is instead of doing a push, it's putting the zero onto the stack at EBP minus eight because minus is? Plus. Huh? EBP plus eight. And EBP plus eight is? It's gonna be an argument. Is an argument. And EBP minus eight. I'll open the OO. Okay, yeah, the O zero will be a little easier to read. So it got a bunch of stacks set up and things like that. So it's putting zero, on line 145 there. It's putting zero on top of the stack. It's doing the call to bug. Bug is up above. So all we have to do is follow the target of that call basically. Which is no problem. And so we have push EBP. We have some other stuff like this. And it's doing things in a very convoluted way because it's O zero. O zero, it's not GCC's fault. GCC doesn't suck. It's just O zero. And so it doesn't do any optimization. This is what you can expect. O zero, you'd think O two, O three would be the hardest kind of binary to analyze. 
O0 is hard because of how tedious it is about moving things back and forth that basically do nothing. Anyways, so. So do do do do do do do do do. Yep. So we then have a move of EBP plus eight into EAX. We then put that on the top of the stack. So EBP plus eight is the argument we passed in. We put that on top of the stack and then we call malloc. And then it does a move that's useless. And then that is where it is moving into array sub zero. It just doesn't put the plus zero, but it's basically zero. And that's our out-of-bounds memory write. And EAX would be the return from malloc. So we did malloc zero. We put something in and it's out of bounds, period. Interfunction value tracking. So why is that so hard? One of the difficulties with top-down stuff is, let's say that this call stack, so we have a thing that's one call stack to eight. Let's say the call stack was actually 800 function calls deep, right? We had one function up here that called 800 functions down, down, down, down, down, down, down, right? And so we called that top function from multiple places. Every time that function was called, we'd have to go all the way down the call stack and all the way back. It's like a yo-yo in the Grand Canyon, right? It's a lot of exertion for not a whole lot of results. So if only there was some way we could cache those results, because basically once we analyze them, we kind of come up with the constraints to describe those. If only there was a way we could cache those, how would we do that? How would we do that? You talked about this one. So we have all these functions. It's all gonna trickle down and they get called multiple times. They get called in a row. So what we're trying to do is, if this is gonna happen over and over, we basically have a description of the function. We can create what is kind of like a signature or a description of the function.
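A toy version of the top-down walk, with the caching idea bolted on, might look like the following. Everything in it is hypothetical: the `PROGRAM` table stands in for disassembled code, malloc is emulated as "remember the requested size", and plain memoization on (function, argument) stands in for the richer per-function descriptions discussed in the talk.

```python
from functools import lru_cache

# Hypothetical miniature program: each function is a list of (op, operand) steps.
PROGRAM = {
    "main": [("call", ("bug", 0))],                 # main passes 0 to bug
    "bug":  [("malloc", "arg0"), ("store", 0)],     # p = malloc(size); p[0] = 0
}

calls_analyzed = []   # records each analysis that actually ran (not cache hits)

@lru_cache(maxsize=None)
def analyze(func, arg0):
    """Walk `func` top-down with a known argument value and return findings."""
    calls_analyzed.append((func, arg0))
    findings = []
    buf_size = None
    for op, operand in PROGRAM[func]:
        if op == "call":
            callee, value = operand
            findings.extend(analyze(callee, value))   # follow the call target
        elif op == "malloc":
            # Emulate malloc: we only track how many bytes were requested.
            buf_size = arg0 if operand == "arg0" else operand
        elif op == "store":
            if buf_size is not None and operand >= buf_size:
                findings.append(
                    f"{func}: write at offset {operand} past {buf_size}-byte buffer")
    return tuple(findings)
```

`analyze("main", None)` reports the write in `bug`, `analyze("bug", 1)` reports nothing, and a repeated `analyze("bug", 0)` is answered from the cache instead of walking the callee again.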
Similar to what we're doing right now with malloc, we're not really calling malloc, we're emulating malloc. We know, okay, if this is a call to malloc, it's gonna take one argument, it's gonna do this one thing and it's gonna return a pointer to a buffer. So as we start analyzing from top down, we start looking at all the functions that we've already gone through and we can start passing things back up. We can start passing up these descriptions back up. So then we have descriptions of all these other functions and this is part of what makes analysis take a while is that you're gonna have to construct all this data back up. So he's gonna bring up an example right now. So in this case, we're actually passing, we're gonna call multiple functions. First we're calling something called getTaint. That's gonna set, when we're in main right now, T doesn't have anything in it when we call getTaint, the data that's gonna be contained in T is tainted. It had to go to another function and that function modifies it and creates a taint. That taint is then passed down to our vulnerable function bug. And bug is gonna, bug is writing, it's gonna malloc and it's gonna write to the first position on that array and if you do malloc zero, some implementations are gonna give you some type of buffer. Other ones are not. Depends on what version of glibc, blah, blah, blah, all these different things but basically you should never be doing a malloc zero in a program. You should never see that as a standard thing that happens. So we're writing our tainted data to the first position. So when we're doing this top down, we're gonna have to know that this function sets tainted data that comes back up and then we have to drop back down a bug. So basically with bottom up, what you do is you analyze each function in isolation without regard to the rest of the program and what do I mean by function? There's something I really wanna impress upon people here that is there are no functions. 
Functions are a construct inside of source code to make it easy to sort of organize commands and things of this nature. In assembly, they do not exist. All right, Jedi mind trick. There are no functions. Everybody say it with me. There are no functions. Thank you. This is funny but it's really important and what it is, people keep talking about functions and they keep using them for metaphor functions even though in like Windows XP and above with non-contiguous functions, guess what, they're not fucking functions anymore. They're not contiguous anymore. It was all an illusion. It was all some magical dream. The reason people keep thinking this way is because the tools that we're using which are overextended 80 by 25 UIs basically with tabs and resizable Windows, they're still overextended 80 by 25 UIs still refer to things as functions. Functions do not exist. There are only basic blocks. There are only control flow blocks. Some control flow blocks happen to be the target of a call instruction or a jump instruction or a whatever instruction but it is really, really important to free your mind from this stupid bullshit persisted by these UI paradigms from the 1980s, the early 1980s, right? These things that look like turbo debugger that never grew up. It's like Peter Pan, it's like the UI that never grew up. It's crazy, blows my mind. People are trapped in these metaphors. Anyways, so we have basic blocks basically that are kind of grouped together in a function like thing. And so we kind of discover these as a path through as kind of like a code discovery path basically. So we're discovering the basic blocks that are the target of jumps and calls and things like that. That are the sort of the entry point into kind of a collection of basic blocks, if you will. Or you could call them functions. I'm gonna try not to. Okay, yep. So we do that discovery pass, let's say, and we discover that there is get taint, there is bug, and there is main. 
So in bottom up, you describe each of these in isolation without any regard to how they're called or anything like that, because we don't care in bottom up. So if we did one pass, let's say through getTaint, all right, what we would do with getTaint is we would say, oh well, getTaint ultimately does a dereference of the dereference of the first parameter and that's what it returns all the time. There's no constraints or anything. That's just what it does. So there's an operator of dereference and another operator of dereference and we apply that to the first argument. That's how we abstractly describe this function, okay? And so let's say for the sake of argument, main was the next one. Well, at this point in main, right, let's pretend like we don't know anything about getTaint, right, we're describing it in isolation. We don't know anything about getTaint. So getTaint gets executed and we're like, whatever, we know that that's a kind of a call to something but we don't know anything about that yet. So we'll just have unknowns in our registers. We talked about unknown values, et cetera, et cetera. It's unconstrained. We don't know what the constraints are when it returns. Bug, we go to run that. We're like, we don't know anything about that either, because we're not calling these things. We're describing these functions in isolation, these groups of basic blocks in isolation. The next one we're gonna look at is bug, right? So we look at bug and we analyze it and we go, okay, there's a malloc of zero and then we put the first argument into the zeroth element of that array, right? So there is a buffer. There's a buffer of size zero, i.e. you can't put anything in it, and we are going to put the first argument into that zero length buffer. So that's always a bug, right? So we can describe, this is always a bug, but if the first argument is tainted, it's going to be an exploitable bug, right?
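The two passes described above can be sketched as follows. The `FUNCS` table, the step names, and the summary fields are all invented for illustration; the first pass describes each function on its own, and the second pass re-walks `main` using those descriptions.

```python
# Hypothetical miniature program with the three functions from the example.
FUNCS = {
    "getTaint": [("ret_deref_deref", "arg0")],            # return **arg0
    "bug":      [("malloc", 0), ("store_arg0_at", 0)],    # p = malloc(0); p[0] = arg0
    "main":     [("call", "getTaint", "argv", "t"),       # t = getTaint(argv)
                 ("call", "bug", "t", None)],             # bug(t)
}

def summarize(body):
    """Pass 1: describe one function in isolation; calls stay unknown."""
    size = None
    summary = {"returns_arg0_data": False, "oob_write": False, "oob_writes_arg0": False}
    for step in body:
        if step[0] == "ret_deref_deref":
            summary["returns_arg0_data"] = True
        elif step[0] == "malloc":
            size = step[1]
        elif step[0] == "store_arg0_at" and size is not None and step[1] >= size:
            summary["oob_write"] = True          # always a bug, tainted or not
            summary["oob_writes_arg0"] = True    # exploitable if arg0 is tainted
    return summary

def evaluate(name, summaries, tainted_inputs):
    """Pass 2: walk one function again, this time using callee summaries."""
    tainted = set(tainted_inputs)
    report = []
    for step in FUNCS[name]:
        if step[0] != "call":
            continue
        _, callee, arg, result = step
        s = summaries[callee]
        if s["returns_arg0_data"] and arg in tainted and result:
            tainted.add(result)                  # taint flows back up to the caller
        if s["oob_write"]:
            exploitable = s["oob_writes_arg0"] and arg in tainted
            report.append((callee, "exploitable out-of-bounds write" if exploitable
                           else "out-of-bounds write"))
    return report

summaries = {name: summarize(body) for name, body in FUNCS.items()}
```

`evaluate("main", summaries, {"argv"})` reports the call to `bug` as exploitable; with no tainted inputs the same call is still reported, just as a plain out-of-bounds write.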
And we can know that just looking at the function here in the source code, bug. Functions are a construct of source code, not binary code. Doesn't matter what your UI says. It's an illusion, it's a construct. So that's what we know, right? So in our first pass, that's our discovery. So when we do our second pass, we know everything about every function here. So we are able to say, okay, in main, let's start with main for the sake of argument, we know that getTaint returns a dereference of the dereference of the first argument. Therefore, we know that with argv going in, the dereference of the dereference is tainted, right? When we go to evaluate that, we know that t is going to have a tainted char in it, right? We know that bug is always going to have an out-of-bounds bug, but if the first argument is tainted, it's exploitable. In this pass, we know that t contains tainted data. Therefore, when we call bug, it's going to be an exploitable out-of-bounds error. That's bottom up. Here's the problem with bottom up. Can you go back to the slides? So the problem with bottom up is deeply nested calls, basically. So we're talking about a call stack that's 800 levels deep. Okay, so in order for it to bubble all the way up to the top, basically, each pass you do, more information bubbles up to the next level in the call stack, right? Next bubble up, next bubble up. So if you have a call stack that's 800 levels deep, you still have to iterate over it 800 times to get that to bubble up to the top, right? So if all your call stacks are 50 levels deep or so, but you have one that's 800 levels deep, it's kind of unfair to your program, right? You're going to iterate 50 times over the other ones and basically be done. You can't iterate forever. So one of the things that I learned in a practical implementation that worked and found real exploitable bugs is you have to do both.
There's kind of academic bullshit cockfights about top down versus bottom up and this works and that doesn't, et cetera, et cetera. I found you have to use both, basically. You get the best of both worlds effectively. So you're able to do bottom up and get most of the way there in a short amount of time, and then you go top down to basically meet in the middle, and it sounds ad hoc and I suppose that it is, it's not extremely academic, it's not provable, et cetera, et cetera, but it happens to work. Thank you. We talked about that. So basically that's how you detect exploitable bugs, right? And so the C sharp code that is in bug report doesn't do interfunction value tracking, but we've described how easily that can be done. It's not that difficult. The metaphors are all there. So what does bug report do? Exactly. Can you show the system test list text real quick? So basically, since we've done this in a test driven fashion, we have a lot of unit tests, et cetera, et cetera, but we always start out with a system test of a program we want to find a bug in. This is our list of system tests. We have a Python script, well, soon to be extinct Python script, thanks to Mr. Douglas here's refactoring, that basically goes through, runs bug report on all of those tests and checks to make sure that the results don't change. Geez, that sounds just like common sense. Why wouldn't everyone do that? Beats the shit out of me. Some people just like to not know, I guess. It's like an adventure to let your customers find bugs. Whatever, I don't know. Different strokes make the world go round. I don't know. So this is what we talked about. Basic value tracking, out-of-bounds memory detection. It comes down to out-of-bounds memory detection in this case, which was a very narrow kind of thing. We dealt with branches, we dealt with loops, we were finding real world exploitable bugs and how that would work with real working C sharp code.
And that is at that URL there. One thing I'd like to mention, I think someone's gonna bring up switch statements and jump tables and things like that, it all comes down to constraints. If you take the jump, you know that ECX is this, and therefore that's the constraint on ECX if you take the jump, for instance. And so struct discovery, all that kind of stuff, it all comes down to the same kind of stuff. It's all values, it's all abstract values that point to things that have a certain length or a certain constraint, et cetera, et cetera. With that said, thank you very much. Are there any questions? I apologize, I didn't see the whole thing. How do you deal with where you have two different user inputs of some form, or something that can't be determined by statically looking at the binary? One sizes a buffer and the other reads into it by that count. Do you understand what I'm saying? So you've got a value that isn't in the program itself without running it. Correct. So we did talk about that. Two different sources. Yep, so the constraint is still negative infinity to positive infinity until it gets constrained by an assignment or by a branch. Basically, the constraint is unconstrained. Like, we don't know, negative infinity to positive infinity. If we don't know what it is, if it comes from some unknown source, whether it's two different places or whatever, the source doesn't matter, basically. It's how constrained is it or unconstrained is it? If that doesn't answer your question, come up to me later. So there's this points-to thing in the abstract value class. Oh, awesome, someone looked at the source code. T-shirt. And the taint, the add taint doesn't do anything with it, and there's this comment like, to do, do something here later. What's the trade-off? What are we missing? Or what could we get if we did something, and what would that something be? So for pointers basically, a pointer isn't really tainted, right? It doesn't contain any data.
It's just kind of a pointer to data. So how could a pointer be tainted? It can't be. So kind of trying to figure out like, so I guess it could be tainted. How would we deal with that? And we don't have any tests, any system tests that kind of drive that. So we put a to-do and said, okay, that doesn't strictly pertain to our current customer test, but let's make a note for later so we don't forget about it. So later, do you try to solve the constraints, come up with some input and say, this points to things that might be tainted? Correct. So let's actually get an input. It could be, it could not be, the constraints will let you know, basically. And just one quick question. Which constraint solver do you use? Or did you just roll your own? We haven't written it yet. We could use any one of them. There's a whole bunch of them out there. It's an academic thing that's been solved a long, long time. I've been using CVC Lite, it works pretty well. Oh, cool. It's good to know. Thanks. Hi, I was wondering if you had read much of the papers that were written by like Dawson Engler and that group doing that. Yes. Coverity people, yes. Yeah. I know them. Yes. What's your opinion on the research? I was curious. I mean, it's some good ideas. Absolutely. And they're really good at finding null dereferences and memory leaks and things like that. As far as value tracking and stuff like that, I really don't know. I haven't used the program personally, but I do know that PC-lint finds a lot of the same bugs pretty accurately. And PC-lint does C++ and a lot of other things that they just refuse to do, effectively, and does it very well for $239 for any size code base. And I do believe that their licensing is still per line of code, which kind of blows my mind, but different strokes make the world go round and that's their business plan. So I haven't used it personally, but those are my impressions. Does that answer your question? Yeah, cool. PC-lint, by the way, is the one that I like.
So everybody knows, it's the one I use personally. I bought my own personal copy. I use it in a lot of my open source development. It helps me find bugs. Yes, Ryan? So I don't think you can get rid of the function abstraction entirely. Eventually you hit a malloc or a receive or something like that. And speaking of which, if you're gonna use objdump until you get your own disassembler, what are you guys doing about the lack of a FLIRT library or equivalent? Basically what we're doing right now is we're constraining ourselves, as you can see in the system test list, to O2 compiled with -g. Or anything compiled with -g with GCC 4.0.3. So what we've done in these initial things is that we've constrained ourselves so that we can provide a lot of value on open source code compiled specifically on my computer. All right, so with all self-compiled stuff you can get the symbols and everything, right? Yes. Do you need a disassembler before you can go after the raw binaries, closed source? Well, if you go look at the code actually, a lot of stuff that we've had to do basically to parse opcodes correctly, just in the very simple programs that we have, we basically can do that now, because we know if there's a ModR/M, we know if there's a SIB, we know the length, et cetera, et cetera. So it's just a matter of kind of integrating that and doing it. It basically could be sort of a refactoring step more than a feature add. Good question. Do you want a GPLv3 t-shirt? Got them hacker-sized? We have an XL, football player, right? Quick question about constraints. It seems like since you know it's a computer, you know bytes are zero to 255, registers are either 32-bit or 64-bit. Would you really want to be doing max-int plus 10, or can you just do zero to max-int and be done with it? If you could, like there's no reason to do that. You're a hundred percent correct. Just curious. Yep. But basically you want to express that you don't know.
Basically there's a difference between zero to 255 and beats the shit out of me. So that's an important distinction to have, that it has been constrained or it hasn't been constrained. That's the main thing. It doesn't have to be max-int. Well, I mean, it's constrained by the physics of the computer, guaranteed, right? So you start there and just shrink it. That's true. You still want to express the intention of, we have no idea what this is. It could be zero to 255, certainly. But for the case of static analysis, if you said zero to 255 and it's used in some context, you don't want to have a false positive or a false negative or whatever. You want to be able to go, I don't know about this. So you can have report items like, I absolutely know this is a bug. Absolutely know, because I know everything about the constraint, and have another kind of report item that is like, this could be a bug. I couldn't quite figure it out because I couldn't quite narrow the constraint enough. So while you're right, the intention of the negative infinity and positive infinity, or max-int or whatever, infinity to infinity, is still important, especially when you come to subtraction and addition. So if you have an unknown value, if you have infinity and you subtract one, what is it? Infinity. We still don't know what it is. Whereas if we say it's zero to 255, let's say, as a concrete kind of range, if we then subtract one, there's a difference. You have the user inputted data and you know it's a series of bytes, and then you either do constants on it or some sort of system config notification on it. So it can't ever be in a state of infinity, really. Yeah, you're right, you're right. I think we're in violent agreement as it is, yeah. Actually, I really appreciate this talk. It's been quite a week to go to talks at Black Hat and hear everything from, oh, we're just barely starting this, to Halvar's talk, this is nearly impossible, to this talk, that it's not only possible, but we're doing it.
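The exchange above about "unknown" versus "zero to 255" can be made concrete with a small range type. This is a sketch under assumptions, not bug report's abstract value class: the `Range` name, its methods, and the report labels are invented here, and infinity is represented with `math.inf`.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Range:
    """A value constraint; the default means 'we have no idea what this is'."""
    lo: float = -math.inf
    hi: float = math.inf

    @property
    def known(self):
        return not (math.isinf(self.lo) or math.isinf(self.hi))

    def sub(self, n):
        # Unknown minus one is still unknown; a concrete range shifts.
        return Range(self.lo - n, self.hi - n)

    def constrain_below(self, limit):
        # Narrowing applied on the path where a branch established value < limit.
        return Range(self.lo, min(self.hi, limit - 1))

def classify_index(idx, size):
    """Label a write at index `idx` into a buffer of `size` bytes."""
    if idx.lo >= size or idx.hi < 0:
        return "definite bug"        # every possible value is out of bounds
    if 0 <= idx.lo and idx.hi < size:
        return "ok"                  # every possible value is in bounds
    return "possible bug" if idx.known else "unknown"
```

Here `Range().sub(1)` is still `Range()`, while `Range(0, 255).sub(1)` becomes `Range(-1, 254)`. Against a 16-byte buffer, an unconstrained index is reported as "unknown", a 0 to 255 index as "possible bug", and the same index after `constrain_below(16)` as "ok", which is the separation between "I absolutely know" and "I couldn't narrow it enough" that the answer describes.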
Yeah, I really thank you very much. I really, really appreciate that. Thank you very much. If there's no more questions, you come up afterward, that's fine. Thanks, guys.