Today, we're going to be talking about programmatic debugging. Now, before any of you get up and walk out because I haven't told you exactly what that means, let me tell you a little bit about my past. I like to lead with that, because it takes about 15 seconds, and then I'm done and you're happy, instead of asking, hey, why didn't you talk more? I was raised on a computer. I had one at seven or eight because my dad worked for IBM, took to programming, and really liked it. Then I went and got a degree. I started out as a vocal performance major. Trust me, after this long at CTF, you do not want to hear me sing. It would have to be some sort of Elvis melody or something, because I just can't hit anything high. After a year and a half as a computer science major, I jumped into pre-med, because I didn't want to commit to computer science without testing other areas. And of course, a bachelor's in computer science was what was in store for me. I got out and got into a teaching job. And, I don't know, computers sought me out, I did a lot of fun stuff, and, well, here I am. I do a lot of research into security issues by divine intervention, with some help from friends, and because I love it. I lead a capture the flag team called Last Place. Anybody heard of capture the flag? It's lots of fun, and I encourage you all to try out. You've got to qualify for it, so you need to sign up a couple of months before DEF CON. Check out kenshoto.com, and you too can have a lot of fun. I have a lovely family. They see me probably less than they'd like, but more than some families see theirs. I keep trying to tell them that, and they don't get it. So what is programmatic debugging? Well, what do you think? Speak it out. Fuzzing? Interactively automating the debugging process? Dude, that's really good. Can we relay this prize back? I don't want to throw that far. That was close, but I'm getting there. You're jumping way ahead, though. I'm sorry.
The idea behind programmatic debugging is using debugging code in a canned fashion: you write a program to debug another program. Not always, but we're talking rough definitions here. So what's debugging? Getting rid of bugs, right? If that's what you think, I should probably get off the stage and let Martha Stewart come up and finish the talk. I don't care about getting rid of bugs at this very moment. There's a reason I maintain my anonymity: I can say that here. In my day job, I can't quite get away with it. So what am I interested in when I'm doing programmatic debugging? Somebody? Finding bugs. Very smart. I don't necessarily care to get rid of them, but I'm interested in finding them. A debugger gives you access to the important components of a running process. By the way, I've not slept much; I've been competing in CTF. So if I trip and fall, or fall asleep during my talk, somebody please get up and nudge me, and I'll carry on as best I can. A debugger gives you access to a running program's memory, its CPU state, and several other goodies, from outside the program. Again, it's a rough definition, and some of you could argue with it, but that's what we're going with. In the last couple of years, I've seen the advent of several very interesting tools that implement programmatic debugging. The first one I ran into is PyDbg, and some of you have heard of it. PyDbg is part of the PaiMei framework. Thank you, Pedram. It's a really cool tool. Unfortunately, it only works on Windows. It does work in Python, woo-hoo, which is good. Immunity just came out with a debugger with a Python interface. I can't speak to their coding practices, but it looks like they took OllyDbg, of great fame. Anybody love Olly? Olly's good. I like Olly on Windows.
They took OllyDbg, or at least its look and feel, wrapped it into their own debugger with new color themes, and provided a Python programming interface. Kudos to them; the Python interface was very smart. Then Invisigoth, in the very quiet way that is Invisigoth, wrote a tool called vtrace and released it to the public. vtrace is what we're going to talk about most, because vtrace is my favorite, and I'm the speaker. If you have a problem with that, you give the talk. I promised a good friend I'd tell you a little about Noxdebug. Noxdebug is a Ruby-based programmatic debugger. It used to be that every debugging framework was raw C code that you had to write against. That's why GDB is one of the few debuggers on the POSIX platform, and why there are so few real debuggers on Windows: we had this C interface. What I'm seeing now, however, is your favorite programming language and mine (well, yeah, Python) getting its own hooks into debugging, such as ptrace on the POSIX side and the Win32 debugging routines on Windows. What does this mean? Who cares? I'll never write a debugger in my life, so why are you telling me this? You should care because the difficulty gap has shrunk immensely, so you too can do amazing things with programmatic debugging. So what can we do? Here are a few examples. Am I allowed to say that I helped the InGuardians folks in their venture against virtual machine environments? You've heard of InGuardians, right? InGuardians rock. They're so cool. They put government dollars to good work. I know you don't believe me, but yes: government dollars, good work, same sentence. They used DHS money to investigate the relative security of virtual machines. Why? Because you probably have virtual machines in your data center.
And probably on most of your desktops, at least most of the desktops in this room; I know I've got one on every system I own. The goal was to prove that virtual machine environments are not to be used to separate security domains: you do not mix security domains on virtual machines. Anyway, I got to play a part in that, and it was really cool. I built a lot of tools, and some more since then, and I'm going to tell you a little bit about them. Live patching: anybody hacked any video games? Gotten around keys or whatever? You don't have to admit it. It's illegal, and I'm not encouraging it, but I know some of you have probably done it. In fact, some of my best friends got into the security field through something like that. So I wrote a tool called LivePatch. I give it a memory location and the bytes I want to see there. From a bash shell, it attaches to a binary, modifies the running process's memory, and releases it, and the process keeps spinning. Live dumping is something I often couple with process grep. From a bash prompt, I run my tool and give it an address and a length, and it spits out all the bytes in that range in nice hex fashion. Then there's process grep. I don't claim responsibility for this one, although I'll probably have to give it to you myself, since Invisigoth saw fit to remove it from his tools. Memgrep is really cool. You type memgrep at a bash prompt (you get the idea; I like bash too, but you can use cmd.exe as well) and give it a seed, or needle, as it's called. Memgrep attaches to a process and searches through that process's memory for the needle in your haystack, whether that's the stack, the heap, or whatever region it's looking at. Then there's a tool that has had to take a back burner recently due to preparation for CTF and a few other things at work: Vampire Jack SSHD. This is a tool that Doc Brown and I wanted to have ready for CTF this year.
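LivePatch, live dumping, and memgrep all boil down to a few memory calls on an attached process. The sketch below is not the speaker's code: it assumes a vtrace-style trace object exposing getMemoryMaps() (yielding (va, size, perms, name) tuples), readMemory(va, size), and writeMemory(va, bytes), and the function names memgrep, livedump, and livepatch are illustrative.

```python
def memgrep(trace, needle):
    """Return every address in the traced process where `needle` occurs."""
    hits = []
    for va, size, _perms, _name in trace.getMemoryMaps():
        try:
            data = trace.readMemory(va, size)
        except Exception:
            continue  # unreadable map (guard page, etc.): skip it
        start = data.find(needle)
        while start != -1:
            hits.append(va + start)
            start = data.find(needle, start + 1)
    return hits


def livedump(trace, va, length, width=16):
    """Read `length` bytes at `va` and format them as hexdump lines."""
    data = trace.readMemory(va, length)
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hexpart = " ".join("%02x" % b for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append("%08x  %-*s  %s" % (va + off, width * 3 - 1, hexpart, text))
    return lines


def livepatch(trace, va, new_bytes):
    """Overwrite bytes at `va`; return the original bytes so the patch can be undone."""
    old = trace.readMemory(va, len(new_bytes))
    trace.writeMemory(va, new_bytes)
    return old
```

Against a live process you would attach first (for example with vtrace.getTrace() and trace.attach(pid)) and pass the trace in. Returning the old bytes from livepatch is a design choice that makes a patch reversible.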
The idea behind Vampire Jack SSHD is to create a program that attaches to the sshd daemon, handles children as they're spawned, and then basically vampire-jacks in, writing standard input and standard output to a file, even though you're not the one who has the session open. Really cool, fun stuff. What else can we do? Basically everything your favorite debugger can do, we can do with vtrace or one of the others I mentioned. I like memory search, the basis for memgrep. Anybody done searching through memory space in GDB? Oh my goodness, do I feel your pain. Since finding vtrace, I have figured out a way to make GDB's very obtuse scripting language do searching, but it's extremely slow. Memgrep is very fast. To use vtrace for everything I want to use it for, I've wrapped it into my Python shell. Yes, I like to use Python as a shell; interactive Python is an amazing way to compute. With vtrace, I can attach to an arbitrary process or start one up, then step through execution, search, run, set breakpoints: everything you would do in a normal debugger, I can do right in my Python shell. The real question is, what do you want to do with it? At the end of this, you're going to walk away with some rantings and ravings of mine, but I'm going to stick you with this challenge: what are you going to present on next year? What are you going to do with this information? Too many people who are very intelligent or motivated simply lack the encouragement to be excellent. I want to take this moment to encourage you all, and I will do it again, because that's just how I am. People like pictures, so I threw these in. I don't know how well you can see them. On the left-hand side, I've got a SANS program written by my good friend Ed Skoudis, called Format Flaw. Sorry, Ed, I didn't ask your permission before throwing it in.
On the right-hand side, I've got an interactive vtrace session open. We run Format Flaw, import my own interface to vtrace, and tell it to attach to Format Flaw. It gives me a little backtrace; I tell it to run, and then when I overflow the buffer we get, oh, what is this? Anyway, we get the different messages, which I can't read from here. Sorry. A little helper script I wrote gives me the view I like to see in GDB, but in vtrace. It lets me step, so you see my next instruction and where I'm at: stack information, more stack information from a different perspective, all my registers, the current location, and the next instruction. So the real question is: so what? Yeah, yeah, you can have fun. Cool, we knew that. But wouldn't it be interesting to take this new knowledge and apply it in some useful way? I know Java has a bad name; some of you would probably cheer for that. Java has a bad name probably because marketers saw one little component of Java, the Java applet, and marketed the crap out of it. Everybody and their boss was writing little Java applets, and they were annoying and slow, and Java got a bad name. The problem was that they had selected only one of Java's useful pieces. So what do we do with each new piece of information, this one for example? How do we take programmatic debugging, the ability to interact with a debugger at the Python code level, and encourage vulnerabilities to magically appear? Would that be interesting? That sounds kind of like fuzzing, right? You fuzz, you make a program break, and that means you've got some sort of vulnerability. Even if you can't exploit it for shellcode, you can at least DoS the program. This is a little bit different. What we're going to talk about here is something that can be applied to fuzzing to help it.
But the idea is more: how do we tell that we're breaking something? And more importantly, how can we tell that we may break it? Fuzzing is weak because you only have something when you truly break the program, and then you have to track back how you broke it in the first place. If we could catch a buffer overflow before it happens and flag it (hey, this might be a good place for you to go look), that is more valuable for vulnerability discovery, in my opinion, than simply breaking a program. Although I've got to tell you, I know some great fuzzers who break programs all the time. In fact, one of the Kenshoto guys was describing what he does during the day, and I think he said, I just throw a lot of poo and things break, really cool. I like elegance. I'd rather know where that poo really hit. So let's take buffer overflows, for example. There are several major classes of vulnerabilities in binary programs, and buffer overflows are probably the sexiest. So how do we identify a buffer overflow, or a potential buffer overflow? Just so I know, how many of you have written any Python code at all? Raise your hand. Excellent, excellent. So you've probably heard of objects and classes, right? vtrace provides a breakpoint class, which can be subclassed to do whatever you want. So we set custom breakpoints at key functions that are notorious for causing buffer overflows. We do some stack analysis and buffer analysis, and we can identify whether a particular copy is going to write beyond the end of its buffer, or at least do so with a high degree of certainty. There's a great deal of value in that. Here's a little example of attaching with vtrace. We start Python and run "from vtrace import *"; that's how I roll. Then "me = getTrace()", and attach to the PID. We add some breakpoints. I like to tell it to run forever, especially in this type of work.
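The attach session just described can be wrapped in a small helper. This is a sketch assuming vtrace's getTrace(), attach(pid), addBreakpoint(bp), setMode("RunForever", True), and run() behave as the talk describes; watch_process is a hypothetical name, and the get_trace parameter exists only so the flow can be exercised without a live process.

```python
def watch_process(pid, breakpoints, get_trace=None):
    """Attach to `pid`, install custom breakpoints, and run in RunForever mode."""
    if get_trace is None:
        import vtrace  # only needed when talking to a real process
        get_trace = vtrace.getTrace
    trace = get_trace()
    trace.attach(pid)
    for bp in breakpoints:
        trace.addBreakpoint(bp)
    # RunForever: each breakpoint's notify() runs, then execution continues
    # instead of stopping for interactive inspection.
    trace.setMode("RunForever", True)
    trace.run()  # blocks until the traced process exits or is detached
    return trace
```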
Running forever means that when I hit a breakpoint, vtrace executes the code that the breakpoint object defines (remember, custom breakpoints) and then keeps going. I don't want to stop and look at the stack and the registers; just execute the code and keep going. Then I tell it to run. For the next few slides, I'm going to go through how we might approach this for some of the most popular ways of causing a buffer overflow. memcpy takes a destination, a source, and a length, and copies from one to the other. It doesn't care about anything else. So we break at memcpy, specifically at its entry, before it does its little prologue and sets up its new frame pointer, and we check the length of dest. We can find dest, the destination buffer, by looking at the stack pointer and moving up four bytes. That gives us a pointer, typically to either a heap location or a stack location. If you don't know what those are, that's OK. Don't get up and walk out; I'm not going to kill you. They are just two significantly different and very important ways to store information. On the heap, specifically targeting dlmalloc in this case, we can often check the length of a buffer, because the pointer to a malloc'd buffer sits exactly four bytes after the size of that buffer's chunk (with a few flag bits tucked into the low bits). Kind of cool, huh? So we simply go to the pointer, back up four bytes, and read in four bytes. How many bits is four bytes? 32. Why is that important? Because this talk is focused on 32-bit operating systems; it's just the way I had to write it. On a 64-bit system, what would this number be? Eight bytes. Exactly. Give that man a prize. Oh, dude. You are so poised. By the way, I've got to give credit: that's Squires. I won't waste this on them. Invisigoth and Squires, everyone. They love giving me crap. The dude without any hair, until you see the back of his head, is the one who wrote vtrace. That's Invisigoth. I got you.
This one's got your name on it; come back and I'll throw it at you. So anyway, as we were saying, thank you very much. We jump to the pointer for the heap chunk, back up four bytes, and get the size, right? Well, in an ideal world that would be true. We'll go over in a few minutes how that's not always the case. But if you actually hit the pointer to the start of a real heap chunk, it does work that way. On the stack: stack variables are typically the local variables of a subroutine, if you've written code. Hopefully that makes sense to you. All right, bring it on, baby. The truth is, I am Atlas. I got a picture of one of those things stuck to my head last year at CTF; it's pretty fun. So anyway, on the stack we generally store the local variables for whatever subroutine we're in, and yes, you may have several of them. The really easy-to-score vulnerabilities can be found there. You bet. Love you guys. Dude, we need it. Last Place, thank you very much, although I may have you change that to whatever place we come in. Nice. Anyway, I'm in the middle of the competition right now, so I love you all enough to leave my team and come talk. So anyway, it is interesting to overwrite the different variables on the stack between your buffer and the return pointer, but if you can get to the return pointer, you win, in some cases. Either way, it's of interest to you. So a very rudimentary way to look for buffer overflows on the stack is to take EBP plus 4: just beyond the saved base pointer is the return address, which is where the plus 4 comes from. We subtract the destination buffer's address from that to see how far we have to go to overwrite the return pointer. So I wrote memcpybreaker; what's on the slide is a sparse version. If you want the real thing, go get Atlas Utils, downloadable on the web, probably somewhere on my website. Memcpybreaker is a breakpoint subclass, something else that I wrote.
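A sparse memcpybreaker-style check might look like the following. It is a sketch, not the Atlas Utils code, and it assumes: 32-bit x86 with cdecl arguments, a breakpoint placed on memcpy's first instruction, a vtrace-style trace exposing getRegisterByName() and readMemory(), and a dlmalloc-style heap where the chunk size (flag bits in its low 3 bits, 8-byte header included) sits four bytes before the user pointer. The class falls back to a plain base class when vtrace is not installed, so the arithmetic can be checked offline.

```python
import struct

try:
    import vtrace
    _BreakpointBase = vtrace.Breakpoint
except (ImportError, AttributeError):
    # Lets the analysis below be exercised without vtrace installed.
    _BreakpointBase = object


def read_dword(trace, va):
    """Read a little-endian 32-bit value from the traced process."""
    return struct.unpack("<I", trace.readMemory(va, 4))[0]


class MemcpyBreaker(_BreakpointBase):
    """Flag memcpy calls whose length exceeds the room at the destination."""

    def __init__(self, address):
        if _BreakpointBase is not object:
            _BreakpointBase.__init__(self, address)
        self.address = address
        self.alerts = []

    def room_at(self, trace, dest, esp, ebp):
        """Estimate how many bytes fit at dest before something important."""
        ret_slot = ebp + 4  # caller's return address sits just above saved EBP
        if esp <= dest < ret_slot:
            return "stack", ret_slot - dest
        # Rough: anything else is treated as a dlmalloc chunk. Mask the flag
        # bits; the size counts the 8-byte header, so the bytes from dest to
        # the next chunk are size - 8. Pointers into the middle of a struct
        # will read garbage here.
        chunk_size = read_dword(trace, dest - 4) & ~7
        return "heap", chunk_size - 8

    def notify(self, event, trace):
        esp = trace.getRegisterByName("esp")
        ebp = trace.getRegisterByName("ebp")  # still the caller's frame pointer
        # At entry: [esp] = return address, then dest, src, n
        dest = read_dword(trace, esp + 4)
        src = read_dword(trace, esp + 8)
        n = read_dword(trace, esp + 12)
        kind, room = self.room_at(trace, dest, esp, ebp)
        if n > room:
            self.alerts.append((kind, dest, src, n, room))
            print("[!] memcpy(%#x, %#x, %d) overruns %s buffer by %d bytes"
                  % (dest, src, n, kind, n - room))
```

With vtrace available, an instance would be registered with trace.addBreakpoint(MemcpyBreaker(addr)), where addr is memcpy's resolved address (not worked out here). Treating every non-frame destination as heap is the rough edge that the struct and stack-backtracing discussion later in the talk addresses.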
When writing a breakpoint, you simply override the definition of notify. notify gives you the breakpoint itself, the event, and the trace, so you can do all of your manipulation and investigation there. In this example, we pull in EIP, the instruction pointer, along with the stack pointer and the base pointer. We use our special math skills to determine how long the copy is going to be, pulling that off the stack too. We find out how far away the return pointer is. We get the dest and the source, check how long the destination buffer is, and then compare. If there's going to be an overwrite, we alert. Pretty simple, huh? Except when attacking certain libraries. Some subroutines don't start off by setting up the base pointer; they use EBP for whatever else they want. So measuring to the base pointer is not going to get us there. How do we overcome this? There are several options. If you hate reading the whole slide, the one at the bottom is the one I found to be the best. First, and since this is my background it's the first thing that came to mind, you can disassemble the whole routine and use that to intelligently inform your tracker. It's kind of ugly: with most of those solutions, you have to look for stack setup throughout the subroutine, and for spaghetti code, especially code that could be jumping all over based on input you don't know about, this gets really ugly. We can also look at the bottom: just before returning, the routine has to clean up all that stack, so there are often opcodes that tip their hat as to how big a space you're living in. These are all very rough estimates, but they give you a start. What I actually found to work best in my exercises was stack backtracing. The idea is that you start at your stack pointer, because, well, your stack pointer is your stack pointer.
Code doesn't use ESP to plot stuff all over as a general-purpose register. So you start at the stack pointer and walk up the stack four bytes at a time. Take out that four-byte value (how many bits? 32), turn it into a number, and compare it against your memory maps, because the memory maps define what valid memory looks like to the computer. If that value actually points into legitimate memory, you can do more analysis, such as the stuff on this slide. If the 32-bit number you've just read falls inside the memory maps, we can back up from that location (it's not totally trivial, but it's not that hard) and see whether the previous instruction was a call. I say it's not easy because, if any of you have gotten into disassembly, you'll recognize that a call can be a two-byte instruction, a five-byte one, or one of several other variations that get really nasty. So we stuck them all in a list and checked for all of them. Ultimately, the more testing you choose to do, the slower the program you're tracing runs, and the more accurate your alerts become. When hacking, I often tell friends, students, whomever: be iterative and don't be afraid of failure. Why? Because it only takes one win to make a thousand failures worthwhile. Who was it? Charlie Miller, yeah. Not too long ago, he posted on his blog a check from the government for $80,000 for one 0day he sold them. I think that's worth a few failures, don't you? So what we're trying to do is limit how much we have to look at to see whether we can overflow a buffer or find some other vulnerability. Here's the majority of the code for a part of the tool that I call FindRet. We start at the stack pointer and continue until we meet all the criteria I just listed.
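The FindRet walk can be approximated as below. This is an illustrative reimplementation, not the slide's code, assuming 32-bit x86 and a vtrace-style trace with getMemoryMaps() and readMemory(). The call check covers E8 rel32 and the common FF /2 indirect-call encodings, keyed by instruction length, and ignores a few rarer SIB corner cases; like any byte-pattern test, it can produce false positives.

```python
import struct

# FF /2 (indirect CALL): instruction length -> ModRM bytes that give that length.
_FF_CALLS = {
    2: set(range(0xD0, 0xD8)) | {0x10, 0x11, 0x12, 0x13, 0x16, 0x17},  # reg / [reg]
    3: {0x14, 0x50, 0x51, 0x52, 0x53, 0x55, 0x56, 0x57},              # [SIB] / [reg+d8]
    4: {0x54},                                                        # [SIB+d8]
    6: {0x15, 0x90, 0x91, 0x92, 0x93, 0x95, 0x96, 0x97},              # [d32] / [reg+d32]
    7: {0x94},                                                        # [SIB+d32]
}


def in_maps(maps, va):
    """True if va falls inside any (base, size, perms, name) memory map."""
    return any(base <= va < base + size for base, size, _p, _n in maps)


def preceded_by_call(trace, ret):
    """Heuristic: do the bytes just before `ret` look like a CALL ending at ret?"""
    try:
        window = trace.readMemory(ret - 7, 7)
    except Exception:
        return False

    def byte_before(k):
        return window[7 - k]  # the byte at address ret - k

    if byte_before(5) == 0xE8:  # call rel32
        return True
    return any(byte_before(length) == 0xFF and byte_before(length - 1) in modrms
               for length, modrms in _FF_CALLS.items())


def find_ret(trace, esp, max_depth=0x2000):
    """Walk from ESP toward higher addresses for the first plausible return address.

    Returns (stack_slot, return_address) or None.
    """
    maps = trace.getMemoryMaps()
    for slot in range(esp, esp + max_depth, 4):
        try:
            value = struct.unpack("<I", trace.readMemory(slot, 4))[0]
        except Exception:
            return None  # walked off the mapped stack
        if in_maps(maps, value) and preceded_by_call(trace, value):
            return slot, value
    return None
```

Subtracting the destination buffer's address from the returned slot gives the same "distance to the return pointer" as the EBP+4 method, without trusting EBP.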
Again, the final tool is in Atlas Utils; you can take a look at it later. And hey, it's not perfect. I'm not perfect; I'm not the best person in the world, I just love to do fun stuff. Send me bug reports. Better yet, send me improvements. That's awesome. I'd love to include you and list your name as a contributor. FindNextHeap uses... oh wait, sorry. What if we have a heap buffer? What do we do then? Remember, I said that only in the most ideal situation do you have that magic length just before your buffer pointer. What about a struct? If you've got a structure, you may be memcpy'ing in and out of that structure 20 times and never hit the actual base of the structure, because you're accessing its members. So if you try to take a length from the four bytes before the buffer pointer, you're probably going to read some other field of the structure. Ooh, that's a toughy. So how do we overcome that one? We write a tool called FindNextHeap, plus a get-connected-chain helper. FindNextHeap says: all right, I have this address, and I need to know the boundaries of the heap chunk it's in. Say I create a 1,000-byte buffer, but I'm writing four bytes in the middle of it, or 100 bytes near the end, whatever. So we start at the beginning of heap memory and traverse the chunks like a linked list, because the heap is one big chunk of memory that has to be kept track of somehow, right? We carve little allocations out of this huge chunk, and each chunk's size tells you where the next one starts, so they are effectively linked, and we can walk through all the heap allocations by starting at the base. We go through the range until we meet certain criteria that say we're at the end, and we stop. When we've traversed beyond the address I first handed in, we stop, and the address we had just before that identifies the actual chunk. Well, what does that get us?
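FindNextHeap's walk can be sketched as follows, assuming a 32-bit dlmalloc-style heap where each chunk starts with an 8-byte header (prev_size, then size with flag bits in the low 3 bits) and a chunk's size leads directly to the next chunk. heap_base and heap_end must be supplied by you (for instance, from the memory maps); find_heap_chunk is a hypothetical name, and the sketch stops on implausibly small sizes rather than looping on corrupted metadata.

```python
import struct


def read_dword(trace, va):
    """Read a little-endian 32-bit value from the traced process."""
    return struct.unpack("<I", trace.readMemory(va, 4))[0]


def find_heap_chunk(trace, heap_base, heap_end, target):
    """Find the dlmalloc-style chunk that contains `target`.

    Returns (chunk_start, next_chunk, bytes_from_target_to_next_chunk) or None.
    """
    chunk = heap_base
    while chunk < heap_end:
        size = read_dword(trace, chunk + 4) & ~7  # mask PREV_INUSE and friends
        if size < 16:
            return None  # smaller than a minimal 32-bit chunk: corrupt or not dlmalloc
        next_chunk = chunk + size
        if chunk <= target < next_chunk:
            return chunk, next_chunk, next_chunk - target
        chunk = next_chunk
    return None
```

The third value answers "how far until I overwrite the next chunk" even when the pointer handed to memcpy points into the middle of a struct.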
It gets me how far I have to go to overwrite the next chunk. Why is that important? The first couple of bytes of the next chunk can sometimes be useful. It serves as a good stopping point: reaching it often means I've already overwritten something of value, and sometimes means I'll just crash the process because there's a canary. Those are the things you've got to find out. For another example: strcpy. See how well my cliffhanger abilities have held up. strcpy. Well, strcpy is a great tool, right? If you're a hacker, what does strcpy mean? It means I don't set how long the copy is. I simply say: take this string and copy it into this other buffer until you run into a null byte. Well, what if that other buffer is 14 bytes long, and the original string was supposed to be seven bytes long, but the coder didn't really check much and it turned out to be 30 bytes long? We now have an overwrite. There are more secure ways of copying strings, such as strncpy. strncpy takes a little bit of memcpy, wraps it into strcpy, and says: copy until we get to a null byte, because that's what terminates the string, but if you reach 100 bytes, or whatever number the programmer sets, stop there. Cool, right? It was supposed to get rid of all the buffer overflows caused by string copies. The problem is, just like everywhere else in security, if you implement a security feature without understanding its strengths and its weaknesses, it's just another tool. Right? For strcpy, we check the destination pointer and the source pointer, and we do the same calculations we talked about for memcpy to get a size. How long do we need to make this overwrite? And by the way, did it make it? Write your code and find out. And strncpy?
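Both string checks reduce to comparing how many bytes the call will write against the room measured at the destination (with the memcpy-style stack or heap calculation). A minimal sketch, assuming a trace with readMemory(va, size); read_cstring and copy_verdict are illustrative names, and the room value is taken as already computed.

```python
def read_cstring(trace, va, limit=4096):
    """Read a NUL-terminated string from the traced process, at most `limit` bytes."""
    out = bytearray()
    while len(out) < limit:
        byte = trace.readMemory(va + len(out), 1)
        if byte == b"\x00":
            break
        out += byte
    return bytes(out)


def copy_verdict(trace, room, src, n=None):
    """Check strcpy (n is None) or strncpy(dest, src, n) against `room` bytes.

    Returns (bytes_written, overflows, unterminated).
    """
    if n is None:
        # strcpy writes the string plus its NUL. Strings longer than
        # read_cstring's limit are under-counted in this sketch.
        written = len(read_cstring(trace, src)) + 1
        return written, written > room, False
    # strncpy always writes exactly n bytes (NUL-padding a short source), so
    # the limit itself must fit, however short today's input is. A source of
    # n or more bytes leaves the destination without a terminating NUL.
    src_len = len(read_cstring(trace, src, limit=n))
    return n, n > room, src_len >= n
```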
For strncpy, we take the dest pointer and the source pointer, but then we also take the size argument and make sure it limits the copy appropriately. Why is that important? Because the fact that my source string is only five bytes long and I'm copying into a 20-byte buffer doesn't matter: it just means you missed the vulnerability that time. Not a problem, right? What about the next time? What if that string grows without bounds? strcat is very similar to strcpy, only you start at the end of the first string and append onto it. It's pretty difficult to get this one right, so as soon as you run into strcat, you probably just want to go look at it. So my strcat breakpoint, at least at one point in the code, just alerted: hey, I've got strcat here, you want to come check it out? Enough about buffer overflows, though; that's old hat. What about format string vulnerabilities? Is there a way we can tell whether we're about to use a format string that could be provided by a user, or by somebody across the network? Not perfectly, but there are some hints. If you break on printf, for example, you check your format string at ESP plus four, the first parameter handed into printf. The other printf-like functions keep it at their own location, because they take different parameters. When we get the address of our format string (remember, it is always a format string, whether it's provided by the coder or by the user on the other end of the network in some foreign land that hates us), where does it live? Shortly after I started writing this talk, I ran into some folks from a vendor, which I will not mention, but they did very well in Ed's malware endpoint security product bake-off, I think it was. So go check them out. They were talking about the exact same thing. When they see certain system calls, they trace back, much like what we've been talking about here, and check where the system call came from. Did it come from the heap?
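The "where does it live?" question for printf's format string can be triaged from the memory maps. A rough sketch, assuming 32-bit cdecl with the breakpoint on printf's first instruction, a vtrace-style trace exposing getRegisterByName(), readMemory(), and getMemoryMaps(), and permission bits that mirror envi.memory's MM_* constants (an assumption worth checking against your build). The helper names are hypothetical, and a "read-only" verdict is a hint, not proof of safety.

```python
import struct

# Permission bits assumed to mirror envi.memory's MM_READ / MM_WRITE / MM_EXEC.
MM_EXEC, MM_WRITE, MM_READ = 0x1, 0x2, 0x4


def printf_format_ptr(trace):
    """At printf's first instruction: [esp] = return address, [esp+4] = format."""
    esp = trace.getRegisterByName("esp")
    fmt = struct.unpack("<I", trace.readMemory(esp + 4, 4))[0]
    return fmt, esp


def find_map(trace, va):
    """Return the (base, size, perms, name) map containing va, or None."""
    for base, size, perms, name in trace.getMemoryMaps():
        if base <= va < base + size:
            return base, size, perms, name
    return None


def format_string_origin(trace, fmt_ptr, esp):
    """Rough triage of where a format string lives."""
    found = find_map(trace, fmt_ptr)
    if found is None:
        return "unmapped"
    base, size, perms, _name = found
    if base <= esp < base + size:
        return "stack"      # built at runtime on the stack: worth a look
    if not perms & MM_WRITE:
        return "read-only"  # usual home of compile-time string literals
    return "writable"       # heap or writable data: worth a look
```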
If the call came from the heap, stop it. "Oh crap, Lotus Notes just broke." Well, something like that. Did it come from the stack? This is all imperfect, OK? I'm only giving you options to investigate and trying to narrow them down. Yes, you may legitimately build a format string on the stack using various tools or other function calls. Most often, however, if you see a format string coming from the stack, you want to investigate, because it should normally originate in read-only memory. sprintf is some more fun stuff; it's very similar, and the parameter locations just change. snprintf, hey, this is cool: we can limit the length a little bit, so at least we're not creating buffer overflows, hopefully. Well, you want to find out, and programmatic debugging will help you do it. There are various others. scanf. How am I doing on time? Yeah, that's kind of what I was thinking; that's why I'm blowing through these. gets. I had a breakpoint written for it, and its notify just said, oh my gosh, I have gets running. The man page for gets says never use gets. Gee, thanks, so why did they write it? Well, they wrote it at a different time, but never use gets. Why? Because it will just sit there and happily feed bytes into wherever you tell it to go. This is a problem. Or a good thing; I'll let you decide. getc. getc gets one byte. How can I do a buffer overflow with one byte? Well, getc is often called inside a loop that runs up to some count. Loops in assembly code are typically wrapped around the ECX register, or RCX, depending on your platform. So it's probably a good idea to go find wherever getc was called from, look at the logic around it, look for the loops, and see how you can manipulate them. Ah, memchr, I'm not even going to go into you. If you'd like to be really elite, you start looking at the assembly opcodes in the program itself.
Look for instructions such as rep stos and rep movs. rep is an instruction prefix that can be applied to several different opcodes, and it says: as long as ECX is greater than zero, decrementing it every time (hey, that kind of sounds like a for loop, right?), perform this operation. Maybe it's storing data into a location, maybe it's moving data into a location. Either way, by seeking out any rep-prefixed byte operations in the disassembly, you'll often find another loop that you can then investigate. There's not a whole lot programmatic debugging can do for you here on a grand scale. However, you can create a template that simply says: stop at this point and start tracking; tell me what ECX is, things like that. It's a bit more focused than everything else I've been talking about, but it still helps, and it works. Here's a reminder of how format strings work. And here we are at the most important part of the presentation. What else can we do? I'll leave that up to you. The Force runs strong in your family. Pass on what you have learned. Go and be fruitful. I'm looking forward to seeing you guys up here next year. Many thanks to my God, my wife, my kids, my friends, Last Place, Invisigoth, and Kenshoto; to Squires and all the bullets he shot at me. Anybody want some of this energy drink? Thanks to the great folks over at oCTF, is that what it's called? Thank you very much to Moose. He comes up to me and says, dude, we've got to get you some Brawndo. What the heck is that? He says, well, it's like Mountain Dew with echinacea or whatever, all sorts of nutrients to help you win CTF. Electrolytes, there we go, thank you. All right, come get one. Thank you all very much. Have a great day.