This first part is not very technical; I'm just going to make some points about how you can abstract your payload and separate it from your injection vector, and I'll cover some stuff in that. It's kind of interesting. Toward the middle, the second part of the talk covers injection vectors, which is how to get control of the instruction pointer on a target system, and there are a lot of little tricks to do that. And then finally, the last part will be payloads, which is of course the funnest: all the nasty things you can do once you're on the system. I'd like to thank Ryan Russell. I don't know if he's here, but he recently busted his ass to get this book printed, and I believe they have copies of it here, about 100 or so. I wrote chapter 8, which is specifically on buffer overflows. The name of the book is Hack Proofing Your Network. RFP's in here. Elias from Bugtraq. Caezar, that is, Riley, he's pretty cool; he's in here too, talking about stuff. Jeremy Rauch is in here. There are a lot of good authors in this book. So anyway, if you want to come up after the talk, I can give you the ISBN number or whatever, or you can go find the SecurityFocus guys and they should have a copy of it up there. Oh, the question was, who's the publisher? Syngress. Okay, so these slides are not posted yet. I can post them; I believe Jeff has probably already got something set up on the website for that, and the entire audio and video feeds are all, of course, posted as well, so you should be able to get to this. Okay, so this talk is going to cover some ideas. The first one is formalizing your attack methodology. So I'm going to talk like a military person for a little while: if you were a military organization, how would you design your cyber-warfare group to take advantage of buffer overflows and payloads, and how can you reuse your attack code?
You can have a group of goons over here doing something like building a bunch of payloads, kind of like a missile with a warhead. You don't always have to send a nuclear warhead; you could send a biological warhead or an EMP warhead, whatever it is, but the delivery mechanism is still the same missile. So we're going to talk about how you can combine payloads and injection vectors in this way. That's separating deployment from the payload. Of course, payloads can be chosen for the desired effect. Then, when we get more technical later in the talk, we'll cover the constraints and some of the details of getting around them when you have, for instance, limited size on the stack, things like that. So what is an exploit? I'm not going to linger here because everybody knows what that is. It's really easy to find bugs. If you watch USSR Labs, you see they post something like three advisories a day. It's ridiculous. The way they do it, or the way you do it, is simply to run automated testing tools. It's very simple. I've written several in my lab; I can download a bunch of software off of Tucows and in one day find ten buffer overflows. It's ridiculously simple. All I have to say is: code sucks. The quality of the code on your systems is absolutely atrocious, and that's why you're going to continue to see buffer overflows now and for years to come. I think companies over the next few years are going to start to realize this, and there will be a lot more attention paid to testing and due diligence in the labs before critical infrastructure software is released. Okay, so obviously an exploit takes advantage of a bug for some desired effect. On a bad day, I can just crash the machine. Crashing the machine is getting harder to do these days.
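The missile-and-warhead split can be sketched in a few lines of Python. Everything here is illustrative: the vector bytes, the buffer length, and the placeholder payloads are all made up; the only point is that one target-specific vector can deliver interchangeable payloads.

```python
# Hypothetical sketch of keeping the injection vector (the "missile")
# separate from the payload (the "warhead"). All byte values are fake.

def build_exploit(injection_vector: bytes, payload: bytes, buf_len: int) -> bytes:
    """Bolt a reusable payload onto a target-specific injection vector,
    padding out to the fixed length the vulnerable buffer expects."""
    exploit = injection_vector + payload
    if len(exploit) > buf_len:
        raise ValueError("payload too large for this vector")
    return exploit.ljust(buf_len, b"\x90")  # pad with NOPs

# One vector, two interchangeable warheads:
vector   = b"A" * 64 + b"\x10\x20\x40\x7f"  # fake overwrite + fake address
shell    = b"\xcc" * 16                     # stand-in for a shell payload
icmp_dos = b"\xcc" * 24                     # stand-in for a DoS payload

print(len(build_exploit(vector, shell, 128)))     # 128
print(len(build_exploit(vector, icmp_dos, 128)))  # 128
```

Same missile, different warhead: only the `payload` argument changes between targets of the same type.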
Obviously on Windows 95 it's really easy, but on Windows NT it's slightly harder, because typically you're exploiting a user-level process; it's generally not going to crash the kernel. However, a lot of times it's very simple to DoS the machine, that is, lock it up, which is almost equivalent to crashing it. If it's a VIP process, a very important process such as CSRSS, and you manage to get it into a 100% CPU loop, that's pretty much crashing the machine, because it's not going anywhere. Most common, of course, is an application crash. If I don't get my payload right, I'll bomb the application; the application just goes away. Another possibility is a recoverable exception. Does everybody here know what exception handling is all about? Exception handling does not prevent buffer overflows from working. The exception handlers, the code that deals with an exception, are registered on the stack, so if I just keep going down the stack with my buffer overflow, I'll overwrite that too. It actually makes the buffer overflow easier to exploit: I don't have to worry about offsets. I overwrite right past the saved EIP, which is the instruction pointer, and when the program throws an exception, which is all I need to cause, the exception handler I just conveniently put in there gets called. This all totally applies to Win32, absolutely. Oh, no, the handler for it is on the stack. You're right, it is FS:[0], offset 0 from FS; that's where the exception handler structure is found, but the previous exception handler is right there on the stack. There's code that explains all that; if you want to come up after I'm done talking, I can give it to you. So a recoverable exception is one possible outcome. Of course, if we're successful, we get mobile code onto the system. Mobile code, obviously, is very, very deadly. You can do a number of things.
You all know about viruses, and if you were around in the early scene, you saw lots of really cool stuff; viruses were doing all kinds of neat things back in the day. All of that can be done today. There's no reason you have to limit yourself. Obviously, reading and writing files on the system is a good possibility, and, of course, so are denial of service attacks. Sorry I'm going so fast, but I've got to get through these slides. Exploits can obviously be grouped. There are certain bugs that keep coming back. If you go through all the lists, 8LGM, Bugtraq, and classify the bugs, you're going to see certain groups of them. This book actually covers each one of these subjects chapter by chapter: improper filtering, content-based attacks, buffer overflows, impersonation, bad authentication, obfuscation, which they call encryption, things like that. In other words, we just need better testing on software. Okay, so here's what I was getting at with this military missile thing: your attack payload is not the same as your entry point. A given bug, on a given version of Exchange, on a given service pack, is going to dictate how your injection vector looks. And when I say injection vector, I mean the code that gets the instruction pointer owned by you. Once you have that, however, the payload can be anything it wants. So you can have a situation where you've got a group of goons over here writing injection vectors for all these qualified hosts: I have 335 different buffer overflows in this collection over here, and they apply to these 2048 machines. I send out my scanner crew, and they scan and find, say, 27 eligible hosts on this network. Then over here, my other group of goons is writing the payloads, and I can select the payloads I want to use and deploy them into the network.
Maybe I have a payload that shoots out ICMP redirects so I can screw with their switch. Maybe I have another one that does an ARP spoof across the network and floods the entire subnet. Maybe I have another one that DoSes their router. So this is like missile-versus-warhead technology, and all the governments of the world are investing time and money into doing this. Obviously, there's a difference here, and you can keep the two separate and match them as you need. Yes, injection vectors: they're target dependent. What this means is that if you have a different service pack, for instance, you might have to have a completely different injection vector, but the payload can be the same. Injection vectors are tied explicitly to the environment you're about to exploit. A different version of Exchange, a different version of IIS, a different service pack, maybe a different version of KERNEL32.DLL on the system makes all the difference, and now I have to have a new injection vector. But the payload remains the same the whole time. So it's very OS dependent, target dependent. Oh yes, and encoding dependent; that's very important as well. If you want to exploit through anything that does content filtering, a URL request, for instance, only allows certain valid characters, and the same goes for MIME encoding, then that's going to limit the character set I can use. And when I'm thinking about the character set, what I'm really thinking about is the set of assembly language instructions available to me. So I might have a limited set of instructions I can use for my injection vector. The payload is independent of the injection vector in most cases, though it will still depend on the architecture of the machine, of course. An Intel payload is not going to work on a SPARC machine.
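The character-set restriction can be checked mechanically before you ever touch the target. This is only a sketch: the allowed set below is invented, standing in for whatever bytes a URL parser or MIME filter actually passes through untouched.

```python
import string

# Hypothetical "survives the filter" set: alphanumerics plus a few symbols.
ALLOWED = set((string.ascii_letters + string.digits + "._-").encode())

def fits_channel(payload: bytes) -> bool:
    """True only if every byte of the payload survives the encoding layer,
    which is what decides which instruction bytes you can actually use."""
    return all(b in ALLOWED for b in payload)

print(fits_channel(b"AAAAzzz999"))  # True
print(fits_channel(b"\x90\xcc"))    # False: raw NOP and int3 bytes won't pass
```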
However, I can tell you that there are payloads that will work on multiple hardware platforms, and that's a pretty elite trick. Think of the payload like a virus; it's almost the same thing. Once it's established, it can spread by any means. Trust relationships are obviously the easiest, and another one is scanning for more bugs. There are worms on the internet right now crawling all over the place using buffer overflows. We don't hear about it very often, but if you read CERT, you'll see that the ADM worm has been rooting machines for almost a year now, and it's amazing: it's the same BIND vulnerability, and it's been around forever. So there is evidence of this occurring. As for the payload, here are some of the things you could do. Denial of service, obviously. I could use the machines I exploited as a launching point, and I could get several of them into a DDoS attack. Remote shells are one you've probably all seen at some point or another. The remote shell doesn't have to be telnet-like or netcat-like; it can also be some sort of covert channel. I could put code on the system that, for instance, looks at offset 48 in any packet coming through, checks for a special character there, and if it sees it, interprets the next five characters as some command. That's a covert channel. I could embed those in ICMP packets. If I don't have access to kernel mode, in other words I can't promiscuously sniff under NT, I could just patch IIS. I could go into the IIS process space, say that's the one I exploited, find where it parses all the URLs, and use that as a covert channel, sending all my commands through a URL, and nobody would be any the wiser. Obviously, the worms are the most dangerous of all, and a rootkit, obviously, is for stealth.
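The offset-48 covert channel described above is easy to sketch. The offset and five-byte command length come from the talk; the magic byte value 0xA5 is my invention.

```python
MAGIC_OFFSET = 48
MAGIC_BYTE = 0xA5  # hypothetical marker value

def extract_command(packet):
    """Treat the packet as ordinary traffic unless the magic byte sits at
    offset 48; if it does, return the five command bytes that follow it."""
    if len(packet) >= MAGIC_OFFSET + 6 and packet[MAGIC_OFFSET] == MAGIC_BYTE:
        return packet[MAGIC_OFFSET + 1 : MAGIC_OFFSET + 6]
    return None

pkt = bytes(48) + bytes([MAGIC_BYTE]) + b"rshel" + bytes(10)
print(extract_command(pkt))        # b'rshel'
print(extract_command(bytes(64)))  # None
```

Ordinary traffic falls through untouched, which is the whole point of a covert channel.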
You can say that an injector will only work on a certain number of qualified hosts. I think when Barnaby Jack and eEye wrote the IIS exploit, back about this time last year actually, it worked on about 20% of the web hosts that were online at the time. That was enough to cause insane amounts of damage to the e-commerce world. So it's real. When you find a buffer overflow, it means something. There are plenty of them out there that can root tons and tons of machines, and a lot of them nobody knows about. Okay. Obviously, there are two types of injection. The first type is content-based. What that means is I'm sending some content that's being interpreted by an application server of some kind, and I'm going to bend its arm. I'm going to say, hey, dude, I want you to do this over here. The process remains in control. It's doing something it shouldn't do, but it remains in control. That's the content-based attack; that's things like what Rain Forest Puppy did with the RDS exploit. Then there's the buffer overflow. This is significantly different because the process does not remain in control: I have removed that process's ability to control where it's going. I've changed the instruction pointer on the chip, and I'm going to make it point to my own code. Trust-based spreading could be any of the stuff you've known for years from the virus underground: boot floppies, things like that. Melissa is obviously another good example of trust-based exploitation; I'm opening mail from somebody I thought I knew. I have a couple of slides talking about the government stuff. There are a couple of declassified reports; they were originally classified but were released as early as '95. They had evidence of Cuba running virus teams. You can bet your ass they're still doing it today, and they're using buffer overflows. The Russian KGB was involved in this type of stuff as early as '91. And then there are the E&Y reports.
I had a couple that came up just talking about viruses and mobile code. Over 50% of the people they interviewed reported this as a problem. The UK, same deal: costing almost $8,000 per incident on average to repair these things. So it's very expensive. Does everybody remember the Morris worm, or at least heard of it? Yeah, it's really famous. It shut down most of the internet; of course, most of the internet back then wasn't very big. You see, the Morris worm worked really well because everything on the systems it was attacking was all the same. It was like a monoculture. And I believe it was a single buffer overflow in fingerd and some sendmail exploit; it used both of them, if I remember correctly. It spread rapidly and took over most of the systems because of that monoculture. Well, today we still have a monoculture. We have the Wintel platform out there, and then we also have the Linux and Apache platform. Together, those two things make up most of the web server market. It's pretty amazing to think that if I had one really good buffer overflow, I could take out half of the web server market. In '89 there was another worm; I mention this just to show that there are worms and they have been reported. It was called WANK. It hit NASA and spread into HEPnet, which is a high-energy physics network run by the US DOE. It took two weeks to get all those systems cleaned up. Okay, so let's get into buffer overflows a little more. There are a couple of ways to do it. The most common is a stack overflow, but I'm sure you've all heard of a heap overflow as well; I'm going to talk about both. The goal of a buffer overflow is to own the instruction pointer. That is the goal: get it to point to something I control. Now, how many of you think that the only place you can put instruction code is on the stack itself? I can put my instruction code anywhere: on the heap, on the stack.
Any transaction that's recently taken place in any form on the system is probably going to be floating around in memory somewhere, usually on the heap in many cases. So we have a lot of room to play if we can get at least some of our code into the system anywhere. The stack portion of the overflow may be just enough to get us to jump to the other place in memory where we have the rest of our goodies waiting to be run. Okay, so the challenge with all of this, just injecting and owning EIP, is that there are obviously size restrictions many times, and depending on those size restrictions, we might have to write some very tight code. However, I can tell you it's amazing what you can do with 100 bytes. Assembly language is very interesting in that you can do a lot with a few bytes; a lot of very important instructions are only two bytes long. Well, if every instruction were only two bytes long and I had 100 bytes, that's 50 instructions. Believe me, I can do a lot with 50 instructions. Is somebody asking a question? Okay. Obviously I'm going to be sending a payload as well. If the injector part and the payload part are in the same buffer, for example all on the stack together, which is typical of most of the exploits you see, we have to make sure they don't step on each other. So I'm going to talk about that a little bit. Then there's this whole thing called offsets. You typically see these under Unix systems and not so much under NT. When you don't know where you are in memory, you don't know where your code ended up, so you kind of have to guess. I'm going to talk about that too. And then you probably all know the issue with null characters. If you're doing a strcpy and there's a null in there, the strcpy stops at the null. So obviously we can't have a null character just floating around in the middle of our payload, or we're going to have half a payload when it's done copying.
So there are a couple of encoding tricks. I'm going to talk about null characters today, but this book has information on some other stuff. If you have other types of encoding, for instance URL or MIME, there are certain characters you cannot use, and there are tricks for getting around this problem; they're explained in chapter 8 of this book. Okay, so, oh, Ryan. Hi, thank you very much. I'm going to ask a couple of questions, I guess, sometime, and I'll try to think of good ones, and if you get the question right, I'll give you a book. Let me borrow it; I'm hiding it over here. Okay, so I'm going to tell you why a stack overflow works. The stack keeps track of a lot more than just user-supplied information. It keeps track of what we might call housekeeping data, and housekeeping data is essentially what we're going to try to overwrite in order to own the instruction pointer. So there are a couple of issues there. Obviously, if we're filling a buffer, that buffer should be growing towards the housekeeping data we want to nuke, right? And then we must overwrite it. So I'm going to talk about that. When we make a subroutine call, on just about any processor architecture, we've got to store the location of where we just were; otherwise we won't know how to get back. So typically we're running along, and we're jumping here, and jumping there, and jumping somewhere else. Oh, I'm done here. Okay, so now I've got to get back, and I've got to remember how to get back to each of those places. All that information is what the stack is for; it keeps track of it. So let me do this little animation for you. This IP that appears is the instruction pointer. This block right here is my little version of the Intel processor. It would look different depending on what processor you have.
The concept is the same. So the instruction pointer is pointing into a code page. See that? It says code at the bottom, and that little pink square is supposed to mean there's some code there that's important to us. And we're running. Now let's say that code calls some other subroutine. We have to take the instruction pointer and put it on the stack. Okay, then the stack grows... oh, then the IP, of course, is updated to point to the new location. The stack grows up, and that area up here is going to be used as a temporary scratch pad while this other subroutine does its thing. All the automatic variables in that subroutine will be stored here on the stack; that means arrays and buffers, as well as integers, whatever it's using. So the stack pointer is updated to point to the top, and they run. And then when we're done, it goes away. It's kind of like deallocation, even though it's really just adjusting the stack pointer back to where it was. And the old IP is put back, and we're back where we originally were. That's the essence of the stack operation. Now, obviously, what we're going to try to do is change this value. So the stack overflow works like this. There's our housekeeping data, or return pointer, whatever you want to call it. The stack grows. Now, let's say it's going to fill a buffer, like a strcpy, let's say. The buffer grows down towards the housekeeping data. So it grows, see, right over the top. That's how it works. Now, the problem with a null character would be this. Same exact slide, pretty much. Store the housekeeping data, the stack grows up, we grow down, but we hit a null character. Guess what? We stop. So essentially, we can't have a null character in our payload, and that's the problem. Now, I'm going to show you a trick which will allow you to have a null character in one particular place.
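That call-and-return housekeeping can be modeled as a toy. This is only a sketch; a real stack holds frames in memory and grows toward lower addresses, and the "addresses" here are just integers standing in for instruction pointer values.

```python
# Toy model of the saved-return-address housekeeping from the animation.

stack = []

def call(ip, target):
    stack.append(ip + 1)  # housekeeping: remember where to resume
    return target         # IP now points into the subroutine

def ret():
    return stack.pop()    # whatever is on the stack becomes the new IP

ip = call(100, 500)       # code at "address" 100 calls a subroutine at 500
print(ip)                 # 500
# ...subroutine runs; an overflow of a local buffer could rewrite stack[-1]...
stack[-1] = 0xDEADBEEF    # simulate overwriting the saved return address
print(hex(ret()))         # 0xdeadbeef -- we now own the instruction pointer
```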
Okay, so obviously, if we're going to overwrite the housekeeping data, we need to make sure the null comes after it, right? Does everybody get that? Okay, great. Good. So I'm going to talk a little bit about little- and big-endian numbers. Has everybody here at least heard of this term? This might be basic for some of you out there, and I know the talk said advanced buffer overflow techniques; if you want some more insane stuff, I'll be around later tonight and I can show you some really insane stuff, but I'll just go over the things I have here. Okay, so big- and little-endian. The Intel processor stores numbers backwards, in my opinion. What this means is if I have a number, 004010FF, stored internally on the processor, it's actually FF 10 40 00. SPARC architecture, I believe, is big-endian, which means it doesn't do that. So, okay, the least significant byte goes first. I think everybody can understand this, right? It's just backwards. So if I'm going to store an address somewhere on an Intel processor, I actually have to store it this way, in this reversed form. So, let's say there's a... these colors might look a little screwed up; I apologize if you can't read it from the back. So here's our housekeeping data. Same as before, we have an original return address. This one here, up in 7F-land, is probably pointing into NTDLL somewhere, which is typical. All right, so our stack is going to grow, and then we're going to overwrite it, and we're going to be able to put a null there. Now, watch. I know that's really hard to see. This says 0C 20 40 00; it's this address, I believe, right here. Yeah. So, does everybody see how I can put a null in there and still have it work? The stack lives in an address range that starts with 00, but that doesn't stop me, because of the little-endianness of the x86 processor. I can now jump into that area.
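The byte-order point can be verified with Python's struct module; `<I` packs a 32-bit value the way the x86 stores it in memory, and `>I` the way a big-endian machine like a SPARC would.

```python
import struct

addr = 0x004010FF

little = struct.pack("<I", addr)  # how the x86 stores it in memory
big    = struct.pack(">I", addr)  # how a big-endian machine stores it

print(little.hex())  # ff104000 -- reversed, the 00 lands at the end
print(big.hex())     # 004010ff

# This is why a lowland address like 0x0040200C can terminate a strcpy
# cleanly: its null byte is the very last byte copied.
print(struct.pack("<I", 0x0040200C).hex())  # 0c204000
```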
I put an address in there that is valid, so injection is complete; we've put in an address. Now let's talk about where we can put the payload. Does everybody get how... I guess I didn't explain that very well. Because I put that address there, when we return from the function we were in, EIP gets replaced; it's taken from the housekeeping data that we overwrote. Does everybody understand that? Okay. What's that? How do I... you have to supply more...? Okay, the question was, how do you actually overflow? What you do is depend on the fact that there's software out there that does no bounds checking. It looks for a null character, like I showed you in the previous slide, to stop the copy. Now, if you don't supply that null character and it doesn't check the size, it will happily continue copying as long as you want. That's essentially the problem with calls like sprintf or strcpy: they look for a null character, and they don't let you, the programmer, specify the size of the buffer. They assume that whatever is supplied will be smaller than the buffer that was allocated. Okay. So after we've done this, let's talk about where you can put the actual code you want to run. So here's our housekeeping data. We have a stack. We overflow. There's our new address; the null is okay. So it jumps back into here, and that's our payload area. We can put our instruction code in there. That is the simplest form of stack overflow. It's not going to give me a whole lot of room, unfortunately, but it will work. So now let me talk about a little way to make some more room. The question was, how do you find out where the saved instruction pointer is stored on the stack? It will be in a known location. You'll have SoftICE, you'll exploit it, and you'll see that it's exactly X number of characters into this particular buffer.
A trick that I use: when I'm sending the buffer overflow initially, kind of black-box testing, I'm supplying large buffers with a Perl script or something into an application; I'm giving it a username that is 300 characters long. The characters I send alternate in a predictable way; again, the algorithm for that is in this book. They alternate in a predictable way so that when I get a crash, it's going to tell me what address it crashed on. I see that, and I'll know exactly which four characters it crashed on, because it tells me, and I'll know exactly what the offset is. It's just a little trick for debugging. Oh, probably. Sorry. As long as you get it. Okay. So let's say we were dealing with the previous slide, and we have a very confined payload. What are some of the things we can do? What are the restrictions, and how can we get around them? We can compress our payload and then have a little decompressor in there that will expand it somewhere else, maybe on the heap or further down the stack. We can use preloaded functions. Obviously, any process running on the computer is going to have a bazillion functions already loaded and ready to use; that's how it works. So as long as we know where they are, we can just call them. We don't have to load them or do anything weird, and oftentimes it works fine. That means my payload doesn't have to build what's called a jump table. Of course, the key is that all the functions I want to use have to already be loaded. I can use hard-coded addresses. There are two ways to use a preloaded function: I can either go scan for it and find it, because I don't trust that it's always going to be in the same place, or I can just trust that it's always going to be in the same place and hard-code the address to it.
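The predictable-alternation trick is the same idea behind the well-known pattern_create/pattern_offset tools; here's a minimal sketch. The exact alphabet is my own choice, not necessarily the book's algorithm, but the property is the same: every 4-byte window in the buffer is unique, so the value that lands in the crash tells you its own offset.

```python
import itertools
import string

def make_pattern(length):
    """Build a pattern where every 4-byte window is unique (up to ~20k),
    so the 4 bytes the debugger shows identify their offset exactly."""
    triples = ("".join(t) for t in itertools.product(
        string.ascii_uppercase, string.ascii_lowercase, string.digits))
    flat = "".join(itertools.islice(triples, length // 3 + 1))
    return flat[:length].encode()

def find_offset(pattern, crash_bytes):
    """Given the 4 bytes that landed in EIP, recover the buffer offset."""
    return pattern.find(crash_bytes)

pat = make_pattern(300)                 # the 300-character "username"
print(pat[:12])                         # b'Aa0Aa1Aa2Aa3'
print(find_offset(pat, pat[140:144]))   # 140
```

In practice you send `pat`, read the faulting address out of the crash dialog or debugger, and feed those four bytes to `find_offset`.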
I wrote a buffer overflow about a year ago that had hard-coded addresses for RegOpenKey and RegWriteKey, and those worked fine. The hard-coded addresses worked fine, so it was very small. Once again, this has to be predictable, and the function has to always be there. And of course, if you're sending a payload, you're probably also going to have data sent along with the payload, and I'll show that in a second. You can't send very much data if you have this confined-space problem. So let's say we want to use more stack. Okay. Now, you were complaining about the typo earlier; I'll preempt you here. This is also a typo: our stack address should start with 00 40, so ignore that 77. We're going to use more stack. So here we are; here's our housekeeping data. Now we go down the stack and put a whole bunch of stuff on it. Now, when we overwrite this address, notice I didn't put a 00 at the end. We have to pick an address, and the code page has plenty of stuff that's not in the 00 range, right? Most of it, in fact, I believe is not in the 00 range. We can pick an address out there anywhere and it'll jump to it. So that's the key: we cannot have a null in the address, and I'll show you the trick. So here's our larger version of the stack. We overflow a long way; you see we went quite a distance here, so we've got plenty of room. We're still going to have to overwrite this guy, though, because it's the return pointer. So we go down here, bonk, we overwrite it, and we make it point somewhere that ends up getting the instruction pointer to point back into this buffer. I'll show that trick in a second. Essentially, we jump to here and then we begin execution. Much more room now. Everybody get that? So when does an address contain a null character? I call that a lowland address. Under NT, the stack lives in lowland. What that means is that there's a 00 in the first part of the address, and that's a problem for the size of our payload.
I believe the stack under Linux is not in lowland; it's in highland. I think it starts with 77 or BF or BB. Hobbes was in the audience earlier; he probably knows. BF. Okay. So Linux wouldn't have this problem with the lowland address. I'm an NT guy, and that's what I'm going to talk about mostly. Yes, sir. The question was, if I populate the instruction pointer with an address that points into the stack, how do I know where I am? Remember when I was talking about guessing offsets? That's exactly what that's about: I have to guess. I'm going to show you a trick in a moment that makes guessing very easy, and in some other cases, I'm going to grab that address out of a register or something else, and I'll show you that trick as well. Okay, so a highland address has no zero in the address; you guys understand that. So under NT, you're going to be dealing with a large payload and a lowland address. We cannot use the lowland address directly, because it will typically limit our payload size. So we're going to have to figure out a way to get back to the stack without directly hard-coding the stack address itself in there. Okay, so I'm going to show a couple of tricks; I have a couple more in the slides that I'll talk about. We can use a CPU register. Typically, the CPU is going to have a bunch of information stored in its registers, right? Oftentimes one of those registers, or even more than one, will be pointing to a location on the stack near or about the area I'm going to overflow. What this means is that if I can get my overflow to cover that area, all I have to do is make sure my payload starts right there, and it'll jump to it. I'll show that trick. Let's say it's the D register. I overflow far enough that I can begin putting payload information here. Here's our housekeeping data. We overflow, and we do something that makes the value of the D register end up in EIP.
And what that does is, boom, we're now executing right here. That's the basics of the trick. How do we actually do that technically? Well, if I can find the two bytes FF D0 somewhere, that's call EAX. What that means is, if EAX holds the address of something on the stack, and I can get EIP to point to a location anywhere in memory that has those two bytes, then boom, I call through EAX and I'm on the stack. The key thing to remember is that the address I put in the housekeeping data is the location of those two bytes. They don't have to be in code; they could be in data. The computer doesn't care. Call EBX, call ECX; those are a couple of examples of the same thing. Another way to do it is to find a push of the particular register we're interested in, followed at some point later by a return. As long as there's only one push, the topmost value on the stack is going to be what we pushed: the D register, the one we just showed in the slide. So now D is sitting on the top of the stack, and maybe a couple of instructions later we see a return. It pops the top of the stack, which was the D register, and puts it in EIP. Same exact thing, just a different way to do it. I don't have a slide for this one, but I'll talk about it; it's another one I found. A couple of layers down on the stack. Let's say it's not in a register; there's nothing in the registers we can use, but the value is down on the stack. We have an address of something on the stack that's self-referencing, in a way. That happens all the time, especially if you're passing arguments from subroutine to subroutine. I pass an automatic variable of one subroutine as a parameter to a second subroutine; the parameter is going to point back into the first one's stack frame. Where was I going with this? Oh yes.
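Hunting for those two-byte call-through-register sequences is just a byte scan. A sketch with a fake image blob follows; in practice you'd scan a DLL that's always mapped, and the base address below is invented.

```python
# FF D0 through FF D7 encode `call eax` through `call edi` on the x86.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
CALL_REG = {bytes([0xFF, 0xD0 + i]): "call " + r for i, r in enumerate(REGS)}

def find_trampolines(image, base=0):
    """Yield (address, mnemonic) for every call-reg pair in the image.
    The bytes can live in code or data -- the processor doesn't care."""
    for off in range(len(image) - 1):
        pair = bytes(image[off:off + 2])
        if pair in CALL_REG:
            yield base + off, CALL_REG[pair]

blob = b"\x90\x90\xff\xd0\x90\xff\xd3\x90"  # fake mapped image
for addr, mnem in find_trampolines(blob, base=0x77E00000):
    print(hex(addr), mnem)
# 0x77e00002 call eax
# 0x77e00005 call ebx
```

The push-then-return variant works the same way: scan for a push of the register you care about followed by a C3 return byte.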
So if it's down there a couple of layers, as long as I can go out into code and find a run of pop instructions, I can peel values off the stack until I get to the one I want — and then there has to be a return. This works all the time. There are plenty of places in NTDLL where you can find code that pops three or four values and then returns.

Okay. Now I'm going to talk a little about what you were asking about with offsets, and how we can reduce the precision we need. We can use NOPs. Does everybody know what a NOP is? It's hex 90, and it runs and does nothing, right? So there's this thing called a NOP sled. We fill an area with nothing but NOPs, and if we jump anywhere into it, we can be sure we'll slide through and come out the other end. Illustrated: we grow, there's our null, we overwrite that data, and this whole region of the buffer is filled with NOPs — real payload up here, all NOPs below. We're going to guess. We know the address changes a bit from system to system, but as long as the address we plant lands anywhere in that range, we hit the NOP sled. So here's what happens: we guess, we hit the sled, and if we're successful we slide through it and start executing real payload right there. Happiness.

If you're anything like I was last night, you probably didn't get any sleep and drank way too much, so forgive me if I'm a little slow — I'm just guzzling this water down here.

Okay, so let's talk about the heap a little bit. The heap is kind of fun. If we only have limited room on the stack, we can probably use the stack just enough to get to the heap.
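The NOP sled can be sketched with a toy model — again my own illustration, not the talk's code: build a payload behind a run of 0x90 bytes, then show that any guessed landing offset inside the sled "slides" to the same real payload.

```python
NOP = 0x90  # x86 "no operation" opcode

def build_with_sled(payload: bytes, sled_len: int) -> bytes:
    """Prefix the real payload with a run of NOPs."""
    return bytes([NOP] * sled_len) + payload

def land_and_slide(buf: bytes, guess: int) -> bytes:
    """Toy CPU: from a guessed offset, step over NOPs one byte at a time
    and return whatever we finally 'execute'."""
    while guess < len(buf) and buf[guess] == NOP:
        guess += 1
    return buf[guess:]
```

The point of the trick in one line: every guess from 0 through the end of the sled lands on the same payload, so the attacker's address guess only has to be sled-sized accurate, not byte-accurate.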
That's one way to use the heap: put something in a recent transaction — say, in a URL request, where all the weird MIME types and headers get passed before or right after the GET. That data might be stored on the heap, so we can put shellcode in there and do just enough on the stack to jump to it. Interesting idea. So we can store parts of our attack in different places in the program. Interesting places to inject across include HTTP headers, or files — and those files will be stored on the heap somewhere. It's a simplistic slide, but we do just enough to get EIP pointing at the heap.

Here's another interesting one. Say we don't do a stack overflow at all — a one hundred percent home-grown heap overflow. We have two C++ objects. Anybody here program in C++? Okay, so you know about inheritance and virtual functions, right? Operator overloading and vtables — that's what we're going to talk about. A C++ object stores its own function pointers if those functions can be overridden by its children; that's inheritance and polymorphism. I'll show a diagram of it. The other thing to remember about C++ objects is that member variables — data stored within the object itself — are subject to the same kinds of buffer overflows as a stack buffer. Here's the vtable of a C++ object under NT. The vtable pointer is stored at the very beginning, and this is where all the member variables live. Now, say we have a buffer in there. If we overfill that buffer, it grows away from the vtable pointer. Obviously that's a problem, right? We want to grow toward something important, toward housekeeping data. So say we have two of these objects next to or near each other somewhere on the heap. This is very interesting. See? I run over the second object — I overflow right over its vtable pointer.
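A byte-level toy of that adjacent-object overwrite — my own sketch, with made-up sizes and addresses — two "objects" side by side, each a 4-byte vtable pointer followed by a 12-byte member buffer, and an unbounded copy into the first object's buffer:

```python
import struct

OBJ = 16                                   # toy layout: 4-byte vtable ptr + 12-byte buffer
heap = bytearray(2 * OBJ)                  # two adjacent objects on a toy "heap"
struct.pack_into("<I", heap, 0, 0x1000)    # object A's vtable pointer (hypothetical address)
struct.pack_into("<I", heap, OBJ, 0x1000)  # object B's vtable pointer

def strcpy_into_a(src: bytes) -> None:
    """Unbounded copy into A's 12-byte member buffer -- no length check,
    just like the C++ overflow described above."""
    heap[4:4 + len(src)] = src

# 12 filler bytes fill A's buffer; the next 4 land on B's vtable pointer.
strcpy_into_a(b"A" * 12 + struct.pack("<I", 0xDEADBEEF))
```

After the copy, A's own vtable pointer is untouched (the buffer grows away from it), but B's vtable pointer now reads 0xDEADBEEF — exactly the "over the second guy's vtable" situation from the slide.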
Okay, so where do we make the vtable pointer point? We supply our own vtable — how kind of us — and make it point back into the buffer we just overfilled. Now, if you were at Black Hat and saw this, you can't answer: what would be the most convenient function to change if I'm overwriting the vtable pointer? Don't everybody yell it out — raise your hands, I'll call on somebody, and you can win this book. The destructor? The destructor, absolutely, that man is right. The destructor is almost always virtual, so it's nearly guaranteed to be in the vtable no matter what, and that obviously makes it the best choice. You could overwrite any function that matters, but that one's the best. Ryan mentioned that GCC puts in a destructor even if you're only using the compiler in C mode — okay, sounds interesting. So, injection is complete at this point.

Here's some related material I'm not covering in this talk, but you should know about. First: kernel-mode buffer overflows are a reality, and they're all over Windows NT. It's mostly third-party drivers with IOCTL calls whose inputs are not checked. Those IOCTLs can usually be made by anybody on the system, because improper permissions get put on the device handle. Kernel-mode overflows are very interesting, considering I can put anything I want in there — like a rootkit — and I'm in like Flynn. Another one I won't say much about: an off-by-one error can cause a frame-pointer overwrite. There was an article about that in Phrack recently — Phrack 55, I believe, by klog. And then there are multi-stage attacks, where you have to get the target into a particular state before you deliver the buffer overflow.
Has everybody here heard of RSAREF? That's how that one worked — it was complicated. And some of this stuff, like the URL encoding tricks, is actually covered in the book, including how to get around that.

So let's talk about payload. Let me see where I am on time — I'm five minutes from when I'm supposed to end and I still have a number of slides, so I'm going to go as fast as I can.

So, payload topics. We can use loaded functions — functions already present in memory. I'm also going to show how you can load your own functions and DLLs if the functionality you need isn't there, and how to encode your data portion. And obviously one of the things a payload can do is spawn a shell. I'm not going to walk through the actual process of spawning a shell up here, but I'll talk about it a little.

A payload typically looks something like this: you'll have a NOP sled, you'll have real code, and you'll have a data portion. The data portion is what you'll use later, typically for arguments to function calls. Or, if you remember DilDog's paper on Windows buffer overflows, he stored a URL in there, used to download a file from somewhere and execute it. So there are lots of things you can store in the data portion.

Okay, so the first thing that has to happen with our payload is executing it — and we have to find out where we are in memory. We could be anywhere; we have no idea. Again, if you were at Black Hat, you can't answer the question I'm about to ask. What we can do is a short call — a call forward, over one instruction — where that next instruction is a `pop edi`. When you do a call, the processor thinks you're making a subroutine call, right? So the first thing it does is take the current instruction pointer and push it onto the stack.
Well, then, when we get to where we called to, we pop that address back off the stack and store it somewhere for later — say `pop edi`, if EDI is a convenient place to keep it. What we've done — let me go back to the previous slide — is learn where this location right here is. And because we know that, we can calculate the offset to our data. This is very important, because we have to use that data. I call it getting our bearings.

The problem with this instruction, however, is how it's encoded: E8 00 00 00 00. Obviously four null bytes aren't going to do our payload any good. So the quiz question for the next book is: how would I change this so that there are no null characters, but it's equivalent? You, sir, over there — XOR the instruction with itself to get zero? You're on a good track, but it wouldn't work in this case. Somebody else? Yes, sir — you are correct. The answer is: you make the call go backwards, so the sign of the displacement is different, and instead of zeros we get FFs. I'll show it on the next slide. No null bytes — we make the call go backwards. I have it written down a little differently than you described, but you're on exactly the same wavelength. The call goes backwards, see? Then I make the next instruction a jump to hop over that part, because we're done with it and we don't want to run it twice. And then we `pop edi`. That call encodes to E8 followed by FF bytes — no null characters — and we've done exactly the same thing.

Okay, now, like what you were talking about earlier, sir: XOR protection. We have a data payload, right? We can't have any null characters in that either, but we're probably going to put a lot of strings and things in there.
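The encoding point above is easy to check. Here is a small sketch of mine that assembles an x86 `CALL rel32` — opcode E8 followed by a little-endian signed 32-bit displacement, measured from the next instruction — and compares the forward and backward forms:

```python
import struct

def call_rel32(disp: int) -> bytes:
    """x86 CALL rel32: opcode E8 + little-endian signed 32-bit displacement
    (relative to the instruction that follows the call)."""
    return b"\xe8" + struct.pack("<i", disp)

forward  = call_rel32(0)    # "call the very next instruction": E8 00 00 00 00
backward = call_rel32(-5)   # negative displacement: E8 FB FF FF FF -- no nulls
```

The forward form carries four null bytes; any backward (negative) displacement fills those positions with FF bytes instead, which is exactly why the call-backwards-then-jump-over arrangement survives null-terminating string copies.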
And strings need to be null-terminated. So here's the trick to make your payload safe: you simply XOR the entire thing with a byte chosen so that the result contains no null characters. You have to pick a byte that works for the data portion you have. We XOR every single byte and make it safe — no null characters. It looks garbled; it's not encrypted, of course, just pseudo-obfuscated. Everybody understands how XOR works? We XOR every byte across, so any zero becomes the value of whatever we're XORing with, and the whole thing looks like gobbledygook when we're done. The key, of course, is that we can XOR it back to the original.

So here's our typical payload. The instruction code begins here, and the first thing it does — after it knows where it is in memory from that call trick — is decode the data portion back to the original. This happens on the remote computer, once the payload is already running there.

Okay, so say we've decoded. We have a bunch of function names in there. What's a good function? System — yeah, that's a good one. Now, do we know the address of that call yet? No — remember, the target's compiler didn't compile our code into the program, so we have to go find out where that function lives in memory. So we can put the string "system" somewhere in the data portion and reference it in a call to GetProcAddress — which is not the slide I wanted, hold on, I'm going to go back and forth here. We call GetProcAddress, give it the pointer to the name we supplied, and boom, we have the address. When we have the address, we can actually use it. Okay, another way to do it: if we know exactly where the function lives and don't want to search for it, we can just hard-code the address in.
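Going back to the XOR encoding for a moment, here is a minimal sketch of the trick — my own illustration. A byte XORs to zero only when it equals the key, so any key byte that does not appear in the data yields a null-free encoding, and applying the same XOR again restores the original:

```python
def pick_key(data: bytes) -> int:
    """Choose a one-byte XOR key that leaves no null bytes in the encoded
    data.  b ^ key == 0 exactly when b == key, so any byte value absent
    from the data works."""
    for key in range(1, 256):
        if key not in data:
            return key
    raise ValueError("data contains every non-zero byte value")

def xor_bytes(data: bytes, key: int) -> bytes:
    """Encode or decode -- XOR with a single byte is its own inverse."""
    return bytes(b ^ key for b in data)
```

A payload stub would carry the chosen key and run the same loop in a few instructions to decode the data portion in place before using it.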
And that's what's done in this example: we simply call the addresses directly, no GetProcAddress. Yes — I'll talk about that in a moment. Right, so what this fellow said is that in order to call GetProcAddress, you have to know where *it* is, clearly. I'll get to that when I talk about the PE executable header.

Okay, so the pros and cons of hard-coding. The pro: it obviously makes the code smaller. The con: the function may not be in the same place — dynamically loaded DLLs, for example, typically won't be. Some DLLs, however, are almost always in the same place; kernel32 is an example of one of those.

For dynamic resolution we use LoadLibrary and GetProcAddress: LoadLibrary loads new DLLs, GetProcAddress finds functions in DLLs currently loaded in memory. Now, in reference to what you just said: LoadLibrary and GetProcAddress themselves are reliably in the same place, so we can hard-code their addresses. If we don't want to do that, we can go into the PE executable header, where the first two entries in the import table will be these two calls — always; I've never seen it any other way. So we can load a new DLL and find any function by its ASCII name. I will note that functions are sometimes imported by ordinal instead; I'm not covering that in this talk — most of the time I've seen them by ASCII name.

Okay, so this was the slide you saw: call GetProcAddress, passing the ASCII name of the function you want to resolve. Now, this is called building a jump table. We do the same thing for each function: GetProcAddress, and when we get the address, we store it away for later. We do this for all the functions we're planning to use — there might be six or seven — and we just keep storing them as we resolve them. We build a jump table. When the jump table is complete, we can begin doing useful things: jump through it, making the calls. Everybody kind of get that? Okay.
Now, here's an elite thing: hash loading. I don't like payloads whose data portions carry huge string names for every function I need to load. Well, it turns out you don't need to do that. What you do is take a hash of each string — a four-byte value in this case — and store just those. Then you go into the process itself, hash everything that's loaded, and when a hash matches, use that function. There's almost no chance of a collision here. So we can locate any loaded function by checking a CRC of each loaded ASCII name. We never send the ASCII names at all — just their hash equivalents. That makes our payload much smaller.

To do that, we have to know about the PE header. Everybody know what that is? Almost every executable under NT is in this format called Portable Executable, and inside it is a table of the functions the module exposes. So we go find that information: we follow the optional header, and from there we can get each ASCII name and its corresponding address. It's already there for us. Over on the right-hand side, this is our payload; the yellow area down there holds the actual CRCs — the hashes we've precomputed. We load one up, then go into the PE header over here and, in real time, hash every single ASCII name — these are ASCII names, okay? — matching each one against our CRC. When one matches, we go find that function's address and, boom, write it back, replacing the original CRC with the function address. It's nice and clean. If we start out with 10 functions, we start with 10 hashes, and when we're done we have 10 function addresses in there — it doesn't change the size at all. So it's basically building another jump table. Boom, boom, and so forth. There's our jump table — much, much cleaner. Okay.
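The hash-loading scheme can be simulated in a few lines. This is my own sketch, not the talk's code: CRC32 stands in for whatever four-byte hash the real payload uses, the export table is a plain dict of ASCII name to address, and the addresses are made up for illustration.

```python
import zlib

def hash_name(name: bytes) -> int:
    """Four-byte stand-in for the payload's name hash (CRC32 here)."""
    return zlib.crc32(name)

def resolve_hashes(wanted: list, exports: dict) -> list:
    """Walk a (simulated) export table of ASCII name -> address and replace
    each stored hash with the matching function's address, building the
    jump table described above without ever carrying the names."""
    resolved = []
    for h in wanted:
        for name, addr in exports.items():
            if hash_name(name) == h:
                resolved.append(addr)
                break
        else:
            raise LookupError(f"no export name hashes to {h:#010x}")
    return resolved
```

The payload ships only the four-byte hashes in `wanted`; at run time each one is swapped in place for a real address, so the data portion never grows and never reveals the function names as strings.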
Now, filters. Anything sitting between us and the application — say, a content filter — is going to limit the characters we can send through, and what that really means is it limits the instructions we can use: the set of opcode bytes we're able to send is restricted. We have to work with that. A couple of years ago, my friend Caezar and I came up with a solution for one version of this problem: MIME encoding, where we could only pass alphanumeric data — nothing weird. That left us a small number of usable instructions: short jumps, push and pop, and subtract. The way it worked is, we put a value on the stack, popped it into a register like EAX, and then called subtract on it over and over and over, until it became the actual value we really needed for the payload. Doing this repeatedly, we built our payload at run time. It worked like a dream.

Along with this technique we also had the bridge. We had to avoid the jump instruction — we couldn't jump. So here's what we did: the decoder executes downward, popping values into EAX, subtracting until each one is a real opcode word we want, then pushing it back. And as it pushes, the new payload comes down and grows up — as we execute down, the other side is growing up toward us. We calculated the sizes exactly, so that with no jump instruction required, just as the decoder finished, the two met perfectly in the middle, and boom — we rolled straight into executing the real instructions. A backwards bridge.

Okay, so let's talk about loading a new DLL with LoadLibrary. I actually reused the same slide. This is the name of the DLL we want to load.
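Before moving on to DLL loading: the subtract-only construction above can be simulated. This is my own reconstruction in the spirit of that technique, not the original encoder — it searches for four dwords, every byte of which is alphanumeric, whose combined subtraction from zero produces an arbitrary target dword (the immediates for a chain of `sub eax, imm32` instructions). Four operands are used because three alphanumeric bytes per column cannot reach every byte value.

```python
from itertools import product

# printable alphanumeric byte values: '0'-'9', 'A'-'Z', 'a'-'z'
ALNUM = list(range(0x30, 0x3A)) + list(range(0x41, 0x5B)) + list(range(0x61, 0x7B))

# every value reachable as the sum of two alphanumeric bytes -> one example pair
SUM2 = {}
for x, y in product(ALNUM, repeat=2):
    SUM2.setdefault(x + y, (x, y))

def alnum_sub_operands(target: int) -> list:
    """Find four dwords, all of whose bytes are alphanumeric, such that
    (-(A+B+C+D)) % 2**32 == target -- i.e. subtracting all four from a
    zeroed EAX leaves the target value.  Works one byte column at a time,
    least significant first, tracking the carry between columns."""
    want = (-target) % 2**32            # the four operands must sum to this
    ops, carry = [0, 0, 0, 0], 0
    for i in range(4):
        s = (want >> (8 * i)) & 0xFF    # byte the column sum must produce
        t0 = (s - carry) % 256
        for t in (t0, t0 + 256):        # candidate column totals (pre-carry)
            pairs = [(a, t - a) for a in SUM2 if (t - a) in SUM2]
            if pairs:
                a, b = pairs[0]
                column = SUM2[a] + SUM2[b]          # four alphanumeric bytes
                for n in range(4):
                    ops[n] |= column[n] << (8 * i)
                carry = (t + carry) >> 8
                break
        else:
            raise ValueError("no alphanumeric solution for this byte")
    return ops
```

The payload built this way contains only printable letters and digits; the real value appears in EAX only after the subtractions run on the target, which is the whole point of passing it through an alphanumeric-only filter.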
We call LoadLibrary — excuse me, this is the function address of LoadLibrary — and we pass it the ASCII name of the DLL we want, and boom, suddenly our process space becomes much larger, because we just added a whole new DLL full of functions we can use. What's a good DLL to load? URLMON.DLL — okay, I don't know what it exports, but it sounds good. There's another one on the next slide, WININET, which does something similar. The Winsock stuff is obviously good too, so you might want to load that as well.

This is thanks to DilDog's paper, which I read: he mentions the WININET DLL, which comes with Internet Explorer. It's really, really useful for downloading files and making requests. So I load WININET.DLL and use InternetOpenUrl and InternetReadFile to download a file from anywhere on the Internet. It takes two calls — very simple, very little space — and it downloads to the local system, at which point I can execute it. Pretty scary. WININET does all the hard work for me, it keeps my payload small, and the best part is I can host my file anonymously on GeoCities somewhere, so even if you catch the download, it doesn't lead anywhere. And obviously, for example, WS2_32.DLL has all the Winsock functionality — if I'm going to build a backdoor or something like that, I'll probably need to open a socket, so I could load that and do it.

Another way to make calls on the system is through what's called an interrupt call. Interrupt 2E under Windows NT is the system-call gate, and I can make a whole plethora of calls that way — it takes about two bytes to make each one. I just set up the registers with the parameters for the call, execute int 2E, and boom, I've made a syscall. I can start processes that way, open and close files — all of it is available directly through the syscall interface. No monkeying around with DLLs.
And those are almost always guaranteed to work — they don't change locations. I'll make one caveat: between Windows NT 4 and Windows 2000, the syscall table changed, so the syscall numbers are slightly different. That's important to know. Same with Linux — in fact, I believe you can do everything using just this method. Under Linux it's interrupt 80, and under NT it's interrupt 2E.

Spawning a remote command shell: I can use CreateProcess under NT, or a syscall under Linux, and pipe the process's input and output through a socket — opening a TCP socket by loading the Winsock DLL on my side — for two-way communication. I've just created an instant telnet server.

If I want to get really crazy, I don't have to do it that way at all: I can make a covert channel. If I have access to the kernel, which most of the time I would, I can inject a TDI or NDIS layer hook, which lets me sniff all network traffic and send raw packets over the wire. I could watch ICMP packets and, if a certain option is present, treat the next five characters as a command, like I was talking about earlier. It's covert — you would never know. Arne Vidstrom — I don't know if you know him, but he has a site on the net where he distributes tools — has another interesting take on this idea: it just sends ACK packets. There's no three-way handshake, no real TCP session, just a bunch of ACK packets to port 80 going back and forth. But it looks just like a netcat session — it's actually open and working.

If I can't get into kernel mode — say I'm sitting in IIS — I can just patch something in the process I'm in. Obviously it's accepting network input, or I wouldn't be there in the first place, so I can patch the location where that network input is processed — the place where it looks at all the URL stuff. No need to get into kernel mode.

Worms: I'm not going to go into detail on this. Just know that mobile code is mobile code. Lysine deficiency —
I do want to talk about this a little. In terms of worms: if you are playing with them in the lab, make sure you build a fail-safe mechanism so they don't get out of your lab by mistake. What we do in our lab is require a particular floppy disk in the drive — if the disk isn't there, the worm will not infect the machine. It's the lysine-deficiency idea, and it's a very, very important thing to do if you're designing worm technology responsibly, especially if you're distributing it on a website somewhere. Make sure the lysine deficiency exists. It can take a number of forms: if you don't want to deal with floppies, you could make the worm work only if a particular machine responds to a ping, something like that.

Okay, so to recap all of this. The injection part is not the same as the payload part — they can be separated, whether you're designing a worm, a remote shell, or a rootkit installer. There are many challenges on the injection side: character encoding, with null characters being the obvious example; stack-size limits; and high versus low addresses — where, as we saw, we can call through CPU registers and the like to get back to the stack when we can't use a low address directly. Filters can limit what we put in the payload or the injection vector — they can restrict our opcode set — but we can work around that and still build working code. Typically our payload will be encoded, XOR-encoded in some way. We can build jump tables. We can load any DLL and any function we want, either hard-coding the addresses or resolving them dynamically. And importantly: use lysine deficiency if you're doing worm work. Thank you very much.