Hi, my name is Paul Haas and this is my talk on advanced format string attacks. Hello, Paulie. The contents are going to be a little bit of background on me, an abstract of what format strings are and how you exploit them, a definition of the way in which C functions are made vulnerable to them, and context in terms of old, current and new attacks. I'm going to show you my new techniques for exploiting them and go through a couple of demos before showing you a couple of full exploits, popping root shells without doing any work. Then I'm going to show you my tools and how I do it, finally leading to a conclusion and Q&A.
So I work for Redspin Incorporated as the lead web application security engineer. This talk isn't associated with that at all. It's just for fun, because I like doing binary analysis type of stuff. I'm a former CTF champion here at DEF CON, so hopefully I know what I'm talking about. You guys will make that decision for me. And for those of you who apparently don't play Mario Kart, we've got Robot and Tank on Rainbow Road.
So the purpose of this talk is to take a vulnerable program like this and get a shell without doing any of this. A little bit of brief background: format string attacks, for those who don't know, are a vulnerability in a type of C function, printf-type functions, that basically can lead to shellcode being executed. Even though it's been sort of resolved, ignorance and the vulnerability still exist, otherwise I wouldn't be talking about it. It's especially common in academic exercises, CTFs, and a lot of legacy pen tests. The reason I brought this up is actually because one of our clients had this issue and I didn't feel like doing any additional work on it, so I created a tool to exploit it for me. The name of the talk is advanced format string attacks, so I assume you know a little bit about what format string attacks actually do and how they work.
Thankfully, the tools I'm going to be talking about don't require any knowledge. They're pretty much point and click, as you'll see.
Brief history: format string attacks have been around for about 20 years, reaching their apex around 2000 in terms of being popular and really well known. The last format string attack that I've seen was in 2010, so they are still around, as you can see. The old technique used to be really kind of a pain to create. It required a long string of manual pops using %x or %p in order to get to your interesting data on the stack. Often when you wanted to actually create the exploit, you had to use a variety of other tools, like looking for shellcode in your core dump after a segfault. Basically, it's a lot of having to read the manual and consult the documentation. And then once you wrote it for one program, you couldn't take that work and bring it anywhere else.
The current technique has improved a little bit on this, but not much. It's still sort of the same technique. One of the advances has been direct parameter access: rather than having to pop, say, 10 things off the stack, you can just reference the 10th parameter directly. But again, you still needed to use external tools to find the location you wanted to overwrite.
The technique that I've been working on uses the fact that, because you're dumping the stack and the stack of most C programs has interesting information in it, you can leverage that information piece by piece into an exploit, leading to eventual compromise of the application. For example, most stacks contain data, code addresses (addresses that point to code), as well as addresses that point to other information on the stack, such as data pointers or code pointers.
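To make the difference between the old popping style and direct parameter access concrete, here is a small sketch in Python. It only builds the two kinds of format string; the function names are made up for illustration and are not taken from the speaker's tools. The `%n$` form is the POSIX numbered-argument syntax.

```python
def popping_format(n: int, spec: str = "%x") -> str:
    """Old style: one conversion per stack word, so reaching word n costs n specifiers."""
    return spec * n


def direct_format(n: int, spec: str = "x") -> str:
    """Direct parameter access: name the n-th argument explicitly (POSIX %n$ syntax)."""
    return f"%{n}${spec}"


if __name__ == "__main__":
    old = popping_format(10)
    new = direct_format(10)
    print(old, len(old))  # ten %x conversions, 20 characters
    print(new, len(new))  # a single %10$x conversion, 5 characters
```

The shorter string matters later in the talk, where the format string and the shellcode have to fit in a small input field.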
And what's also interesting is that the string that we pass for a format string attack is also on the stack. So we can locate that as well as its offset. And with this information, we can locate the address of anything on the stack, which is useful: for example, if we find a code pointer on the stack that we want to overwrite, we can find out the address of that pointer and then overwrite it with our format string attack.
So here's the vulnerable code that I've been using. Obviously it's not going to work in secure environments, but you still see a lot of this, bits and pieces here and there. So I would pretty much expect to see it in a couple of the pen tests that I've been doing for legacy applications.
Let's talk a little bit about the exploit steps for my new method. I first dump the stack using an exploit string until I find that exploit string on the stack. I then find the address of that format string on the stack. Then, by finding the physical location and the pointer to that location, I can calculate the address of any other stack data that's present in my dump. This allows me, if I find any code pointers or return addresses, to overwrite that address to point back to my shellcode, which is also on the stack.
One note with this: because I'm leveraging multiple executions of a program to derive this information, it's really helpful to keep the format string length constant. If you change it around, stack values and stack addresses change too, and since you need that information to be relatively constant, it helps to just keep the same format string length across all these different runs, as I'll show you.
The first thing you have to do is stack dumps. Two methods.
One, you just dump the whole stack at once using a bunch of %p's or %x's. The second method is to execute a loop with incrementing values. In this case, I have here just a little bash script, and I'll show you a demo of it, that does this and makes it really convenient to show what's on the stack. You then take this stack information, which, being based on %p or %x, is going to look like pointers, and you convert it to the string representation of that data. By doing that, you can find the actual string that's located in those values, and then you can find the offset of that string in the stack dump.
So right here I have a decently long one-liner for bash. This is running our vulnerable printf function. When you pass this function a string, it just prints it to the command line, and you can see it's actually vulnerable to a trivial format string attack. So this one-liner is going to loop through the printf, sending incrementing offsets in order to see what's on the stack. So this is the offset that I'm passing. This happens to be the string that I'm passing to printf. This is what printf returns to me, a code or stack pointer. And this is what that resolves to in terms of a string. So scrolling up, let me see if I can find what I'm looking for. You see that at about offset 138, sort of divided between offsets 137 and 139, we have %138$, which happens to be what I passed on the command line to printf. So here we've used knowledge of the exploit to find our string at offset 137, 138. We use that in a future step to help exploit the program.
The next thing we have to do is the format string address: now that we have the physical location on the stack, we want to find the pointer to that string.
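The stack-dump step just described, turning the dumped values back into bytes and looking for your own format string, could be sketched roughly like this in Python. This is not the speaker's bash one-liner; the function names and the simulated dump are hypothetical, and the code assumes a 32-bit little-endian target with a contiguous run of offsets.

```python
import struct
from typing import Dict, Optional, Tuple


def word_to_bytes(value: str) -> bytes:
    """Turn one %p / %x output such as '0x70243833' into its 4 in-memory bytes.

    Assumes a 32-bit little-endian target; '(nil)' is what glibc prints for NULL.
    """
    if value == "(nil)":
        return b"\x00" * 4
    return struct.pack("<I", int(value, 16))


def find_format_string(dump: Dict[int, str], marker: bytes) -> Optional[Tuple[int, int]]:
    """Return (offset, byte_shift) where `marker` starts in the dump, or None.

    `dump` maps a parameter offset to the text printed for it; the offsets
    are assumed to be contiguous so the words can simply be concatenated.
    """
    offsets = sorted(dump)
    blob = b"".join(word_to_bytes(dump[o]) for o in offsets)
    pos = blob.find(marker)
    if pos < 0:
        return None
    return offsets[pos // 4], pos % 4


if __name__ == "__main__":
    # Simulated dump: "%138$p" begins two bytes into the word at offset 137.
    memory = b"\x00/" + b"%138$p" + b"\x00" * 4  # 12 bytes = 3 stack words
    dump = {137 + i: hex(struct.unpack_from("<I", memory, 4 * i)[0]) for i in range(3)}
    print(dump)
    print(find_format_string(dump, b"%138$p"))  # (137, 2)
```

The byte shift matters because the string rarely starts on a word boundary, which is why it shows up split across neighbouring offsets in the demo.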
Two ways to do this as well. One is a sequential loop as we did before, which will cause segfaults: if the value at the given offset isn't actually a pointer but some data value, it's going to segfault, which might not be good in certain environments, for example on a pen test where there are aggressive IDS systems. The other way to do it, the sneakier way, is to use the stack dump. Since we already have a list of pointers, we can parse that list for values that actually are on the stack and then obtain the strings at those offsets only. This avoids segfaults, is a little more elegant, and allows us to dump all strings in just one run of the program.
So let me show you the first, brute force method, because it's a little easier to run on the command line. How's it going guys? Very good. So same thing, a loop here, a one-liner that's going to pass a bunch of things to the same printf function and try to get both the value at the location as a pointer and as a string. So you can see here that again it's dumping offsets, showing for example the pointer value as well as the string that that pointer resolves to. So going back up a little bit, let's see if we can find it. At offset 38, the pointer that ends with 608 actually references itself, telling us that that pointer value resolves to our passed argument. This is the second step of the attack.
So given the offset that we found in the first stack dump and the address that we found in the second, we have a method to allow an arbitrary write on the stack. We know that since offset 100 is at, for example, stack address BFFF 100, and we know that a pointer is 4 bytes, offset 1 is going to be roughly 400 bytes (99 × 4 = 396) below that address. So if we have a return address at offset 10, we can calculate the stack address of that.
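The arithmetic in that last step, plus the "sneaky" filtering of dumped values that look like stack pointers, can be sketched in a few lines of Python. The names, the sample address `0xbffff100` and the stack range are illustrative assumptions for a 32-bit target, not values from the demo.

```python
WORD = 4  # bytes per stack slot; assumes a 32-bit target as in the talk


def address_of_offset(offset: int, known_offset: int, known_address: int, word: int = WORD) -> int:
    """Address of stack slot `offset`, given one slot whose address is already known."""
    return known_address + (offset - known_offset) * word


def stack_pointer_offsets(dump: dict, low: int, high: int) -> list:
    """Offsets whose dumped value falls in [low, high), i.e. probably points into the stack."""
    return [o for o, v in sorted(dump.items()) if low <= v < high]


if __name__ == "__main__":
    # Hypothetical anchor: parameter offset 100 lives at 0xbffff100.
    print(hex(address_of_offset(1, 100, 0xBFFFF100)))   # 0xbfffef74 (396 bytes lower)
    print(hex(address_of_offset(10, 100, 0xBFFFF100)))  # 0xbfffef98

    # Hypothetical dump values: a code address, a stack address, and plain data.
    dump = {1: 0x08048456, 2: 0xBFFFF608, 3: 0x41414141}
    print(stack_pointer_offsets(dump, 0xBF000000, 0xC0000000))  # [2]
```

Only the offsets returned by the filter would then be read as strings, which is what keeps the second pass from dereferencing non-pointers.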
And then when we're attempting an exploit, we don't have to get, for example, .dtors or the procedure linkage table or some other overwrite location. We can just overwrite this location on the stack, which we've already found in a previous step. Even though it's possible in advanced format string attacks to extract this information, we want to be lazy and just have the operations from previous steps do our work for us.
Two methods to do this as well. If we know what a return address looks like, and they're pretty common on most Linux systems, we can use those. Or we can guess them using an algorithm that matches values that are close to each other and that aren't data and aren't strings. Or we can be lazy and just brute force everything on the stack until our exploit succeeds, which is a lot easier to do.
Some issues with this new technique. As I mentioned, if we change the length of our format string, it affects addresses on the stack, since as our format string grows, stack values shift along with it. The easy way to resolve this is to keep all format strings the same size during the technique: just pad them with null characters or useless characters, so that when we actually attempt our exploit and append shellcode to the format string, it will happen to be the same length as in all our previous executions of the program. The result of this is that, given two executions of a program, one to dump the stack and two, if you're being really clever, to dump all the string addresses, you can get all the information you need to exploit that function. No brute forcing, just a little bit of math.
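Keeping every run's input the same length is simple to sketch. The helper below is a hypothetical illustration in Python, not code from the speaker's tool; it pads with a filler byte rather than nulls, on the assumption that the string is passed as a command-line argument, where embedded null bytes are not possible.

```python
def pad_format(fmt: bytes, total_len: int, filler: bytes = b"A") -> bytes:
    """Pad `fmt` with a single-byte filler so every run passes an input of equal length."""
    if len(filler) != 1:
        raise ValueError("filler must be exactly one byte")
    if len(fmt) > total_len:
        raise ValueError("format string is longer than the fixed length")
    return fmt + filler * (total_len - len(fmt))


if __name__ == "__main__":
    FIXED = 64  # illustrative budget; it must also leave room for the final payload
    probes = [pad_format(f"%{n}$p".encode(), FIXED) for n in (7, 38, 138)]
    print({len(p) for p in probes})  # {64}
```

Because every probe has the same length, the offsets and addresses learned in one run stay valid in the next.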
It's also nice because, the way this current technique works, you can shrink the length of the format string so that you can fit both the format string and the shellcode in, say, less than 100 bytes, which is usually pretty reasonable for any user input field. However, if you don't brute force the overwrite address or do some math, you still have to derive it from some other source, such as a core dump or a known location like the procedure linkage table or .dtors.
So demos, the heart of the talk. The first tool, which should be on your DEF CON CD, is a proof of concept tool in Python. I have instructions for running it on BackTrack 4. Basically, it's a nice little suite for demonstrating the way the attack works, with multiple options about where you want to overwrite and where you want to put your shellcode. Again, since it's a proof of concept it's missing some useful things, but those can be really easily added, hopefully by anyone who has an understanding of this talk.
So I'll demo the tool right now, taking our same printf function that we know to be vulnerable to a format string attack. Just for the sake of the vulnerability, we'll make it owned by root and then executable as root, so anyone will inherit those permissions. I have a dummy account right now that has limited permissions. Obviously, I want to exploit this printf function and gain root access. I don't actually want to do it myself, so I'm going to use my tool. So that's a little hard to read, but basically it's just going over the options about where you want the exploit to overwrite and where you want the shellcode to be. But the nice thing is you don't really have to read any of that, because you can just point it at the binary and you're in.
And the nice thing is this technique works on any similar type of vulnerability. The question was, would this work on a syslog thing where the output's going somewhere else? This attack works as long as you can read the output from earlier runs of the vulnerability and use it in later ones.
The next code I developed especially for DEF CON, and it will be as loud and invasive as possible. It has been completely automated, so I removed that requirement of having to get an overwrite location. I'm basically brute-forcing through the stack. I ported it to both Python and Ruby in under 100 lines of code, so hopefully if I have time I'll go through it and talk about it a little bit. Basically it does the same thing, except using a brute-force attack. So, let me show you that as well. Again, same printf function, same dummy user, different binary. You just pass it the vulnerable function as a parameter and you get a root shell. Even though you can't see it right now, for example if I exit, it's actually brute-forcing a bunch of spaces, trying to go through all the stack values until it finds it. As long as it's there somewhere you're going to get in, so if that's all that matters, you can be as invasive as possible. I'll show you the Ruby version. It does the same thing, but for those who like Ruby better, you also can get a shell.
Finally, the meat of this talk. If you're going to have this sweet program, you might as well port it to where someone can use it. In this case, I decided to move it to Metasploit and add the capability for remote exploits. The usefulness of this is that you can use an arbitrary payload, anything from Metasploit. The code is there in Metasploit and there's a lot of applications.
So if you want to extend your functions for another vulnerability, you can just use Metasploit for it. And for this, I created a sample vulnerable server that does the same thing as our vulnerable printf function, except through a TCP connection. So, let me demo that right now. I'm not going to go through this too much, but basically it opens a port on 4546, listens, and forks the process for connections. And then it sends the result of this snprintf vulnerability right here back to the socket that you connected to. So, let me make this and go ahead and run it. This is running on my VMware machine. I'm going to try to exploit it from my local machine. Metasploit takes a while, don't worry.
That's correct. This assumes legacy environments where a lot of those protections are non-existent.
So, while Metasploit is loading, I'll go ahead and show you that, by piping something through Netcat to that server on that port, it does the same thing, which is this data disclosure. How's it going, guys? Very good. Oh, these are my co-workers from Redspin. They're here to play Mario Kart with me. They don't have any relation to the talk whatsoever.
Okay, here we go. So, right now I'm going to load my custom Metasploit module. Show payloads. What do I want to exploit today? I'm going to just use a reverse TCP shell and see all my options for this. Looks like it accepts a remote host, a remote port, and a local host. So, I'm going to go ahead and set those. One-third in this case is my VMware machine and 58.1 is my local machine. Hopefully, this will work. So, the nice thing is, since we're brute forcing the stack, any return address on that stack is going to give us a shell. In this case, it happens to be six.
Six overwritten return addresses in the other program. So, now you can share a shell with your friends and family if you so choose. And we can verify that by listing the sessions and then sending a command to them. So, you can see that I compromised this application six times and got root access on all of them.
Looks like I'm doing really well on time, so I'm going to go through my brute forcing code in Python. Hopefully, you guys will get something out of it. It's only about 100 lines of code. Four steps. The first thing I do is just standard initialization: setting up my variables, defining some functions. In the second step, I go through that stack dump process as I showed you before, and I use this data to find the offset of the format string. The next step after that is finding the address of that format string, so I run another loop to find that address. And finally, when I'm actually exploiting, all I do is create the exploit string and then run through this brute forcing technique. So, 68 lines of code, hopefully pretty extendable, and usable on a variety of similar instances. Hopefully, you guys will find some use out of it.
In conclusion, format string output gives you everything you need to go from discovery to compromise. It can be completely automated, as I've shown. They have been easy to fix. Now they're easy to exploit. For those who are interested in finding some of them, a good suggestion is Google Code; a good dork to look for this type of thing is shown there.
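The exact search string from the slide is not in this transcript. As a rough approximation of that kind of search for auditing your own source, a Python regular expression could flag printf-family calls whose only argument is a bare variable. Everything here (pattern, names, and the choice to look only at single-argument calls) is an assumption for illustration.

```python
import re

# Flags calls such as printf(buf) or printf(argv[1]); skips string literals,
# parenthesised expressions, and calls whose first argument is a stream.
PRINTF_VAR_ARG = re.compile(
    r"""\b(\w*printf)\s*\(\s*          # printf, fprintf, sprintf, ...
        (?!stdin\b|stderr\b)            # first argument is not a stream
        ([A-Za-z_]\w*(?:\[[^\]]*\])?)   # a bare identifier, optionally indexed
        \s*\)                           # and nothing else in the call
    """,
    re.VERBOSE,
)


def suspicious_calls(source: str) -> list:
    """Return (function, argument) pairs for calls that pass a variable as the format."""
    return [(m.group(1), m.group(2)) for m in PRINTF_VAR_ARG.finditer(source)]


if __name__ == "__main__":
    sample = 'printf("%s\\n", name);\nprintf(argv[1]);\nfprintf(stderr, "oops");\n'
    print(suspicious_calls(sample))  # [('printf', 'argv[1]')]
```

This is deliberately narrow: it misses functions such as snprintf where the format is a later argument, so treat the hits as a starting point for review rather than a complete audit.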
It basically looks for a C-language printf-type function whose argument doesn't begin with something like a constant, something like STDIN or STDERR, or a parenthesis. So, hopefully, something that is actually using a buffer, something that the user is providing.
So, thanks. Hopefully, the tools that aren't on your CD will be put on the Redspin and DEF CON sites shortly, Monday, I'm hoping. I'm hoping that the Metasploit module will also be there as soon as I talk to those responsible for weird auto-format streaming stuff. You can contact me. Otherwise, shout out to all those people at Shellphish and my people playing Mario Kart with me. Questions, Mario Kart, hopefully... It looks like we have plenty of time, but hopefully we'll be in another room if you don't have any right now.
Can you repeat that one more time, please? So, the question was, does my proof of concept assume that there's direct parameter access to exploit it? And the answer is yes. I basically assume that most versions that I've seen have that direct parameter access. Again, the technique can still be modified. You just have to do it yourself. I think it would be a good exercise, and I wouldn't be surprised if you could do it, because if you dump the stack enough times, you're still going to have a general idea about where stuff is relative to everything else on the stack. So you still could do some sort of permutation attack, and if you're brute-forcing anyway, it's only a matter of time. Okay, guys, thanks a bunch.