All right everyone, thanks for coming to this talk. This talk is titled Smashing the Stack with Hydra. We're from Columbia University's Intrusion Detection Systems Lab. My name is Yingbo Song, I'm the middle author. This is Pratap, and our advisor, Sal Stolfo, is sitting down here in the middle row. So before I begin, even though I'm giving this talk, Pratap here did a lot of the work for this, so direct your scorn or praise at him at the end.
All right. So, an overview of our project. Hydra is a new polymorphic shellcode engine for x86 platforms. We designed it to bypass signature-based, statistical, and emulator-based IDS systems. It does this by integrating several obfuscation techniques into one engine, such as self-ciphering, statistical mimicry, forking shellcode, and much more. And I'll talk about these in today's talk.
So, the basics of smashing the stack. This is from Aleph One's seminal paper in Phrack. This is the stack frame when you call a function. You have the saved EIP, which is the address in the calling function that we return to, and then the function sets up its local variables. If you don't have bounds checking, you can send a simple exploit with a NOP sled, a payload, and a return zone. The return zone overwrites EIP, and when the function returns, it jumps into the NOP sled and slides into the payload. And you basically lose the execution context of your program to whatever the attacker sent. So just keep in mind what the three default zones look like for a regular shellcode: NOP sled, payload, return zone. Hydra basically obfuscates all of these sections and adds more.
Okay, so polymorphic shellcode. Why do we need polymorphic shellcode? Well, it's because IDS signatures for shellcode are easy to write. For example, you can detect a string of hex 90s, and that's your basic NOP sled. You can look for /bin/sh, and that's an indication you're trying to open a shell.
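To make the point about easy signatures concrete, here is a minimal sketch of the kind of naive check the speaker describes: flag a long run of 0x90 bytes or the literal string /bin/sh. The function name, the labels, and the 16-byte threshold are assumptions made up for illustration; they are not taken from the talk or from any particular IDS.

```python
import re

# Hypothetical threshold: how many consecutive 0x90 bytes we treat as a sled.
MIN_SLED_LEN = 16

# Matches a run of at least MIN_SLED_LEN bytes with value 0x90.
SLED_RE = re.compile(rb"\x90{%d,}" % MIN_SLED_LEN)


def naive_signature_hits(packet: bytes) -> list:
    """Return the labels of the naive signatures that match this packet."""
    hits = []
    if SLED_RE.search(packet):
        hits.append("nop-sled")
    if b"/bin/sh" in packet:
        hits.append("bin-sh-string")
    return hits


if __name__ == "__main__":
    print(naive_signature_hits(b"\x90" * 32 + b"AAAA"))
    print(naive_signature_hits(b"GET /index.html HTTP/1.0"))
```

A check this literal only matches the exact bytes it was written for, which is the weakness the rest of the talk builds on.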
So many polymorphic engines use an encoder to cipher the payload with a random key, but that doesn't work if your decoder is always the same, because an IDS can just try to detect the decoder. So the decoder has to be polymorphic. It has to change every time. There are also statistical IDS systems out there that look at byte distributions, multi-gram byte distributions. There are also new IDS systems being introduced now based on dynamic emulation. There's an actual sensor on the network, inline with your traffic, trying to execute all the bytes that come across the network. And people have actually gotten this to work at 100-megabit line rates, which is very impressive. There's also dynamic-disassembly-based IDS, which dynamically disassembles all network traffic and tries to look for large basic blocks. Normal data should not have large basic blocks when you disassemble it. And if you find large chunks of executable code, large chunks of instructions, it could be an indication of an exploit coming through.
So here's a listing of Hydra's features. We have a NOP instruction generator and recursive NOP sleds. Everything is randomized, including register selection and clearing, and we have randomized multi-layer ciphering. We do junk code and data insertion. Multipartite decoders, I'll explain what that means: basically we take the decoder, break it up into pieces, and insert it into the payload itself. Multi-gram statistical mimicry: this is machine learning techniques to make your shellcode look like normal traffic. Randomized return zones, I'll explain what that is: it's basically jumping into different parts of the NOP sled. Forking shellcode: there's some recent work now on how you safely execute an exploit. When you exploit a process, that process typically hangs or crashes. Your code may run, but it might also throw off an IDS alert. So this is how to safely execute a shellcode and have the vulnerable process continue executing.
We also do something called time-lock ciphering for anti-emulator and anti-disassembly techniques. I'll talk about that. And alphanumeric encoding: the shellcode can be pushed down to the printable character range.
All right, so NOP sled obfuscation. NOPs don't have to be hex 90. If you look at the CLET engine, for example, they use the ASCII characters A to Z. Any of those characters are actually valid NOPs. You can't use them inline as NOPs because they touch the stack and so on. But you can create a group of these, put them in front of the payload and execute that, and your payload will still be fine. Hydra contains a NOP generator that basically goes out and builds a library of all possible NOP instructions of multiple byte sizes. And we do this by setting up code to test. We write code to set up stack and register canary variables. Then we build a sled using a potential NOP instruction. Then we write validation code at the end to check those canary variables. And then we execute that payload. If that NOP sled worked and all the canary variables are fine at the end, then that instruction can be used in a NOP sled. And we used this to find approximately two million NOP instructions that can be used. So you're not limited to just hex 90.
There are also multi-byte NOPs. For example, if you read the original Phrack article by the CLET team, there's something called a recursive NOP sled. What you do is you find all one-byte NOP instructions by brute force; there are only 256 possible choices. And then you find two-byte NOPs where the second byte is itself a one-byte NOP, right? So it doesn't matter if you jump into the first byte or the second byte, they're both NOPs. And larger NOP instructions recursively contain smaller NOPs. So this is why it's called a recursive NOP sled.
So Hydra distinguishes between two types of NOP instructions. The first is basic NOP equivalents, which is what you put in front of the payload. It basically just catches the execution jump and delivers execution into the payload safely. The second type is what we call state-safe NOPs. State-safe NOPs can be inserted between instructions. These are the ops that do not leave the stack in an unsafe state. They don't randomly modify memory. Special conditions are put into place when we search for those. So we found about two million total NOP-equivalent instructions, the ones you can use to build a traditional NOP sled, and about 30,000 state-safe NOPs. So you can see the large amount of variation you can generate in a polymorphic engine. You're not limited to just hex 90.
So, register clearing operations. We have many different ways of selecting and clearing registers. Most of these methods involve generating random keys, moving the keys around, and then moving registers, subtracting registers, and so on. There can be a lot of different things before that, as long as it clears the register at the end. Hydra provides a large library of such instructions and a platform to easily add more. And for some operations, random keys are generated to further obfuscate the payload.
Multipartite decoding. Hydra generates non-contiguous decoders. In a traditional polymorphic engine, you have your NOP sled, your decoder, and your payload. The NOP sled will catch the execution jump and pass that into the decoder. The decoder will dynamically reverse whatever encoding method you used to obfuscate the payload. But typically decoders are contiguous, they're just one block of code. And those can be detected by IDS sensors if you try hard enough, with signatures or statistical methods. So in Hydra, what we did was break up the decoder instructions and scatter those into the payload itself.
The instructions jump between each other while decoding the payload, so you cannot easily write a signature for such a decoder. Currently only bipartite decoding is implemented, where we have half of the decoder instructions up front and half in the back, with the payload in between, and these two decoder portions just jump between each other. But we plan to add true multipartite decoding in the future.
So, multi-layer ciphering. Ciphering is pretty much standard these days for shellcode obfuscation. With CLET, Metasploit, even back to the ADMmutate days, people would just XOR the payload with randomly chosen keys. Even back in the mid-90s, people were using 32-bit keys to do encryption on their shellcode payloads. And that works very well against most AV and IDS sensors. So multi-layer ciphering is basically the newer way of doing it. You don't just use XOR. You can use any type of instruction as long as it's reversible. For example, rotate right, rotate left, XOR, add, subtract. There are many different types of instructions you can use as long as the payload is decrypted properly. If you add a key to your payload, you have to subtract it at some point in the decoder. So if your encoder uses an add, your decoder has to use a subtract. And in Hydra, the cipher order is dynamic every time. It's completely random. You don't get the same cipher operations in each run, and you don't get the same keys in each run. And we'll have demos of this at the end. We use 32-bit keys, and they're generated randomly per invocation. Hydra uses six rounds of ciphering by default. I just lost my screen. Hold on a second. Okay, it's back. Six rounds of ciphering is used by default, but the user can specify a number.
Inline junk code insertion. Hydra automatically spaces out your shellcode payload. You give it your shellcode, and it takes that and spreads the instructions apart.
In between the instructions, we can insert junk. We can insert arbitrary data, basically. What we typically insert is NOP instructions. You can also insert anti-disassembly code, random junk, or you can use this section for statistical mimicry attacks. Thank you. Statistical mimicry attacks, which we'll talk about in a minute.
So statistical IDS sensors are the new way of detecting shellcode. Basically, they learn statistical models for normal content and try to detect exploits that deviate from it. What happened there? I think I just lost my screen again. Oh, okay. Statistical mimicry. So Hydra actually uses machine-learning-based techniques to make shellcode look like normal traffic. All you have to do is provide Hydra with instances of the normal traffic that you want to mimic, and it will build statistical models for multi-gram distributions. So what is the frequency of a certain one-byte distribution, a certain two-byte distribution, three-byte, up to five, seven, and so on. It will take your shellcode, build a model of it, take normal traffic, build a model of it, and it will tweak your shellcode until the two models look very similar. And it uses machine learning techniques to do that. Unfortunately, I don't have time to explain that, but if you want to know the details, just find me afterwards and I'll point you to some papers. We do this using Markov chain Monte Carlo. It's basically a machine learning technique to build a distribution and sample from that distribution. And if you recall, we have the junk code insertion feature. When we space out the instructions, we take the statistical mimicry bytes and put them in those spaced-out sections.
Randomized return zones. So the return address zone is basically a sequence of repeated target addresses. They point to the NOP sled.
When you write a simple stack exploit, you're hoping that one of these addresses overwrites EIP on the stack, so that when the function returns, it jumps into your NOP sled. The basic way to randomize this zone is to just add random offsets to each of the individual address components. It breaks signatures by randomizing each component of the address zone.
So emulator-based IDS systems are very newly introduced techniques. They exist mostly in the academic community. What you do here is basically build a stripped-down x86 emulator, dynamically execute all network traffic, and look for self-decrypting behavior or large basic blocks. To defeat this, we use something called syscall-based ciphering, where we exploit some type of OS functionality to grab a key, which is used to decode the main cipher operations. Hydra uses the time syscall. Our shellcode will actually make the time syscall, get the return value back, and use the most significant bits of that result as the key to decode the main cipher operations. So if you recall, we have the main cipher operations like XOR; those instructions are themselves ciphered based on a key we get from the syscall. If the syscall is not handled, and most emulators can't handle it, then the shellcode cannot be decoded properly and it will look like random traffic. But when it gets to the actual host and the OS can handle the syscall, then the shellcode is properly decoded and it will run.
This also introduces the concept of a shelf life, where a user can specify how long of a window this shellcode can run for. Because we use the time syscall, we can specify, say, seven most significant bytes, and that gives you five minutes for when this shellcode is good. After five minutes, the shellcode will not decrypt properly and it can't be disassembled. Well, it's hard to disassemble it. So a network IDS can't emulate all possible syscalls.
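The window arithmetic behind the time-derived key can be sketched as follows. This is only an illustration under stated assumptions: a 32-bit Unix timestamp, a key taken by dropping some number of low-order bits, and hypothetical names (window_key, window_seconds, low_bits_dropped). The talk does not spell out how Hydra maps its precision argument to a window length, so the numbers here are not Hydra's.

```python
def window_key(unix_time: int, low_bits_dropped: int) -> int:
    """Keep only the most significant bits of a 32-bit timestamp.

    Every timestamp inside the same 2**low_bits_dropped-second window
    yields the same value, so a key derived this way is only
    reproducible while the clock stays inside that window.
    """
    return (unix_time & 0xFFFFFFFF) >> low_bits_dropped


def window_seconds(low_bits_dropped: int) -> int:
    """Length in seconds of the window that shares a single key."""
    return 1 << low_bits_dropped


if __name__ == "__main__":
    t0 = 1_250_000_000  # arbitrary example timestamp (assumption)
    dropped = 8         # dropping 8 low bits gives a 256-second window
    print(window_seconds(dropped))
    print(window_key(t0, dropped) == window_key(t0 + 100, dropped))
    print(window_key(t0, dropped) == window_key(t0 + 300, dropped))
```

With these example values, the timestamp sits 128 seconds into its 256-second window, so a clock reading 100 seconds later reproduces the key and one 300 seconds later does not. An observer that cannot supply a real clock value, or that analyzes the bytes after the window has passed, never derives the right key.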
Syscalls that an emulator can't handle are basically the main idea behind the time cipher. It bypasses emulators, it bypasses dynamic-disassembly-based methods, and it slows down human reverse engineers, because it's very hard to figure out exactly where these little tiny mechanisms are. These are very small mechanisms, a few bytes at most.
So, forking shellcode. This is something that just got recent attention at CanSecWest. Some people from Immunity Inc. presented a talk on this. We did this at the exact same time; we didn't know that they were also working on it. So basically you have a target process. You exploit that target process with a stack overflow or something, and what happens is that the target process will crash. Thanks. The target process will crash and that's not a desirable result. What you want is to have that process keep going and simultaneously have your exploit execute. Because if the process crashes, or if it starts doing weird things, that might alert a sysadmin or some sort of AD sensor. So the solution is that we have mechanisms in Hydra that add forking features to your shellcode, where once the shellcode executes, it immediately forks. The child executes the payload and the parent attempts to recover the exploited process. It tries to repair the stack, it tries to figure out the right return address so that the parent can actually go back to normal execution, and we'll demonstrate this at the end. This feature is kind of hard to get right. Recovery is very hard. Once you exploit the process, you typically lose EIP. So you have to do some disassembly on your target to figure out exactly where the proper offsets are. You have to understand the target address space. But if you know how to do that, then Hydra adds all this forking functionality in automatically.
So, alphanumeric encoding. Most polymorphic engines these days use alphanumeric encoding.
It basically just drops all of your shellcode into the printable character range. We used the ALPHA2 encoder, and that functionality is incorporated into Hydra. We also have alphanumeric NOP generators, and the alphanumeric NOPs are incorporated into the ALPHA2 encoder as well.
So the main benefit of using Hydra is that it has all of these different features. They're all modular and they all work together. If you see on top, that is the traditional shellcode: NOP sled, payload, return zone. And then you have Hydra shellcode, where you have this recursive sled, alpha decoder, time-lock cipher, fork, and you have the payloads and the decoders all scattered around. So the goal of this is basically to make it impossible to recognize that what you're looking at is shellcode. It's impossible to use signatures for this. It's impossible to use statistical methods or emulator methods for this. Well, maybe not impossible, but very hard. That's the goal of this work. And I'll pass it off to Pratap, who will demo this.
All right. I'll be showing three quick demos here. The first one: if you just run Hydra without any command-line options, you get a decryption loop. That's the multi-level cipher operations that'll be used for encoding the shellcode. The second one is a small shell script that runs Hydra three times and shows us the alphanumeric encodings of the shellcode for three invocations. We see that this is one payload, this is the second payload, and this is the third payload. So we see that the content and the length of all three strings are different. And this is because we're using two levels of polymorphic encoding, first at the binary level and second at the alphanumeric level. And the third demo is the multi-threading test. This is a small program that we're going to exploit. We have three functions: main, A, and B.
So main calls A, A calls B, and B is the function that will be exploited. And the goal of the multi-level, I mean multi-threaded, shellcode is to return from B back into main, skipping A. That is, from B, instead of returning back into A, we go straight to main, and we should be able to see the "in main" string. And this is where we're exploiting the program. This is the shellcode. So I compiled it already. So we see a shell here, and we also see that we come into B, we skip A, and we come into main. The reason we skip A is that the payload would actually overwrite the EIP of A. So we just have to get the second-to-last EIP, and that's the reason we come into main.
And the last test is the time-lock test. I just invoke Hydra with -t 7. So -t tells Hydra that we need a time-locked shellcode, and seven is the precision, and we get the shellcode here. So we see that the current time is this, and depending on the precision, the shellcode expires at this particular time. We would actually want a higher precision, I think, so let me just invoke it again. This is too small, let me use eight. Yeah, so the time increases now. Okay, okay, yeah, so I just copy this. I don't see it though, I have to do that. Okay, it's too long. So apparently this last demo is a bit hard to pull off. The resolution is kind of messing up the time test, I see this one. Yeah, the screen just turned off. Yeah, basically when you put in an argument for the time-lock shellcode, the shellcode is executable only within that time frame, and after that the decoder loop would fail.
Thanks, I think we're out of time, yeah. We should get off before they drag us off the stage. All right, thanks. We'll be around. If you have any questions or want to see a demo, just find us. We'll be walking around.